nanoporetech / medaka

Sequence correction provided by ONT Research
https://nanoporetech.com

Medaka installation problems and a question about models #450

Closed: mprous1 closed this issue 10 months ago

mprous1 commented 11 months ago

First of all, the correspondence between Guppy basecalling models and Medaka models has become totally unclear.

I used Guppy 6.5.7 to basecall 5 kHz data, using the model dna_r10.4.1_e8.2_400bps_5khz_sup.

Medaka 1.8.0 has no model with that name, but r1041_e82_400bps_sup_v4.2.0 may be the corresponding one. I'm not sure, as searching Google and the Nanopore Community did not turn up much.

Medaka 1.8.1 is apparently not yet available on Conda, so I installed Medaka 1.8.0. I did not try installing with pip, because my previous attempts to do so never succeeded.

I'm running Ubuntu on Windows Subsystem for Linux (no GPU).

Running Medaka produced two types of error messages on one computer, but only one of them on the other.

On one computer, the following warning is printed many times while running medaka_consensus:

```
/home/mprous/miniconda3/envs/medaka18/lib/python3.10/site-packages/numpy/core/getlimits.py:542: UserWarning: Signature b'\x00\xd0\xcc\xcc\xcc\xcc\xcc\xcc\xfb\xbf\x00\x00\x00\x00\x00\x00' for <class 'numpy.longdouble'> does not match any known type: falling back to type probe function.
This warnings indicates broken support for the dtype!
  machar = _get_machar(dtype)
```

These warnings seemed to be harmless (?) and did not appear on the other computer.
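To check whether a given numpy install is affected, independently of medaka, one can query the `numpy.longdouble` limits and record any warnings raised. This is a minimal Python sketch that assumes the message above comes from `np.finfo(np.longdouble)`, which the `getlimits.py` path suggests; `longdouble_warns` is a hypothetical helper, not part of medaka or numpy. Because numpy caches `finfo` results, run it in a fresh Python process inside the medaka environment.

```python
import warnings

import numpy as np


def longdouble_warns() -> bool:
    """Return True if querying numpy.longdouble limits emits a UserWarning.

    Assumption: the repeated getlimits.py message in medaka's output is the
    same warning raised here. numpy caches finfo results, so this is only
    reliable on the first call in a fresh process.
    """
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record the warning even if seen before
        np.finfo(np.longdouble)
    return any(issubclass(w.category, UserWarning) for w in caught)


if __name__ == "__main__":
    print("longdouble warning:", longdouble_warns())
```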

The second type of error is related to 'keras_preprocessing':

```
[18:53:29 - MdlStrTF] ModelStoreTF exception <class 'ModuleNotFoundError'>
Traceback (most recent call last):
  File "/home/mprous/miniconda3/envs/medaka18/bin/medaka", line 11, in <module>
    sys.exit(main())
  File "/home/mprous/miniconda3/envs/medaka18/lib/python3.10/site-packages/medaka/medaka.py", line 724, in main
    args.func(args)
  File "/home/mprous/miniconda3/envs/medaka18/lib/python3.10/site-packages/medaka/prediction.py", line 160, in predict
    model = model_store.load_model(time_steps=args.chunk_len)
  File "/home/mprous/miniconda3/envs/medaka18/lib/python3.10/site-packages/medaka/datastore.py", line 180, in load_model
    self.model = model_partial_function(time_steps=time_steps)
  File "/home/mprous/miniconda3/envs/medaka18/lib/python3.10/site-packages/medaka/models.py", line 132, in build_model
    from tensorflow.keras.models import Sequential
  File "/home/mprous/miniconda3/envs/medaka18/lib/python3.10/site-packages/keras/api/_v2/keras/__init__.py", line 8, in <module>
    from keras import __version__
  File "/home/mprous/miniconda3/envs/medaka18/lib/python3.10/site-packages/keras/__init__.py", line 25, in <module>
    from keras import models
  File "/home/mprous/miniconda3/envs/medaka18/lib/python3.10/site-packages/keras/models.py", line 20, in <module>
    from keras import metrics as metrics_module
  File "/home/mprous/miniconda3/envs/medaka18/lib/python3.10/site-packages/keras/metrics.py", line 24, in <module>
    from keras import activations
  File "/home/mprous/miniconda3/envs/medaka18/lib/python3.10/site-packages/keras/activations.py", line 20, in <module>
    from keras.layers import advanced_activations
  File "/home/mprous/miniconda3/envs/medaka18/lib/python3.10/site-packages/keras/layers/__init__.py", line 30, in <module>
    from keras.layers.preprocessing.image_preprocessing import CenterCrop
  File "/home/mprous/miniconda3/envs/medaka18/lib/python3.10/site-packages/keras/layers/preprocessing/image_preprocessing.py", line 24, in <module>
    from keras.preprocessing.image import smart_resize
  File "/home/mprous/miniconda3/envs/medaka18/lib/python3.10/site-packages/keras/preprocessing/__init__.py", line 19, in <module>
    import keras_preprocessing
ModuleNotFoundError: No module named 'keras_preprocessing'
Failed to run medaka consensus.
```

Then, in my medaka environment, I ran `conda install -c conda-forge keras-preprocessing`

and the errors related to 'keras_preprocessing' disappeared.
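A quick way to catch this kind of missing dependency before running medaka_consensus is to look the relevant modules up in the active environment. This is a minimal Python sketch: the module list is an assumption taken from the traceback, not medaka's declared requirements, and `missing_modules` is a hypothetical helper. `importlib.util.find_spec` only locates top-level modules here; it does not import them.

```python
import importlib.util
import sys

# Modules seen in the traceback above. This list is an illustrative
# assumption, not medaka's official dependency list.
REQUIRED = ("tensorflow", "keras", "keras_preprocessing")


def missing_modules(names=REQUIRED):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("missing:", ", ".join(missing))
        sys.exit(1)
    print("all modules found")
```

Run it with the medaka conda environment activated, so that it inspects the same interpreter medaka uses.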

Although I finally got the new version of Medaka to work (assuming the numpy warnings are not fatal), maybe these error reports will be of some use. I'd also really like to know how the Guppy models correspond to the Medaka models; there should be a table of this on the Medaka GitHub page.

cjw85 commented 10 months ago

The correct model to use in medaka is r1041_e82_400bps_hac_v4.2.0. Its name derives from the corresponding model in the dorado basecaller, dna_r10.4.1_e8.2_400bps_hac@v4.2.0: recent medaka models now mirror the dorado naming convention.
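The naming convention can be illustrated with a small Python sketch that rewrites a dorado model name into the medaka form. The regular expression and the `dorado_to_medaka` helper are hypothetical, inferred from the single example in this thread; they reproduce the naming only and do not decide which model suits your data. Guppy names without an `@version` suffix (such as dna_r10.4.1_e8.2_400bps_5khz_sup) are rejected rather than guessed.

```python
import re

# Dorado-style name, e.g. "dna_r10.4.1_e8.2_400bps_hac@v4.2.0".
# The pattern is an assumption inferred from one example, not an official spec.
_DORADO_RE = re.compile(
    r"^dna_(?P<pore>r[\d.]+)_(?P<chem>e[\d.]+)_(?P<speed>\d+bps)"
    r"_(?P<acc>fast|hac|sup)@(?P<ver>v[\d.]+)$"
)


def dorado_to_medaka(dorado_model: str) -> str:
    """Convert a dorado model name to the medaka naming style (hypothetical helper)."""
    m = _DORADO_RE.match(dorado_model)
    if m is None:
        raise ValueError(f"unrecognised dorado model name: {dorado_model!r}")
    pore = m["pore"].replace(".", "")  # r10.4.1 -> r1041
    chem = m["chem"].replace(".", "")  # e8.2 -> e82
    # Drop the "dna_" prefix and turn "@" into "_".
    return f"{pore}_{chem}_{m['speed']}_{m['acc']}_{m['ver']}"


if __name__ == "__main__":
    print(dorado_to_medaka("dna_r10.4.1_e8.2_400bps_hac@v4.2.0"))
    # prints: r1041_e82_400bps_hac_v4.2.0
```

Applied to the sup variant, dna_r10.4.1_e8.2_400bps_sup@v4.2.0, the same rule yields r1041_e82_400bps_sup_v4.2.0.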

mprous1 commented 10 months ago

Thanks!