nanoporetech / medaka

Sequence correction provided by ONT Research
https://nanoporetech.com

Get models for Medaka with no access to internet? #441

Closed: blediro closed this issue 1 year ago

blediro commented 1 year ago

Hello,

I've been trying to run medaka_consensus on an HPC node. However, as the title suggests, this node has no internet access, so it cannot download the models. Instead, I downloaded the model I need ("r104_e81_sup_g610") myself and placed the file r104_e81_sup_g610_model.tar.gz in the medaka/data folder, alongside the default models. When I run medaka_consensus through the job scheduler, I get the following error:

Traceback (most recent call last):
  File "/arc/home/jcsm2010/mambaforge/envs/medaka/bin/medaka", line 11, in <module>
    sys.exit(main())
  File "/arc/home/jcsm2010/mambaforge/envs/medaka/lib/python3.10/site-packages/medaka/medaka.py", line 724, in main
    args.func(args)
  File "/arc/home/jcsm2010/mambaforge/envs/medaka/lib/python3.10/site-packages/medaka/medaka.py", line 267, in is_rle_model
    print(is_rle_encoder(args.model))
  File "/arc/home/jcsm2010/mambaforge/envs/medaka/lib/python3.10/site-packages/medaka/medaka.py", line 274, in is_rle_encoder
    encoder = modelstore.get_meta('feature_encoder')
  File "/arc/home/jcsm2010/mambaforge/envs/medaka/lib/python3.10/site-packages/medaka/datastore.py", line 193, in get_meta
    self.unpack()
  File "/arc/home/jcsm2010/mambaforge/envs/medaka/lib/python3.10/site-packages/medaka/datastore.py", line 118, in unpack
    with tarfile.open(self.filepath) as tar:
  File "/arc/home/jcsm2010/mambaforge/envs/medaka/lib/python3.10/tarfile.py", line 1639, in open
    raise ReadError(f"file could not be opened successfully:\n{error_msgs_summary}")
tarfile.ReadError: file could not be opened successfully:
- method gz: ReadError('not a gzip file')
- method bz2: ReadError('not a bzip2 file')
- method xz: ReadError('not an lzma file')
- method tar: ReadError('invalid header')

Should I specify the full path of the model when I use the -m flag, or is there something else I should be doing? I'm using medaka v1.8.0, installed via conda (mamba).
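The traceback shows `tarfile.open` rejecting the file under every method it tries (gz, bz2, xz and plain tar), so whatever is sitting at that path is not a readable tar archive. Below is a minimal Python sketch for checking such a file with only the standard library. It assumes, without confirmation from this thread, that a likely culprit is something other than the real archive (for example an HTML page) saved under the `.tar.gz` name. The function name `describe_archive` is hypothetical.

```python
import tarfile
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def describe_archive(path):
    """Return a short diagnosis of what a supposed model .tar.gz really is."""
    p = Path(path)
    with p.open("rb") as fh:
        head = fh.read(512)
    if head.startswith(GZIP_MAGIC):
        # gzip header is present; confirm the decompressed payload is a tar archive
        if tarfile.is_tarfile(str(p)):
            return "gzip tar archive"
        return "gzip but not a tar archive"
    # ASSUMPTION: an HTML page saved in place of the archive is a common mistake
    if head.lstrip().lower().startswith((b"<!doctype html", b"<html")):
        return "HTML page, not an archive"
    if tarfile.is_tarfile(str(p)):
        return "uncompressed or non-gzip tar archive"
    return "unrecognised format"


if __name__ == "__main__":
    import sys
    for arg in sys.argv[1:]:
        print(f"{arg}: {describe_archive(arg)}")
```

Running it against medaka/data/r104_e81_sup_g610_model.tar.gz should report "gzip tar archive" for an intact download; any other result suggests the file needs to be fetched again.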

cjw85 commented 1 year ago

The medaka CLI contains tools to download models. You can use these to download the models on a node that does have internet access. If you are using a file system shared between your login nodes and HPC nodes, medaka will find the downloaded models automatically.
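After downloading the models on a node with internet access (the command name is not given above; `medaka tools --help` should list the download tool), it can help to confirm, from the compute node, that the archives are present and readable where medaka looks for them. The Python sketch below assumes, as the question describes, that models live in the `data` folder of the installed medaka package and are named `*_model.tar.gz`; other medaka versions may store or look for models elsewhere. The function names `package_data_dir` and `audit_models` are hypothetical.

```python
import importlib.util
import tarfile
from pathlib import Path


def package_data_dir(package="medaka", subdir="data"):
    """Locate <package>/<subdir> for an installed package without importing it.

    ASSUMPTION: medaka keeps its models in the package's ``data`` folder,
    as described in the question; this may differ between versions.
    """
    spec = importlib.util.find_spec(package)  # does not execute the package
    if spec is None or not spec.submodule_search_locations:
        return None  # not installed, or not a regular package
    return Path(list(spec.submodule_search_locations)[0]) / subdir


def audit_models(data_dir, pattern="*_model.tar.gz"):
    """Map each model archive name in data_dir to whether tarfile can open it."""
    return {
        p.name: tarfile.is_tarfile(str(p))
        for p in sorted(Path(data_dir).glob(pattern))
    }


if __name__ == "__main__":
    data_dir = package_data_dir()
    if data_dir is None:
        print("medaka is not importable in this environment")
    else:
        for name, ok in audit_models(data_dir).items():
            print(f"{'ok ' if ok else 'BAD'}  {name}")
```

Running this inside the same conda environment on the compute node shows whether the shared installation actually sees a readable copy of each model.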