nanoporetech / medaka

Sequence correction provided by ONT Research
https://nanoporetech.com

offline usage #437

Closed (nick297 closed this 1 year ago)

nick297 commented 1 year ago

Our research cluster does not have access to the internet, so medaka cannot download models on the fly. When I've tried to pass a .tar.gz file directly to medaka, I get the following error:

medaka_consensus -i reads.fastq.gz -d contigs.fasta -o output -t 2 -m r941_e81_hac_g514_model.tar.gz

Traceback (most recent call last):
  File "/gpfs2/well/bag/users/nick/sureselect/work/conda/assembly-ccbf7c27ade2b7333e600eb1f54daeb2/bin/medaka", line 11, in <module>
    sys.exit(main())
  File "/gpfs2/well/bag/users/nick/sureselect/work/conda/assembly-ccbf7c27ade2b7333e600eb1f54daeb2/lib/python3.10/site-packages/medaka/medaka.py", line 724, in main
    args.func(args)
  File "/gpfs2/well/bag/users/nick/sureselect/work/conda/assembly-ccbf7c27ade2b7333e600eb1f54daeb2/lib/python3.10/site-packages/medaka/medaka.py", line 267, in is_rle_model
    print(is_rle_encoder(args.model))
  File "/gpfs2/well/bag/users/nick/sureselect/work/conda/assembly-ccbf7c27ade2b7333e600eb1f54daeb2/lib/python3.10/site-packages/medaka/medaka.py", line 274, in is_rle_encoder
    encoder = modelstore.get_meta('feature_encoder')
  File "/gpfs2/well/bag/users/nick/sureselect/work/conda/assembly-ccbf7c27ade2b7333e600eb1f54daeb2/lib/python3.10/site-packages/medaka/datastore.py", line 193, in get_meta
    self.unpack()
  File "/gpfs2/well/bag/users/nick/sureselect/work/conda/assembly-ccbf7c27ade2b7333e600eb1f54daeb2/lib/python3.10/site-packages/medaka/datastore.py", line 118, in unpack
    with tarfile.open(self.filepath) as tar:
  File "/gpfs2/well/bag/users/nick/sureselect/work/conda/assembly-ccbf7c27ade2b7333e600eb1f54daeb2/lib/python3.10/tarfile.py", line 1639, in open
    raise ReadError(f"file could not be opened successfully:\n{error_msgs_summary}")
tarfile.ReadError: file could not be opened successfully:
- method gz: ReadError('not a gzip file')
- method bz2: ReadError('not a bzip2 file')
- method xz: ReadError('not an lzma file')
- method tar: ReadError('invalid header')

I've downloaded the tar.gz file from https://github.com/nanoporetech/medaka/tree/master/medaka/data. Is this correct?

cjw85 commented 1 year ago

Hi @nick297,

Reviewing this code, I think what you've done should work. Can you check that the file you are providing is indeed a tar.gz file by trying to unpack it yourself with tar?
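
If tar is not handy, roughly the same check can be sketched in Python. This is only an illustration: describe_archive is a hypothetical helper name, not part of medaka, and it relies only on the standard tarfile module (the same module that raised the error in the traceback above) plus the two-byte gzip magic number.

```python
import tarfile
from pathlib import Path

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def describe_archive(path):
    """Hypothetical helper: report whether `path` looks like a gzipped tar.

    Returns "not gzip", "gzip but not tar", or "ok".
    """
    p = Path(path)
    with p.open("rb") as fh:
        head = fh.read(2)
    if head != GZIP_MAGIC:
        return "not gzip"
    # is_tarfile() tries to open the archive and returns False on a ReadError.
    if not tarfile.is_tarfile(str(p)):
        return "gzip but not tar"
    return "ok"


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        print(name, describe_archive(name))
```

A result of "not gzip" would match the "not a gzip file" line in the traceback and would suggest the download itself is the problem rather than medaka.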

nick297 commented 1 year ago

Hi @cjw85,

The files I've pulled from the medaka data folder (https://github.com/nanoporetech/medaka/tree/master/medaka/data), both via wget and by git cloning the repository, don't seem to be tar files. I've tried untarring them with tar -xvzf:

gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now

And the Mac unzip client gives an error too. Where should I get the models from? This is for R9.4.1 flowcells with Guppy 5.

Thanks, Nick

cjw85 commented 1 year ago

OK, this sounds like what I suspected. I believe the file you have is a git LFS stub, not the actual model file. If you fetch the models using git, you will first need to have git LFS set up.
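
For anyone wanting to confirm this on their own file: a Git LFS stub (pointer) is a small text file whose first line is "version https://git-lfs.github.com/spec/v1", followed by "oid" and "size" lines. A minimal sketch of a check, assuming that pointer layout; is_lfs_pointer and read_lfs_pointer are hypothetical helper names, not part of medaka or Git LFS:

```python
from pathlib import Path

# First line of every Git LFS pointer file (spec v1).
LFS_SIGNATURE = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Hypothetical helper: True if `path` starts with the LFS pointer signature."""
    with Path(path).open("rb") as fh:
        return fh.read(len(LFS_SIGNATURE)) == LFS_SIGNATURE


def read_lfs_pointer(path):
    """Hypothetical helper: return the pointer's key/value fields, or None.

    Each pointer line has the form "<key> <value>", e.g. "size 12345".
    """
    if not is_lfs_pointer(path):
        return None
    fields = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        key, _, value = line.partition(" ")
        if key:
            fields[key] = value
    return fields


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        fields = read_lfs_pointer(name)
        if fields is None:
            print(f"{name}: not an LFS pointer")
        else:
            print(f"{name}: LFS pointer, real file size {fields.get('size')} bytes")
```

If a model file is reported as a pointer, running git lfs install and then git lfs pull inside the clone, on a machine with internet access and Git LFS installed, should replace the stubs with the real archives, which can then be copied to the offline cluster.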

nick297 commented 1 year ago

aaah ok, thanks, I did not know about git LFS!