rpetit3 / dragonflye

:dragon: :fly: Assemble bacterial isolate genomes from Nanopore reads
GNU General Public License v3.0

medaka fails to open model file for r1041_e82_400bps_sup_g615 #20

Closed flass closed 1 year ago

flass commented 1 year ago

Hi Robert,

I have been testing your beautiful new version using the biocontainers docker image for v1.1.1.

Unfortunately, I ran into an issue with medaka again, actually the same one I was experiencing with my custom docker image (as mentioned in issue #19).

My run with model r1041_e82_400bps_sup_v4.2.0 went fine and completed successfully.

Another run with model r1041_e82_400bps_sup_g615 failed; see the excerpt of the dragonflye log below (full log attached):

[dragonflye] Polishing with Medaka (1 rounds)
[dragonflye] Running: medaka_consensus -i READS.fq.gz -d flye/polish/racon/1/consensus.fasta -o flye/polish/medaka/1 -m r1041_e82_400bps_sup_g615 -t 4  2>&1 | sed 's/^/[polishing - medaka (1 of 1)] /' | tee -a dragonflye.log
[polishing - medaka (1 of 1)] Checking program versions
[polishing - medaka (1 of 1)] This is medaka 1.8.0
[polishing - medaka (1 of 1)] Program    Version    Required   Pass
[polishing - medaka (1 of 1)] bcftools   1.17       1.11       True
[polishing - medaka (1 of 1)] bgzip      1.17       1.11       True
[polishing - medaka (1 of 1)] minimap2   2.26       2.11       True
[polishing - medaka (1 of 1)] samtools   1.17       1.11       True
[polishing - medaka (1 of 1)] tabix      1.17       1.11       True
[polishing - medaka (1 of 1)] Traceback (most recent call last):
[polishing - medaka (1 of 1)]   File "/usr/local/bin/medaka", line 11, in <module>
[polishing - medaka (1 of 1)]     sys.exit(main())
[polishing - medaka (1 of 1)]   File "/usr/local/lib/python3.10/site-packages/medaka/medaka.py", line 724, in main
[polishing - medaka (1 of 1)]     args.func(args)
[polishing - medaka (1 of 1)]   File "/usr/local/lib/python3.10/site-packages/medaka/medaka.py", line 267, in is_rle_model
[polishing - medaka (1 of 1)]     print(is_rle_encoder(args.model))
[polishing - medaka (1 of 1)]   File "/usr/local/lib/python3.10/site-packages/medaka/medaka.py", line 274, in is_rle_encoder
[polishing - medaka (1 of 1)]     encoder = modelstore.get_meta('feature_encoder')
[polishing - medaka (1 of 1)]   File "/usr/local/lib/python3.10/site-packages/medaka/datastore.py", line 193, in get_meta
[polishing - medaka (1 of 1)]     self.unpack()
[polishing - medaka (1 of 1)]   File "/usr/local/lib/python3.10/site-packages/medaka/datastore.py", line 118, in unpack
[polishing - medaka (1 of 1)]     with tarfile.open(self.filepath) as tar:
[polishing - medaka (1 of 1)]   File "/usr/local/lib/python3.10/tarfile.py", line 1639, in open
[polishing - medaka (1 of 1)]     raise ReadError(f"file could not be opened successfully:\n{error_msgs_summary}")
[polishing - medaka (1 of 1)] tarfile.ReadError: file could not be opened successfully:
[polishing - medaka (1 of 1)] - method gz: ReadError('empty file')
[polishing - medaka (1 of 1)] - method bz2: ReadError('not a bzip2 file')
[polishing - medaka (1 of 1)] - method xz: ReadError('not an lzma file')
[polishing - medaka (1 of 1)] - method tar: ReadError('empty file')

[dragonflye] Error running command: medaka_consensus -i READS.fq.gz -d flye/polish/racon/1/consensus.fasta -o flye/polish/medaka/1 -m r1041_e82_400bps_sup_g615 -t 4  2>&1 | sed 's/^/[polishing - medaka (1 of 1)] /' | tee -a dragonflye.log

I know this sounds like a medaka issue, but do you have a clue how to fix this before I escalate? Unfortunately, this model is the main one my users are looking to use...

dragonflye.log

flass commented 1 year ago

The fact that other model files can be loaded successfully suggests this particular file might be corrupted.
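One way to test the corruption hypothesis is to inspect the model archives directly. The `empty file` messages in the traceback above are what Python's `tarfile` reports for a zero-byte file, so a size check plus `tarfile.is_tarfile` separates an empty placeholder from a damaged archive. This is a minimal sketch, not part of medaka or dragonflye; the function names `check_model_archive` and `check_model_dir` are hypothetical.

```python
import os
import tarfile


def check_model_archive(path):
    """Return a short status string for a model tarball at `path`."""
    if not os.path.exists(path):
        return "missing"
    if os.path.getsize(path) == 0:
        # A zero-byte file is what produces ReadError('empty file').
        return "empty"
    if not tarfile.is_tarfile(path):
        return "not a tar archive"
    return "ok"


def check_model_dir(data_dir):
    """Map each *_model.tar.gz file in `data_dir` to its status."""
    return {
        name: check_model_archive(os.path.join(data_dir, name))
        for name in sorted(os.listdir(data_dir))
        if name.endswith("_model.tar.gz")
    }
```

Running `check_model_dir` on the medaka data directory inside the container (the path shown in the traceback is `/usr/local/lib/python3.10/site-packages/medaka/`, with models under `data`) would show whether the failing model is absent, empty, or unreadable.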

rpetit3 commented 1 year ago

I'll look into this a bit. I tried downloading the model from the GitHub repo and it passed the tests.

I'm wondering if it's on the Bioconda side. I have a few meetings this morning, then I should be free to test.

incoherentian commented 1 year ago

To further support the likelihood of this issue being model-specific: I just ran an assembly in dflye 1.1.1 using ye-olde R9.4 data. At a glance, the r941_min_sup_g507 model polished just fine.

rpetit3 commented 1 year ago

Going to do some more testing, but so far with just the conda build I'm not seeing this error. Testing with docker now.

rpetit3 commented 1 year ago

Looks like it's a case of missing models in the biocontainer:

docker run --rm -u $(id -u):$(id -g) -v ${PWD}:/data quay.io/biocontainers/dragonflye:1.1.1--hdfd78af_0 ls -lha /usr/local/lib/python3.10/site-packages/medaka/data
total 13M
drwxr-xr-x    2 root     root        4.0K May 31 16:53 .
drwxr-xr-x    4 root     root        4.0K May 31 16:53 ..
-rw-rw-r--    1 root     root        3.1M May 22 11:36 r1041_e82_400bps_hac_v4.2.0_model.tar.gz
-rwxrwxr-x    1 root     root        3.2M May 22 11:36 r1041_e82_400bps_hac_variant_v4.2.0_model.tar.gz
-rwxrwxr-x    1 root     root        3.2M May 22 11:36 r1041_e82_400bps_sup_v4.2.0_model.tar.gz
-rwxrwxr-x    1 root     root        3.2M May 22 11:36 r1041_e82_400bps_sup_variant_v4.2.0_model.tar.gz

rpetit3 commented 1 year ago

OK! Unfortunately, I don't think adding all the models to the biocontainer is an option due to their size (>600 MB). But looking into the error message (I received a different one):

[polishing - medaka (1 of 1)]     raise RuntimeError(msg.format(self.dest, str(e)))
[polishing - medaka (1 of 1)] RuntimeError: Error validating model from '--model' argument: The model file for r1041_e82_400bps_sup_g615 is not installed and could not be installed to any of /usr/local/lib/python3.10/site-packages/medaka/data or /.medaka/data. If you cannot gain write permissions, download the model file manually from https://github.com/nanoporetech/medaka/raw/master/medaka/data/r1041_e82_400bps_sup_g615_model.tar.gz and use the downloaded model as the --model option..

Medaka will try to write the model to two locations (https://github.com/nanoporetech/medaka/blob/master/medaka/options.py#L12-L16): the Python package location and $HOME/.medaka/data. On the biocontainer the HOME variable is not set, so it defaults to /.medaka/data.
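To make the two locations concrete, here is a small sketch that builds the same pair of directories and reports which one the current user can write to. It is an illustration only, not medaka's actual code: the helper names `candidate_model_dirs` and `first_writable` are hypothetical, and the fallback of `/` when HOME is unset is an assumption chosen to match the `/.medaka/data` path in the error message.

```python
import os


def candidate_model_dirs(package_dir, home):
    """Return the package data dir and the per-user dir, in that order."""
    return [
        os.path.join(package_dir, "data"),
        os.path.join(home, ".medaka", "data"),
    ]


def first_writable(dirs):
    """Return the first directory that exists and is writable, else None."""
    for d in dirs:
        if os.path.isdir(d) and os.access(d, os.W_OK):
            return d
    return None


if __name__ == "__main__":
    # Assumption: fall back to "/" when HOME is unset, which yields the
    # /.medaka/data path seen in the error message.
    home = os.environ.get("HOME", "/")
    dirs = candidate_model_dirs(
        "/usr/local/lib/python3.10/site-packages/medaka", home
    )
    for d in dirs:
        status = "writable" if first_writable([d]) else "not writable"
        print(d, status)
```

If neither directory is reported as writable for the user the container runs as, a model that is not already bundled has nowhere to be stored, which is consistent with the RuntimeError above.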

If we mount that directory it will work:

docker run \
    --rm -u $(id -u):$(id -g) \
    -v ${PWD}:/.medaka \
    -v ${PWD}:/data quay.io/biocontainers/dragonflye:1.1.1--hdfd78af_0 \
        dragonflye \
            --reads /data/bactopia-tests/data/species/portiera/nanopore/ERR3772599.fastq.gz \
            --cpus 0 \
            --outdir /data/test-docker \
            --gsize 300000 \
            --assembler raven \
            --racon 0 \
            --model r1041_e82_400bps_sup_g615 \
            --force

....
[polishing - medaka (1 of 1)] [19:21:29 - DataIndx] Loaded 1/1 (100.00%) sample files.
[polishing - medaka (1 of 1)] [19:21:29 - DataIndx] Loaded 1/1 (100.00%) sample files.
[polishing - medaka (1 of 1)] [19:21:29 - DataIndx] Loaded 1/1 (100.00%) sample files.
[polishing - medaka (1 of 1)] Polished assembly written to raven/polish/medaka/1/consensus.fasta, have a nice day.
... 

ls -lha data/
total 3.3M
drwxr-xr-x 2 robert_petit robert_petit 4.0K Jun  2 19:20 .
drwxr-xr-x 6 robert_petit robert_petit 4.0K Jun  2 19:20 ..
-rw-r--r-- 1 robert_petit robert_petit 3.3M Jun  2 19:20 r1041_e82_400bps_sup_g615_model.tar.gz

This should get you going, let me know if not!

flass commented 1 year ago

Hi Robert,

OK, thanks for the tip. It is strange that we get different errors, but it might be due to our deployment of this image as a Singularity image. I'll try your workaround and let you know how it goes.

flass commented 1 year ago

Hi Robert, I can confirm that I solved this issue by adding the models to the folders searched by medaka.

Specifically, I made a docker image based on quay.io/biocontainers/dragonflye:1.1.1--hdfd78af_0 and copied the desired models from https://github.com/nanoporetech/medaka/tree/v1.8.0/medaka/data into the medaka Python package folder /usr/local/lib/python3.10/site-packages/medaka/data/.

Putting the model files in /.medaka/data/ did not work, as medaka does not look for model files there, and this triggered a download of the model files from the repo. That works fine when executed as a docker container (with internet access), but does not work for us because we turn this into a Singularity image with no write permissions.
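For anyone building a similar derived image, the staging step could be scripted roughly as below. This is a hedged sketch rather than the exact procedure used here: `model_filename`, `model_url`, and `stage_model` are hypothetical helpers, the URL pattern is taken from the medaka error message quoted earlier in the thread, and the `ref` parameter (for example a tag such as `v1.8.0`) is an assumption about how the same path resolves for other Git refs.

```python
import os
import shutil
import urllib.request

# URL pattern copied from the medaka error message; {ref} is an assumption.
RAW_URL = "https://github.com/nanoporetech/medaka/raw/{ref}/medaka/data/{filename}"


def model_filename(model):
    """File name medaka expects for a given model name."""
    return f"{model}_model.tar.gz"


def model_url(model, ref="master"):
    """Download URL for a model archive at a given Git ref."""
    return RAW_URL.format(ref=ref, filename=model_filename(model))


def stage_model(model, dest_dir, ref="master"):
    """Download one model archive into dest_dir unless a non-empty copy exists."""
    os.makedirs(dest_dir, exist_ok=True)
    dest = os.path.join(dest_dir, model_filename(model))
    if os.path.exists(dest) and os.path.getsize(dest) > 0:
        return dest
    # Write to a temporary name first so an interrupted download does not
    # leave an empty or partial file under the final name.
    tmp = dest + ".part"
    with urllib.request.urlopen(model_url(model, ref)) as resp, open(tmp, "wb") as out:
        shutil.copyfileobj(resp, out)
    os.replace(tmp, dest)
    return dest
```

Calling `stage_model("r1041_e82_400bps_sup_g615", "/usr/local/lib/python3.10/site-packages/medaka/data")` at image build time, while the filesystem is still writable, would place the archive where the read-only Singularity image can later find it.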

Thanks again for your help!!

Best wishes, Florent

rpetit3 commented 1 year ago

Hi @flass

I think we are good to close this one, please feel free to reopen

Robert