Closed flass closed 1 year ago
The fact that other model files can be loaded successfully suggests this particular file might be corrupted.
I'll look into this a bit. I tried downloading the model from the GitHub repo and it passed the tests.
I'm wondering if it's on the Bioconda side. I have a few meetings this morning, then I should be free to test.
To further affirm likelihood of this issue being model-specific: Just ran an assembly in dflye 1.1.1 using ye-olde R9.4 data. At a glance r941_min_sup_g507 model polished just fine.
Going to do some more testing, but so far on just conda build I'm not seeing this error. Testing with docker now.
Looks like it's a case of missing models on the biocontainer:
docker run --rm -u $(id -u):$(id -g) -v ${PWD}:/data quay.io/biocontainers/dragonflye:1.1.1--hdfd78af_0 ls -lha /usr/local/lib/python3.10/site-packages/medaka/data
total 13M
drwxr-xr-x 2 root root 4.0K May 31 16:53 .
drwxr-xr-x 4 root root 4.0K May 31 16:53 ..
-rw-rw-r-- 1 root root 3.1M May 22 11:36 r1041_e82_400bps_hac_v4.2.0_model.tar.gz
-rwxrwxr-x 1 root root 3.2M May 22 11:36 r1041_e82_400bps_hac_variant_v4.2.0_model.tar.gz
-rwxrwxr-x 1 root root 3.2M May 22 11:36 r1041_e82_400bps_sup_v4.2.0_model.tar.gz
-rwxrwxr-x 1 root root 3.2M May 22 11:36 r1041_e82_400bps_sup_variant_v4.2.0_model.tar.gz
OK! Unfortunately, I don't think adding all the models to the biocontainer is an option due to the size (>600 MB). But looking into the error message (I received a different one):
[polishing - medaka (1 of 1)] raise RuntimeError(msg.format(self.dest, str(e)))
[polishing - medaka (1 of 1)] RuntimeError: Error validating model from '--model' argument: The model file for r1041_e82_400bps_sup_g615 is not installed and could not be installed to any of /usr/local/lib/python3.10/site-packages/medaka/data or /.medaka/data. If you cannot gain write permissions, download the model file manually from https://github.com/nanoporetech/medaka/raw/master/medaka/data/r1041_e82_400bps_sup_g615_model.tar.gz and use the downloaded model as the --model option..
Medaka will try to write the model to two locations (https://github.com/nanoporetech/medaka/blob/master/medaka/options.py#L12-L16): the python package location and $HOME/.medaka/data. On the biocontainer the HOME variable is not set, so it defaults to /.medaka/data
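A small shell sketch of the fallback described above, for illustration only: it mimics the behaviour reported in this thread (an unset HOME yielding /.medaka/data) rather than reproducing medaka's actual code, and `medaka_user_store` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper (not part of medaka): print the per-user model
# directory as described in this thread, i.e. $HOME/.medaka/data.
# Assumption: an unset or empty HOME collapses to the filesystem root.
medaka_user_store() {
    home="${HOME:-}"
    home="${home%/}"   # drop a trailing slash so HOME=/ also gives /.medaka/data
    printf '%s/.medaka/data\n' "$home"
}

# Show the directory that would need to be writable (or mounted) in this shell.
medaka_user_store
```

Running this inside the container should show which path needs to be mounted or made writable.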
If we mount that directory it will work:
docker run \
--rm -u $(id -u):$(id -g) \
-v ${PWD}:/.medaka \
-v ${PWD}:/data quay.io/biocontainers/dragonflye:1.1.1--hdfd78af_0 \
dragonflye \
--reads /data/bactopia-tests/data/species/portiera/nanopore/ERR3772599.fastq.gz \
--cpus 0 \
--outdir /data/test-docker \
--gsize 300000 \
--assembler raven \
--racon 0 \
--model r1041_e82_400bps_sup_g615 \
--force
....
[polishing - medaka (1 of 1)] [19:21:29 - DataIndx] Loaded 1/1 (100.00%) sample files.
[polishing - medaka (1 of 1)] [19:21:29 - DataIndx] Loaded 1/1 (100.00%) sample files.
[polishing - medaka (1 of 1)] [19:21:29 - DataIndx] Loaded 1/1 (100.00%) sample files.
[polishing - medaka (1 of 1)] Polished assembly written to raven/polish/medaka/1/consensus.fasta, have a nice day.
...
ls -lha data/
total 3.3M
drwxr-xr-x 2 robert_petit robert_petit 4.0K Jun 2 19:20 .
drwxr-xr-x 6 robert_petit robert_petit 4.0K Jun 2 19:20 ..
-rw-r--r-- 1 robert_petit robert_petit 3.3M Jun 2 19:20 r1041_e82_400bps_sup_g615_model.tar.gz
This should get you going, let me know if not!
Hi Robert,
OK, thanks for the tip. It is strange that we get different bugs, but it might be due to our deployment of this image as a singularity image. I'll try your workaround and let you know how it goes.
Hi Robert, I can confirm that I solved this issue by adding the models to the folders searched by medaka.
Specifically, I made a docker image based on quay.io/biocontainers/dragonflye:1.1.1--hdfd78af_0 and copied the desired models from https://github.com/nanoporetech/medaka/tree/v1.8.0/medaka/data into the medaka python package folder /usr/local/lib/python3.10/site-packages/medaka/data/.
Putting the model files in /.medaka/data/ did not work, as medaka does not look for model files there; this triggered a download of the model files from the repo, which works fine when run as a docker container (with internet access) but does not work for us, as we convert this into a singularity image that we have no write access to.
Thanks again for your help!!
Best wishes, Florent
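For anyone wanting to script the approach above, here is a rough shell sketch that builds the download URL for a model and fetches it into a local directory, from which it can be copied into the image's medaka data folder. The URL pattern is taken from the medaka error message quoted earlier in this thread; the function names, the `MEDAKA_REF` variable and the `models` directory are made up for illustration, and pointing `MEDAKA_REF` at a tag such as v1.8.0 instead of master is an untested assumption.

```shell
#!/bin/sh
# Hypothetical helpers for pre-fetching medaka models (not part of medaka).
# Assumption: models live at <repo>/raw/<ref>/medaka/data/<model>_model.tar.gz,
# the pattern shown in the medaka error message.
MEDAKA_REF="${MEDAKA_REF:-master}"

medaka_model_url() {
    # $1 = model name, e.g. r1041_e82_400bps_sup_g615
    printf 'https://github.com/nanoporetech/medaka/raw/%s/medaka/data/%s_model.tar.gz\n' \
        "$MEDAKA_REF" "$1"
}

fetch_medaka_model() {
    # $1 = model name, $2 = destination directory (default: ./models)
    dest="${2:-models}"
    mkdir -p "$dest" &&
        curl -fL -o "$dest/${1}_model.tar.gz" "$(medaka_model_url "$1")"
}

# Example (needs network access):
# fetch_medaka_model r1041_e82_400bps_sup_g615 models
```

The downloaded files would then be copied into /usr/local/lib/python3.10/site-packages/medaka/data/ when building the derived image.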
Hi @flass
I think we are good to close this one, please feel free to reopen
Robert
Hi Robert,
I have been testing your beautiful new version using the biocontainers docker image for v1.1.1.
Unfortunately I ran into an issue with medaka again, the same one I was experiencing with my custom docker image (as mentioned in issue #19).
My run with model r1041_e82_400bps_sup_v4.2.0 went fine and completed successfully. Another run with model r1041_e82_400bps_sup_g615 failed; see the excerpt of the dragonflye log below (full log attached).
I know this sounds like a medaka issue, but do you have a clue how to fix this before I escalate? Unfortunately this model is the main one my users are looking to use...
dragonflye.log