rpetit3 / dragonflye

:dragon: :fly: Assemble bacterial isolate genomes from Nanopore reads
GNU General Public License v3.0

mismatch between model names valid for dragonflye 1.1.0 and medaka 1.8.0 #19

Closed flass closed 1 year ago

flass commented 1 year ago

Hi,

I have made a conda install of dragonflye (within a docker image), forcing the dependencies for flye and medaka to be the latest versions:

micromamba install -n base -y -c conda-forge -c bioconda \
    flye=2.9.2 \
    medaka=1.8.0 \
    dragonflye=1.1.0

This works, but if I want to specify the use of the latest model r1041_e82_400bps_sup_v420, I get an error at the medaka stage:

[...]
[dragonflye] Running: medaka_consensus -i READS.fq.gz -d flye/polish/racon/1/consensus.fasta -o flye/polish/medaka/1 -m r1041_e82_400bps_sup_v420 -t 4  2>&1 | sed 's/^/[polishing - medaka (1 of 1)] /' | tee -a dragonflye.log
[polishing - medaka (1 of 1)] Traceback (most recent call last):
[polishing - medaka (1 of 1)]   File "/opt/conda/lib/python3.10/site-packages/medaka/medaka.py", line 35, in __call__
[polishing - medaka (1 of 1)]     model_fp = medaka.models.resolve_model(val)
[polishing - medaka (1 of 1)]   File "/opt/conda/lib/python3.10/site-packages/medaka/models.py", line 31, in resolve_model
[polishing - medaka (1 of 1)]     raise ValueError(
[polishing - medaka (1 of 1)] ValueError: Model r1041_e82_400bps_sup_v420 is not a known model or existant file.
[dragonflye] Error running command: medaka_consensus -i READS.fq.gz -d flye/polish/racon/1/consensus.fasta -o flye/polish/medaka/1 -m r1041_e82_400bps_sup_v420 -t 4  2>&1 | sed 's/^/[polishing - medaka (1 of 1)] /' | tee -a dragonflye.log

Indeed medaka wants something like this: r1041_e82_400bps_sup_v4.2.0, with dots in the version name.

docker run -v $HOME:$HOME -w $HOME/test gitlab-registry.internal.sanger.ac.uk/sanger-pathogens/docker-images-test/dragonflye:1.1.0 medaka tools list_models
Available: r103_fast_g507, r103_fast_snp_g507, r103_fast_variant_g507, r103_hac_g507, r103_hac_snp_g507, r103_hac_variant_g507, r103_min_high_g345, r103_min_high_g360, r103_prom_high_g360, r103_prom_snp_g3210, r103_prom_variant_g3210, r103_sup_g507, r103_sup_snp_g507, r103_sup_variant_g507, r1041_e82_260bps_fast_g632, r1041_e82_260bps_fast_variant_g632, r1041_e82_260bps_hac_g632, r1041_e82_260bps_hac_v4.0.0, r1041_e82_260bps_hac_v4.1.0, r1041_e82_260bps_hac_variant_g632, r1041_e82_260bps_hac_variant_v4.1.0, r1041_e82_260bps_sup_g632, r1041_e82_260bps_sup_v4.0.0, r1041_e82_260bps_sup_v4.1.0, r1041_e82_260bps_sup_variant_g632, r1041_e82_260bps_sup_variant_v4.1.0, r1041_e82_400bps_fast_g615, r1041_e82_400bps_fast_g632, r1041_e82_400bps_fast_variant_g615, r1041_e82_400bps_fast_variant_g632, r1041_e82_400bps_hac_g615, r1041_e82_400bps_hac_g632, r1041_e82_400bps_hac_v4.0.0, r1041_e82_400bps_hac_v4.1.0, r1041_e82_400bps_hac_v4.2.0, r1041_e82_400bps_hac_variant_g615, r1041_e82_400bps_hac_variant_g632, r1041_e82_400bps_hac_variant_v4.1.0, r1041_e82_400bps_hac_variant_v4.2.0, r1041_e82_400bps_sup_g615, r1041_e82_400bps_sup_v4.0.0, r1041_e82_400bps_sup_v4.1.0, r1041_e82_400bps_sup_v4.2.0, r1041_e82_400bps_sup_variant_g615, r1041_e82_400bps_sup_variant_v4.1.0, r1041_e82_400bps_sup_variant_v4.2.0, r104_e81_fast_g5015, r104_e81_fast_variant_g5015, r104_e81_hac_g5015, r104_e81_hac_variant_g5015, r104_e81_sup_g5015, r104_e81_sup_g610, r104_e81_sup_variant_g610, r10_min_high_g303, r10_min_high_g340, r941_e81_fast_g514, r941_e81_fast_variant_g514, r941_e81_hac_g514, r941_e81_hac_variant_g514, r941_e81_sup_g514, r941_e81_sup_variant_g514, r941_min_fast_g303, r941_min_fast_g507, r941_min_fast_snp_g507, r941_min_fast_variant_g507, r941_min_hac_g507, r941_min_hac_snp_g507, r941_min_hac_variant_g507, r941_min_high_g303, r941_min_high_g330, r941_min_high_g340_rle, r941_min_high_g344, r941_min_high_g351, r941_min_high_g360, r941_min_sup_g507, r941_min_sup_snp_g507, 
r941_min_sup_variant_g507, r941_prom_fast_g303, r941_prom_fast_g507, r941_prom_fast_snp_g507, r941_prom_fast_variant_g507, r941_prom_hac_g507, r941_prom_hac_snp_g507, r941_prom_hac_variant_g507, r941_prom_high_g303, r941_prom_high_g330, r941_prom_high_g344, r941_prom_high_g360, r941_prom_high_g4011, r941_prom_snp_g303, r941_prom_snp_g322, r941_prom_snp_g360, r941_prom_sup_g507, r941_prom_sup_snp_g507, r941_prom_sup_variant_g507, r941_prom_variant_g303, r941_prom_variant_g322, r941_prom_variant_g360, r941_sup_plant_g610, r941_sup_plant_variant_g610
Default consensus:  r1041_e82_400bps_sup_v4.2.0
Default variant:  r1041_e82_400bps_sup_variant_v4.2.0
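
To make the mismatch concrete, here is a minimal Python sketch of how the dot-less spelling relates to the dotted one medaka lists. It is only an illustration: `dotted_name` is a hypothetical helper, not part of dragonflye or medaka, and `available` is a hand-copied subset of the output above.

```python
def dotted_name(name, available):
    """Return the listed model that equals `name` once dots are ignored.

    Hypothetical helper for illustration only; returns None if nothing matches.
    """
    # Index every listed model by its dot-less spelling.
    by_dotless = {model.replace(".", ""): model for model in available}
    return by_dotless.get(name.replace(".", ""))


# Hand-copied subset of the `medaka tools list_models` output above.
available = [
    "r1041_e82_400bps_hac_v4.2.0",
    "r1041_e82_400bps_sup_v4.1.0",
    "r1041_e82_400bps_sup_v4.2.0",
    "r941_min_sup_g507",
]

print(dotted_name("r1041_e82_400bps_sup_v420", available))
# prints: r1041_e82_400bps_sup_v4.2.0
```

Older names such as r941_min_sup_g507 contain no dots, so they map to themselves.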

If I instead pass that medaka-valid name to dragonflye:

dragonflye \
--reads dragonflye/barcode07.fastq.gz \
--R1 4075_2#2_1.fastq.gz \
--R2 4075_2#2_2.fastq.gz \
--gsize 4.5M --medaka 1 --model r1041_e82_400bps_sup_v4.2.0 \
--cpus 4 --ram 6 --outdir dragonflye/test_IP6794-89

then dragonflye fails at the argument validation step:

[dragonflye] You ran: /opt/conda/bin/dragonflye --reads dragonflye/barcode07.fastq.gz --R1 dragonflye/4075_2#2_1.fastq.gz --R2 dragonflye/4075_2#2_2.fastq.gz --gsize 4.5M --medaka 1 --model r1041_e82_400bps_sup_v4.2.0 --cpus 4 --ram 6 --outdir dragonflye/test_IP6794-89
[dragonflye] This is dragonflye 1.1.0
[dragonflye] Written by Robert A Petit III
[dragonflye] Homepage is https://github.com/rpetit3/dragonflye
[dragonflye] Operating system is linux
[dragonflye] Perl version is v5.32.1
[dragonflye] Machine has 256 CPU cores and 2015.34 GB RAM
[dragonflye] Verifying input model (--model): r1041_e82_400bps_sup_v4.2.0
[dragonflye] Unable to verify model 'r1041_e82_400bps_sup_v4.2.0', please check spelling and try again.
[dragonflye] Available Medaka models include:
[dragonflye]    r103_fast_g507
[dragonflye]    r103_hac_g507
[dragonflye]    r103_min_high_g345
[dragonflye]    r103_min_high_g360
[...]

Could you please change your validation scheme so that it matches that of medaka?
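
One way such a validation could stay in sync, sketched here in Python rather than in dragonflye's own Perl, is to ask the installed medaka for its model list instead of keeping a separate copy. This is a sketch under assumptions, not how dragonflye does it: `parse_available` and `medaka_models` are hypothetical names, and it assumes `medaka tools list_models` writes the format shown above to standard output.

```python
import subprocess


def parse_available(output):
    """Extract model names from `medaka tools list_models` output text."""
    names = []
    collecting = False
    for line in output.splitlines():
        if line.startswith("Available:"):
            collecting = True
            line = line[len("Available:"):]
        elif line.startswith("Default"):
            # "Default consensus:" / "Default variant:" end the list.
            collecting = False
        if collecting:
            # The list may wrap over several lines; keep collecting.
            names.extend(n.strip() for n in line.split(",") if n.strip())
    return names


def medaka_models():
    """Query the installed medaka (assumes `medaka` is on PATH)."""
    result = subprocess.run(
        ["medaka", "tools", "list_models"],
        capture_output=True, text=True, check=True,
    )
    return parse_available(result.stdout)


# Shortened sample in the same shape as the output quoted above.
sample = (
    "Available: r103_fast_g507, r1041_e82_400bps_sup_v4.2.0, \n"
    "r941_min_sup_g507\n"
    "Default consensus:  r1041_e82_400bps_sup_v4.2.0\n"
)
models = parse_available(sample)
print("r1041_e82_400bps_sup_v4.2.0" in models)  # True
print("r1041_e82_400bps_sup_v420" in models)    # False
```

With this approach, a name is accepted exactly when medaka itself lists it, so the two tools cannot disagree on spelling.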

Best wishes, Florent

incoherentian commented 1 year ago

Problem is clearly not enough cores or RAM.

...more seriously, I thought something else might happen when pinning flye 2.9.2 (I'm guessing that's what you meant), thus backed off for someone else to try first 😈 Thank you for being that person!

So the wrapper is actually still passing kit 14 along fine to flye 2.9.2, just not medaka models for kit 14? That's actually still pretty awesome, I think I'm going to pin these too. Are non-kit-14 models still polishing fine with medaka 1.8.0?

flass commented 1 year ago

Yes, it's flye 2.9.2, sorry (corrected in the original post). The flye step runs fine; I'm still troubleshooting the medaka run.
With a conda install, medaka apparently comes with only the default models (those for Kit 14) in /opt/conda/medaka/medaka/data/, so I had to copy the extra model files I needed from the git repo. Now I'm getting an error linked to opening the model tar.gz file...
I can update here once I succeed, but I would not want to distract from the main point of this issue, which is that the validation scheme of dragonflye is off.

rpetit3 commented 1 year ago

Hi @flass

I'll get this fixed today. Let me know if you think it's worth updating the medaka recipe to include those missing models.

Cheers, Robert

rpetit3 commented 1 year ago

@flass I think I have this fixed now, if you want to give it a try.

Here's the link to download: https://raw.githubusercontent.com/rpetit3/dragonflye/main/bin/dragonflye

flass commented 1 year ago

Thank you Robert. I tried executing your dragonflye 1.1.1 script from within my docker container for v1.1.0 and it seems to work!

mib114737i:dragonflye fl4$ docker run -v $HOME:$HOME -w $HOME/test/dragonflye gitlab-registry.internal.sanger.ac.uk/sanger-pathogens/docker-images-test/dragonflye:1.1.0 ./dragonflye --reads barcode07.fastq.gz --R1 4075_2#2_1.fastq.gz --R2 4075_2#2_2.fastq.gz --gsize 4.5M --medaka 1 --model r1041_e82_400bps_sup_v4.2.0 --cpus 4 --ram 6 --outdir test_IP6794-89-4
[dragonflye] Hello mambauser
[dragonflye] You ran: /Users/fl4/test/dragonflye/dragonflye --reads barcode07.fastq.gz --R1 4075_2#2_1.fastq.gz --R2 4075_2#2_2.fastq.gz --gsize 4.5M --medaka 1 --model r1041_e82_400bps_sup_v4.2.0 --cpus 4 --ram 6 --outdir test_IP6794-89-4
[dragonflye] This is dragonflye 1.1.1
[dragonflye] Written by Robert A Petit III
[dragonflye] Homepage is https://github.com/rpetit3/dragonflye
[dragonflye] Operating system is linux
[dragonflye] Perl version is v5.32.1
[dragonflye] Machine has 4 CPU cores and 7.68 GB RAM
[dragonflye] Verifying input model (--model): r1041_e82_400bps_sup_v4.2.0
[dragonflye] Model r1041_e82_400bps_sup_v4.2.0 verified!
[...going on to run the rest of the pipeline...]

Will you have this released as a bioconda recipe / biocontainer image sometime soon?

Best wishes,

Florent

rpetit3 commented 1 year ago

Awesome, I'll get a version release submitted. It'll take a few hours to get synced on Bioconda, as I think I just missed the hourly bot auto-bump job

flass commented 1 year ago

Awesome! No worries, I can wait until tomorrow ;-) Thanks a lot again. Florent

flass commented 1 year ago

any chance that the bioconda recipe (and hence the resulting biocontainer) will pick up flye 2.9.2 and medaka 1.8.0 by default?

rpetit3 commented 1 year ago

I'll verify in the build, but assuming yes.

Probably going to pin medaka to >=1.8.0. Think it's worth pinning flye as well? Maybe just flye>=2.9

flass commented 1 year ago

Yes, I'd say it's worth pinning both. The flye 2.9.2 assembly has worked fine for me (can't really comment on quality though).

It would definitely be worth having a bunch (or all) of the medaka models included in the bioconda recipe.

I personally wished to have the following: r1041_e82_400bps_sup_g615, r1041_e82_400bps_sup_variant_g615, r104_e81_sup_g610, r104_e81_sup_variant_g610, r941_min_sup_g507, r941_min_sup_variant_g507, r941_min_sup_snp_g507

Thanks!

incoherentian commented 1 year ago

If dragonflye is now supporting more efficient kit 14 alignment with newer flye anyway, my vote would def. be yes!

P.S. I tried adding your channel as I thought that would not suffer a delay. Rereading now, I see you're just pushing it straight to bioconda!

mamba create -n dragonflye_m180 -c bioconda -c conda-forge -c rpetit3 dragonflye=1.1.1 flye=2.9.2 medaka=1.8.0

rpetit3 commented 1 year ago

@flass Mind double-checking those? I think we are getting them now. Here's what I'm getting with v1.1.1 (haha, might not have to do anything!)

[dragonflye]    Available:
[dragonflye]    r103_fast_g507
[dragonflye]    r103_fast_snp_g507
[dragonflye]    r103_fast_variant_g507
[dragonflye]    r103_hac_g507
[dragonflye]    r103_hac_snp_g507
[dragonflye]    r103_hac_variant_g507
[dragonflye]    r103_min_high_g345
[dragonflye]    r103_min_high_g360
[dragonflye]    r103_prom_high_g360
[dragonflye]    r103_prom_snp_g3210
[dragonflye]    r103_prom_variant_g3210
[dragonflye]    r103_sup_g507
[dragonflye]    r103_sup_snp_g507
[dragonflye]    r103_sup_variant_g507
[dragonflye]    r1041_e82_260bps_fast_g632
[dragonflye]    r1041_e82_260bps_fast_variant_g632
[dragonflye]    r1041_e82_260bps_hac_g632
[dragonflye]    r1041_e82_260bps_hac_v4.0.0
[dragonflye]    r1041_e82_260bps_hac_v4.1.0
[dragonflye]    r1041_e82_260bps_hac_variant_g632
[dragonflye]    r1041_e82_260bps_hac_variant_v4.1.0
[dragonflye]    r1041_e82_260bps_sup_g632
[dragonflye]    r1041_e82_260bps_sup_v4.0.0
[dragonflye]    r1041_e82_260bps_sup_v4.1.0
[dragonflye]    r1041_e82_260bps_sup_variant_g632
[dragonflye]    r1041_e82_260bps_sup_variant_v4.1.0
[dragonflye]    r1041_e82_400bps_fast_g615
[dragonflye]    r1041_e82_400bps_fast_g632
[dragonflye]    r1041_e82_400bps_fast_variant_g615
[dragonflye]    r1041_e82_400bps_fast_variant_g632
[dragonflye]    r1041_e82_400bps_hac_g615
[dragonflye]    r1041_e82_400bps_hac_g632
[dragonflye]    r1041_e82_400bps_hac_v4.0.0
[dragonflye]    r1041_e82_400bps_hac_v4.1.0
[dragonflye]    r1041_e82_400bps_hac_v4.2.0
[dragonflye]    r1041_e82_400bps_hac_variant_g615
[dragonflye]    r1041_e82_400bps_hac_variant_g632
[dragonflye]    r1041_e82_400bps_hac_variant_v4.1.0
[dragonflye]    r1041_e82_400bps_hac_variant_v4.2.0
[dragonflye]    r1041_e82_400bps_sup_g615
[dragonflye]    r1041_e82_400bps_sup_v4.0.0
[dragonflye]    r1041_e82_400bps_sup_v4.1.0
[dragonflye]    r1041_e82_400bps_sup_v4.2.0
[dragonflye]    r1041_e82_400bps_sup_variant_g615
[dragonflye]    r1041_e82_400bps_sup_variant_v4.1.0
[dragonflye]    r1041_e82_400bps_sup_variant_v4.2.0
[dragonflye]    r104_e81_fast_g5015
[dragonflye]    r104_e81_fast_variant_g5015
[dragonflye]    r104_e81_hac_g5015
[dragonflye]    r104_e81_hac_variant_g5015
[dragonflye]    r104_e81_sup_g5015
[dragonflye]    r104_e81_sup_g610
[dragonflye]    r104_e81_sup_variant_g610
[dragonflye]    r10_min_high_g303
[dragonflye]    r10_min_high_g340
[dragonflye]    r941_e81_fast_g514
[dragonflye]    r941_e81_fast_variant_g514
[dragonflye]    r941_e81_hac_g514
[dragonflye]    r941_e81_hac_variant_g514
[dragonflye]    r941_e81_sup_g514
[dragonflye]    r941_e81_sup_variant_g514
[dragonflye]    r941_min_fast_g303
[dragonflye]    r941_min_fast_g507
[dragonflye]    r941_min_fast_snp_g507
[dragonflye]    r941_min_fast_variant_g507
[dragonflye]    r941_min_hac_g507
[dragonflye]    r941_min_hac_snp_g507
[dragonflye]    r941_min_hac_variant_g507
[dragonflye]    r941_min_high_g303
[dragonflye]    r941_min_high_g330
[dragonflye]    r941_min_high_g340_rle
[dragonflye]    r941_min_high_g344
[dragonflye]    r941_min_high_g351
[dragonflye]    r941_min_high_g360
[dragonflye]    r941_min_sup_g507
[dragonflye]    r941_min_sup_snp_g507
[dragonflye]    r941_min_sup_variant_g507
[dragonflye]    r941_prom_fast_g303
[dragonflye]    r941_prom_fast_g507
[dragonflye]    r941_prom_fast_snp_g507
[dragonflye]    r941_prom_fast_variant_g507
[dragonflye]    r941_prom_hac_g507
[dragonflye]    r941_prom_hac_snp_g507
[dragonflye]    r941_prom_hac_variant_g507
[dragonflye]    r941_prom_high_g303
[dragonflye]    r941_prom_high_g330
[dragonflye]    r941_prom_high_g344
[dragonflye]    r941_prom_high_g360
[dragonflye]    r941_prom_high_g4011
[dragonflye]    r941_prom_snp_g303
[dragonflye]    r941_prom_snp_g322
[dragonflye]    r941_prom_snp_g360
[dragonflye]    r941_prom_sup_g507
[dragonflye]    r941_prom_sup_snp_g507
[dragonflye]    r941_prom_sup_variant_g507
[dragonflye]    r941_prom_variant_g303
[dragonflye]    r941_prom_variant_g322
[dragonflye]    r941_prom_variant_g360
[dragonflye]    r941_sup_plant_g610
[dragonflye]    r941_sup_plant_variant_g610

flass commented 1 year ago

yep, all good, I have all I need there!
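
For anyone who would rather not compare the lists by eye, a check like this can be sketched in Python. It is only an illustration: `models_from_log` and `missing_models` are hypothetical helpers, and the excerpt is a few lines hand-copied from the listing above.

```python
PREFIX = "[dragonflye]"


def models_from_log(log_text):
    """Collect model names from dragonflye-style '[dragonflye]    name' lines."""
    names = set()
    for line in log_text.splitlines():
        if not line.startswith(PREFIX):
            continue
        entry = line[len(PREFIX):].strip()
        # Skip headers such as "Available:" and any free-text messages.
        if entry and " " not in entry and not entry.endswith(":"):
            names.add(entry)
    return names


def missing_models(wanted, log_text):
    """Return the wanted models that do not appear in the log, in order."""
    found = models_from_log(log_text)
    return [model for model in wanted if model not in found]


# A few lines hand-copied from the v1.1.1 listing above.
log_excerpt = """[dragonflye]    Available:
[dragonflye]    r1041_e82_400bps_sup_g615
[dragonflye]    r104_e81_sup_g610
[dragonflye]    r941_min_sup_g507
"""
wanted = ["r1041_e82_400bps_sup_g615", "r104_e81_sup_g610", "r941_min_sup_g507"]
print(missing_models(wanted, log_excerpt))  # prints: []
```

An empty list means every requested model appears in the listing.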

rpetit3 commented 1 year ago

v1.1.1 is now available: quay.io/biocontainers/dragonflye:1.1.1--hdfd78af_0

it should include medaka 1.8.0 and flye 2.9.2, let me know if not.

Otherwise I think we are good here! Thank you for the help, and please feel free to reopen