nanoporetech / dorado

Oxford Nanopore's Basecaller
https://nanoporetech.com/

[error] key "input" not found in the top-level table #290

Closed adbeggs closed 12 months ago

adbeggs commented 12 months ago

Hi team,

Have come back to Dorado after a little while:

dorado basecaller dna_r10.4.1_e8.2_400bps_sup\@v4.1.0_5mCG_5hmCG\@v2/ benchmark.pod5 > calls.bam

And get:

[2023-07-10 08:00:34.804] [info] > Creating basecall pipeline
[2023-07-10 08:00:35.612] [error] [error] key "input" not found in the top-level table
 --> dna_r10.4.1_e8.2_400bps_sup@v4.1.0_5mCG_5hmCG@v2/config.toml
   |
 1 | [general]
   | ^--- the top-level table starts here

More bizarrely if I try and run it in modified basecalling mode:

dorado basecaller dna_r10.4.1_e8.2_400bps_sup\@v4.1.0_5mCG_5hmCG\@v2/ benchmark.pod5 --modified-bases 5mCG > calls.bam

I get:

terminate called after throwing an instance of 'std::runtime_error'
  what():  could not find matching modification model for dna_r10.4.1_e8.2_400bps_sup@v4.1.0_5mCG_5hmCG@v2
Aborted (core dumped)

adbeggs commented 12 months ago

PPS It is running on our HPC with an A30 inside a slurm_interactive job, nothing particularly fancy

adbeggs commented 12 months ago

Okay ignore me - stupidity on my part in terms of model downloads etc.

lucky5sugar commented 12 months ago

I got exactly the same issue that @adbeggs described. How did you overcome the problem? I am using a model downloaded with the command below:

wget https://cdn.oxfordnanoportal.com/software/analysis/dorado/dna_r10.4.1_e8.2_400bps_hac@v4.1.0_5mCG_5hmCG@v2.zip

tijyojwad commented 12 months ago

Hi @lucky5sugar - when running basecalling with modified bases you just need to specify the main model (in this case dna_r10.4.1_e8.2_400bps_hac@v4.1.0) and --modified-bases 5mCG_5hmCG on the command line. Dorado will automatically pick the right modified-base model; you don't need to specify it.
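
For anyone who only has the name of the modified-base model directory to hand, here is a minimal sketch of how the main model name and the --modified-bases value could be derived from it. It assumes the naming pattern seen in this thread (<main model>@vX.Y.Z_<mods>@vN); the variable names are made up for illustration, and the script only prints the command rather than running Dorado.

```shell
#!/bin/sh
# Hypothetical helper, not part of Dorado: split a modified-base model
# directory name into the main (simplex) model and the modification string.
# Assumes the pattern <main model>@vX.Y.Z_<mods>@vN seen in this thread.
mod_model="dna_r10.4.1_e8.2_400bps_hac@v4.1.0_5mCG_5hmCG@v2"

# Drop the trailing "@vN" (version of the modified-base model).
without_mod_version="${mod_model%@v*}"

# Text before the first "@v" is the main model name without its version.
prefix="${without_mod_version%%@v*}"

# Text after the first "@v" is "<main version>_<mods>".
rest="${without_mod_version#*@v}"
main_version="${rest%%_*}"
mods="${rest#*_}"

base_model="${prefix}@v${main_version}"

# Print the command instead of executing it.
echo "dorado basecaller ${base_model} benchmark.pod5 --modified-bases ${mods} > calls.bam"
```

The main model directory still has to be downloaded separately, since that is the directory passed to dorado basecaller.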

adbeggs commented 12 months ago

Indeed, the problem I had was specifying the modified-base model rather than the base model with the --modified-bases flag. Once I specified the base model instead, it worked perfectly. I have benchmarked it on a single A100 node and it is approximately 4 times faster than Guppy on the same file set.

lucky5sugar commented 11 months ago

@tijyojwad @adbeggs Thanks to your replies, the problem is solved. I downloaded the corresponding simplex model and specified only the simplex model.

wget https://cdn.oxfordnanoportal.com/software/analysis/dorado/dna_r10.4.1_e8.2_400bps_hac@v4.1.0_5mCG_5hmCG@v2.zip
unzip dna_r10.4.1_e8.2_400bps_hac@v4.1.0_5mCG_5hmCG@v2.zip
wget https://cdn.oxfordnanoportal.com/software/analysis/dorado/dna_r10.4.1_e8.2_400bps_hac@v4.1.0.zip
unzip dna_r10.4.1_e8.2_400bps_hac@v4.1.0.zip

dorado basecaller dna_r10.4.1_e8.2_400bps_hac@v4.1.0 benchmark.pod5 --modified-bases 5mCG_5hmCG > calls.bam