nanoporetech / dorado

Oxford Nanopore's Basecaller
https://nanoporetech.com/

Dorado "Failed to determine sequencing chemistry from data. Please select a model by path" #885

Closed legoscientist closed 2 weeks ago

legoscientist commented 2 weeks ago

Issue Report

Please describe the issue:

Up until now I have been using Dorado 0.6.2, installed locally in my area of our HPC, for basecalling. Yesterday it suddenly stopped working without any change to my script. I have copied my command below. I tried re-extracting dorado from the tar.gz file, and tried the same with 0.7.1 (the latest release). The error I am getting is "[error] Failed to determine sequencing chemistry from data. Please select a model by path".

Steps to reproduce the issue:

Please list any steps to reproduce the issue.

Run environment:

StephDC commented 2 weeks ago

[Additional Info Needed]

Assuming that you are using pod5 files, do you know which kit and flow cell were used to generate the data?

A few combinations of kit and flow cell are unsupported, such as R9/R10 mix-and-match pairings like FLO-FLG001 + SQK-NBD114-24, and these result in this error. Such combinations do not make sense and should never be used together.

Please follow the issue report guide, run dorado basecaller with -v, and paste the output as well. That would be a great help in troubleshooting.

legoscientist commented 2 weeks ago

Forgot to update here! I eventually got it running, simply by re-copying and pasting the file path! I wonder if I had got a rogue space in there somewhere. Very odd, but as I say, did get it running with a downloaded, locally stored, model :)
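
For anyone hitting the same thing, here is a minimal sketch of a pre-flight check for a locally stored model path. It assumes that a model directory contains a config.toml file (which is what the toml::parse error later in this thread suggests) and that a stray leading or trailing space is the kind of problem described above. The function name check_model_path is hypothetical and not part of dorado.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of dorado): sanity-check a model path
# before passing it to `dorado basecaller`.
check_model_path() {
    local path="$1"
    local trimmed
    # Strip leading and trailing whitespace so a stray space can be detected.
    trimmed="$(printf '%s' "$path" | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//')"
    if [ "$path" != "$trimmed" ]; then
        echo "stray-whitespace"
        return 1
    fi
    if [ ! -d "$path" ]; then
        echo "not-a-directory"
        return 1
    fi
    # Assumption: a usable model directory holds a config.toml file.
    if [ ! -f "$path/config.toml" ]; then
        echo "missing-config"
        return 1
    fi
    echo "ok"
}

# Example call (prints one of the status words above):
#   check_model_path "/path/to/models/dna_r10.4.1_e8.2_400bps_sup@v5.0.0"
```

Quoting the variable that holds the path in the job script ("$MODEL" rather than $MODEL) also keeps the shell from splitting it on whitespace.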

yesimon commented 4 days ago

I am having this error with latest dorado 0.7.2 on a supported combination: SQK-NBD114-24 with MIN114. I need to download the model separately and specify the full model path for it to work properly.

malton-ont commented 4 days ago

@yesimon,

That should indeed be a supported combination. Are you basecalling pod5 files? If so, can you run:

pod5 inspect debug <data.pod5> | grep -E "flow_cell_product_code|sequencing_kit"

and post the output.

If you are basecalling from fast5 files, automatic model detection is not available and you should convert your data.
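
A quick way to see which case applies is to check what file types are in the input folder. The sketch below is only an illustration: count_ext and input_format are hypothetical helper names, and the conversion command in the final comment is the form I recall from the pod5 package, so please confirm it with `pod5 convert fast5 --help` before relying on it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count files with a given extension under a directory.
count_ext() {
    local dir="$1" ext="$2"
    find "$dir" -type f -name "*.${ext}" | wc -l | tr -d '[:space:]'
}

# Hypothetical helper: report which raw-data format a run folder contains.
# Prints "pod5", "fast5", or "none"; pod5 wins if both are present.
input_format() {
    local dir="$1"
    if [ "$(count_ext "$dir" pod5)" -gt 0 ]; then
        echo "pod5"
    elif [ "$(count_ext "$dir" fast5)" -gt 0 ]; then
        echo "fast5"
    else
        echo "none"
    fi
}

# If the folder only holds fast5 files, convert them first, for example
# (assumed invocation, check the pod5 documentation):
#   pod5 convert fast5 ./fast5/*.fast5 --output converted.pod5
```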

yesimon commented 3 days ago

        flow_cell_product_code: FLO-MIN114
        sequencing_kit: sqk-nbd114-24

Here's the actual command and error. It appears that sup@latest works now.

$ dorado basecaller --kit-name SQK-NBD114-24 dna_r10.4.1_e8.2_400bps_sup@v5.0.0 pod5 > test.bam

[2024-06-27 17:56:52.920] [info] Running: "basecaller" "--kit-name" "SQK-NBD114-24" "dna_r10.4.1_e8.2_400bps_sup@v5.0.0" "pod5"
terminate called after throwing an instance of 'std::runtime_error'
  what():  toml::parse: file open error -> dna_r10.4.1_e8.2_400bps_sup@v5.0.0/config.toml
Aborted (core dumped)

$ dorado basecaller --kit-name SQK-NBD114-24 sup@latest pod5 > test.bam
[2024-06-27 17:58:18.168] [info] Running: "basecaller" "--kit-name" "SQK-NBD114-24" "sup@latest" "pod5"
[2024-06-27 17:58:18.203] [info]  - downloading dna_r10.4.1_e8.2_400bps_sup@v5.0.0 with httplib
[2024-06-27 17:58:20.838] [info] > Creating basecall pipeline

malton-ont commented 3 days ago

@yesimon,

dorado interprets dna_r10.4.1_e8.2_400bps_sup@v5.0.0 as a path, and therefore expects it to be present in the current working folder. As you've found, if you want to use automatic model detection then you don't need to specify the chemistry. You can find more info on automatic model selection here.
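
To illustrate the distinction, here is a rough sketch of how the model argument ends up being treated. It is not dorado's actual parsing code: classify_model_arg is a hypothetical name, and the pattern only covers the plain fast, hac and sup forms with an optional @version, ignoring anything richer that the real selector may accept.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not dorado's real logic): decide how a model
# argument would be treated, based on the behaviour described above.
classify_model_arg() {
    local arg="$1"
    case "$arg" in
        fast|hac|sup|fast@*|hac@*|sup@*)
            # Speed keyword, optionally with a version: automatic selection.
            echo "auto-select"
            ;;
        *)
            # Anything else is read as a path, relative to the current
            # working folder unless it is absolute.
            if [ -f "$arg/config.toml" ]; then
                echo "local-model"
            else
                echo "missing-path"
            fi
            ;;
    esac
}

# To use the full model name, fetch the model into the working folder first:
#   dorado download --model dna_r10.4.1_e8.2_400bps_sup@v5.0.0
#   dorado basecaller --kit-name SQK-NBD114-24 dna_r10.4.1_e8.2_400bps_sup@v5.0.0 pod5 > test.bam
```

Under this sketch, sup@latest is classified as auto-select, while dna_r10.4.1_e8.2_400bps_sup@v5.0.0 is a missing-path until a folder of that name containing config.toml exists, which matches the toml::parse error in the log above.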