nanoporetech / dorado

Oxford Nanopore's Basecaller
https://nanoporetech.com/

Could not determine sequencing Chemistry from read data? #1002

Closed hans-vg closed 2 weeks ago

hans-vg commented 2 weeks ago

I am working on converting an old pipeline that used guppy_basecaller, tombo, and meme to process a methylation experiment. I have the data in fast5 format. In the past, I used guppy as follows.

guppy_basecaller \
--flowcell FLO-FLG114 \
--kit SQK-LSK114 \
--input_path fast5_single/ \
--save_path guppy_DSM20213 \
--device "cuda:0" \
--trim_adapters -r

So far, I have the pipeline working to merge the multi FAST5 files into one POD5 file. Then, I use the merged POD5 file to call dorado as follows:

dorado basecaller --device cpu --emit-fastq dna_r10.4.1_e8.2_400bps_sup@v4.1.0 merged_pod5/converted.pod5 > merged_pod5/basecall.fastq

However, I get the error in the title ("Could not determine sequencing Chemistry from read data"). I have tried other models listed in the DNA Models section, but haven't been able to find one that works.

When running pod5 inspect on just one of the files, I get the following information:

        context_tags: {'barcoding_enabled': '0', 'basecall_config_filename': 'dna_r10.3_450bps_fast.cfg', 'experiment_duration_set': '960', 'experiment_type': 'genomic_dna', 'local_basecalling': '1', 'package': 'bream4', 'package_version': '6.1.10', 'sample_frequency': '4000', 'sequencing_kit': 'sqk-lsk110'}
        experiment_name: 
        flow_cell_id: FAQ19104
        flow_cell_product_code: FLO-MIN111
        protocol_name: sequencing/sequencing_MIN111_DNA:FLO-MIN111:SQK-LSK110
        protocol_run_id: fdb2a817-abe2-42b5-ab7e-ff92eb6576ae
        protocol_start_time: 1970-01-01 00:00:00+00:00
        sample_id: no_sample
        sample_rate: 4000
        sequencing_kit: sqk-lsk110
        sequencer_position: MN36843
        sequencer_position_type: minion
        software: python-pod5-converter

Am I using the correct model? Which model would you suggest to use?
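
For anyone checking their own files, here is a minimal sketch of pulling the chemistry-related fields out of inspect output like the above. It assumes the output has been pasted into a string; `parse_inspect` is a hypothetical helper written for this example, not part of pod5 or Dorado.

```python
import ast

# Lines copied from the `pod5 inspect` output quoted above (a subset of the fields).
INSPECT_TEXT = """
context_tags: {'barcoding_enabled': '0', 'basecall_config_filename': 'dna_r10.3_450bps_fast.cfg', 'experiment_duration_set': '960', 'experiment_type': 'genomic_dna', 'local_basecalling': '1', 'package': 'bream4', 'package_version': '6.1.10', 'sample_frequency': '4000', 'sequencing_kit': 'sqk-lsk110'}
flow_cell_product_code: FLO-MIN111
sample_rate: 4000
sequencing_kit: sqk-lsk110
"""


def parse_inspect(text):
    """Parse 'key: value' lines into a dict (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(": ")
        if not sep:
            continue  # skip blank lines and keys with no value
        fields[key] = value
    # context_tags is printed as a Python dict literal, so parse it safely.
    if "context_tags" in fields:
        fields["context_tags"] = ast.literal_eval(fields["context_tags"])
    return fields


info = parse_inspect(INSPECT_TEXT)
print(info["flow_cell_product_code"])
print(info["sequencing_kit"])
print(info["context_tags"]["basecall_config_filename"])
```

The flow cell product code, the kit, and the config filename recorded at run time are the fields that describe the sequencing condition for this data.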

HalfPhoton commented 2 weeks ago

The error is saying that Dorado doesn't recognise the sequencing condition in the data. Specifically, the FLO-MIN111 flow cell product code (which I believe is R10.3) is not recognised or supported by Dorado.

Also, the dna_r10.4.1_e8.2_400bps_sup@v4.1.0 model is for the R10.4.1 condition, so it does not match this data.
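
As a rough illustration of the mismatch, the sketch below compares the pore version embedded in the model name with the one in the `basecall_config_filename` from the run metadata. It assumes the `dna_r<version>_...` naming seen in this thread; `pore_version` is a hypothetical helper, and the config filename is only a hint about the chemistry, not an authoritative field.

```python
import re


def pore_version(name):
    """Return the pore version embedded in a model or config name, e.g. '10.3'.

    Assumes the 'dna_r<version>_...' naming used in this thread.
    """
    match = re.search(r"_r(\d+(?:\.\d+)*)", name)
    return match.group(1) if match else None


model = "dna_r10.4.1_e8.2_400bps_sup@v4.1.0"  # model passed to dorado
run_config = "dna_r10.3_450bps_fast.cfg"      # from context_tags in the POD5 file

if pore_version(model) != pore_version(run_config):
    print(
        f"Mismatch: model targets R{pore_version(model)}, "
        f"but the run was configured for R{pore_version(run_config)}"
    )
```

A matching version would not by itself mean the data is supported; Dorado still needs to ship a model for that condition.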

Best regards, Rich

hans-vg commented 2 weeks ago

Okay. To confirm, there is no way to process this data with Dorado, correct? Do I need to go back to guppy_basecaller to process this data?

HalfPhoton commented 2 weeks ago

Hi @hans-vg, yes: Dorado doesn't have models to support R10.3. Please use Guppy instead.

Best regards, Rich

hans-vg commented 2 weeks ago

Thank you for the confirmation. We can close this ticket.