nanoporetech / dorado

Oxford Nanopore's Basecaller
https://nanoporetech.com/

Re-basecalling causing Q-scores to significantly drop #887

Closed lilif99 closed 1 week ago

lilif99 commented 2 weeks ago

Issue Report

Please describe the issue:

Hi, I have been re-basecalling some Nanopore cDNA reads generated a couple of years ago (originally basecalled with Guppy 6.1.5 high-accuracy). Doing this has resulted in the read qualities dropping dramatically (mean read quality = 8.1 with old base-calling and 3.8 with new), which is affecting my downstream analysis. Is there something I am doing wrong, or is this an artefact of the improved base-calling?

Steps to reproduce the issue:

Converting .fast5s to .pod5s:

pod5 convert fast5 ./input/*.fast5 --output output_pod5s/ --one-to-one ./input/

Re-basecalling:

dorado basecaller dna_r10.4.1_e8.2_400bps_sup@v4.1.0 ${input_path}/ > ${output_path} --emit-fastq --no-trim

The data was generated with adaptive sampling so it's important that the reads aren't trimmed.

Run environment:

malton-ont commented 2 weeks ago

Hi @lilif99,

It looks like you are rebasecalling with the wrong model. The FLO-MIN106 flowcell requires a dna_r9.4.1 model (dna_r9.4.1_e8_sup@v3.6 would be the latest).

Since you've converted your data to pod5 it should be possible to simply call dorado basecaller sup ... to detect the appropriate model automatically.
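For illustration, the corrected re-basecalling step might look like the sketch below. It assumes a dorado release that supports automatic model selection, and it reuses the --emit-fastq and --no-trim flags from the original command. The default paths are placeholders, not values from the report.

```shell
# Placeholder paths (assumptions); substitute your own pod5 directory and output file.
input_path="${input_path:-output_pod5s}"
output_path="${output_path:-rebasecalled.fastq}"

# "sup" asks dorado to choose the super-accuracy model that matches the
# run metadata stored in the pod5 files, instead of naming a model by hand.
model="sup"

if command -v dorado >/dev/null 2>&1; then
    # Flags carried over unchanged from the original command.
    dorado basecaller "${model}" "${input_path}/" --emit-fastq --no-trim > "${output_path}"
else
    echo "dorado not found on PATH; nothing was basecalled" >&2
fi
```

To pin the model explicitly instead, setting model to dna_r9.4.1_e8_sup@v3.6 (the name given above) should select the same chemistry.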

lilif99 commented 1 week ago

Ah, thank you @malton-ont, it's all working now!