Quality score differences when running dorado on the Gridion vs on a linux server

nanoporetech / dorado

Oxford Nanopore's Basecaller

https://nanoporetech.com/

Quality score differences when running dorado on the Gridion vs on a linux server #1110

Open abridgeland opened 3 weeks ago

abridgeland commented 3 weeks ago

Hello, I observed that basecall quality scores are lower when running dorado on the GridION than when running it off the GridION on a Linux server. Do you know why that might be? For both runs, I used the super high accuracy basecalling option. Running dorado off the GridION gave 40 percent more reads above Q15, and the mean read quality score was 16.6 off the GridION versus 13.3 on it.
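To put the two mean scores in perspective, here is a small sketch that converts a Phred-scaled quality score to an estimated per-base error probability using the standard relation Q = -10 * log10(p). The function names are illustrative and not part of dorado. It also shows that averaging Q values directly and averaging the underlying error probabilities give different numbers, so both runs should be summarized with the same tool and method before comparing them.

```python
import math


def phred_to_error(q):
    """Convert a Phred quality score to an estimated error probability."""
    return 10 ** (-q / 10)


def error_to_phred(p):
    """Convert an error probability back to a Phred quality score."""
    return -10 * math.log10(p)


def mean_q_via_error(qs):
    """Average quality scores by averaging error probabilities first."""
    mean_p = sum(phred_to_error(q) for q in qs) / len(qs)
    return error_to_phred(mean_p)


if __name__ == "__main__":
    # The two mean read quality scores reported in this thread.
    for q in (13.3, 16.6):
        print(f"Q{q}: estimated error rate {phred_to_error(q):.2%}")

    # Hypothetical values, only to show that the two averaging methods differ.
    example = [10.0, 20.0]
    print("arithmetic mean of Q:", sum(example) / len(example))
    print("mean via error probability:", round(mean_q_via_error(example), 2))
```

By this conversion, Q13.3 corresponds to roughly a 4.7% estimated error rate and Q16.6 to roughly 2.2%.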

HalfPhoton commented 3 weeks ago

Hi @abridgeland, I suspect you're using a different basecalling model in each case. The most recent (and best) models are available in stand-alone dorado before they're available on device. So you might have used a newer version of the same model architecture and seen the improvement.

You can check which model was used by inspecting the SAM header.
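As a rough sketch of that check, the script below reads SAM header text (for example, saved with `samtools view -H calls.bam > header.txt`) and lists the program records and any basecall model names it finds. It assumes the model name appears as a `basecall_model=...` token in the `DS` field of `@RG` lines and that the basecaller version is in the `VN` field of an `@PG` line; verify this against your own headers. The function name, file names, and the sample header are hypothetical.

```python
import re
import sys


def summarize_header(header_text):
    """Return (programs, models) found in SAM header text.

    programs: list of (PN, VN, CL) tuples taken from @PG lines.
    models:   sorted list of basecall_model values found in @RG DS fields
              (assumed layout; check it against your own files).
    """
    programs = []
    models = set()
    for line in header_text.splitlines():
        fields = line.split("\t")
        if fields[0] == "@PG":
            rec = dict(f.split(":", 1) for f in fields[1:] if ":" in f)
            programs.append((rec.get("PN"), rec.get("VN"), rec.get("CL")))
        elif fields[0] == "@RG":
            for f in fields[1:]:
                if f.startswith("DS:"):
                    models.update(re.findall(r"basecall_model=(\S+)", f))
    return programs, sorted(models)


# Hypothetical header for illustration only; the version, command line,
# and read group ID are placeholders, not real output.
SAMPLE_HEADER = "\n".join([
    "@HD\tVN:1.6\tSO:unknown",
    "@PG\tID:basecaller\tPN:dorado\tVN:0.0.0\tCL:dorado basecaller <model> <reads>",
    "@RG\tID:example_run\tPL:ONT\t"
    "DS:basecall_model=dna_r10.4.1_e8.2_400bps_sup@v4.2.0 runid=example_run",
])

if __name__ == "__main__":
    # Pass a header file path as the first argument, or fall back to the sample.
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as fh:
            text = fh.read()
    else:
        text = SAMPLE_HEADER
    programs, models = summarize_header(text)
    for name, version, command in programs:
        print(f"program={name} version={version} command={command}")
    for model in models:
        print(f"model={model}")
```

Running it on the headers from both the on-device and stand-alone outputs makes it easy to compare model names, versions, and command lines side by side.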

Kind regards, Rich

abridgeland commented 3 weeks ago

Thanks for your suggestion. I checked the headers and you were correct: the stand-alone dorado run used a newer model. However, when I reran using the same model, I still saw some differences in the results, with higher-quality data from the stand-alone version. Please see my results below. We used the following model: dna_r10.4.1_e8.2_400bps_sup@v4.2.0

Dorado_comparison.xlsx

HalfPhoton commented 3 weeks ago

Hi @abridgeland, glad to hear they're much closer now - but that's still a greater difference than I'd expect too.

Are these on the same version of dorado and using all of the same settings?

Kind regards, Rich