nanoporetech / dorado

Oxford Nanopore's Basecaller
https://nanoporetech.com/

Query on MinKNOW's Qscore Filtering and Dorado's Standalone Version #699

Closed VahidJavaran closed 2 months ago

VahidJavaran commented 4 months ago

Hello, I've been working with both the Dorado basecaller integrated in MinKNOW and its standalone version on one dataset. I have a couple of questions I was hoping you could help me with:

1. Could you provide details on the Qscore filtering applied within MinKNOW during the basecalling process?

2. I've noticed that in MinKNOW, when I set the "Minimum read splitting score" to 70 and the "Override minimum barcoding score" to 75, and then activate "barcode both ends", the rate of unclassified reads is lower than with the --barcode-both-ends option in Dorado's standalone version. Could you shed some light on how the MinKNOW settings translate to the standalone Dorado command line, specifically regarding the handling of barcodes and scoring to achieve similar results?

Thanks for your assistance,

tijyojwad commented 4 months ago

Could you provide details on the Qscore filtering applied within MinKNOW during the basecalling process?

The qscore thresholds are model dependent. I will get back to you with the exact numbers.

Could you shed some light on how the MinKNOW settings translate to the standalone Dorado command line, specifically regarding the handling of barcodes and scoring to achieve similar results?

The current MinKNOW barcoding algorithm is different from what dorado uses (although this will be unified quite soon). So the parameters are different. In dorado the scoring parameters can be adjusted using this config file. If the thresholds are lowered, the unclassified rate will go down (for both single ended and double ended checks)
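
To illustrate the last point, here is a toy sketch of why lowering a score threshold reduces the unclassified rate, and why a double-ended check leaves more reads unclassified than a single-ended one at the same threshold. The scores, the 0 to 1 scale, and the function name are hypothetical; this is not dorado's barcoding algorithm or its config format.

```python
def unclassified_rate(reads, threshold, both_ends=False):
    """Fraction of reads left unclassified at a given score threshold.

    `reads` is a list of (front_score, rear_score) pairs on a made-up
    0..1 scale where a higher score means a better barcode match.
    """
    if not reads:
        return 0.0
    unclassified = 0
    for front, rear in reads:
        if both_ends:
            # Double-ended check: both ends must reach the threshold.
            classified = front >= threshold and rear >= threshold
        else:
            # Single-ended check: the better of the two ends is enough.
            classified = max(front, rear) >= threshold
        if not classified:
            unclassified += 1
    return unclassified / len(reads)


# Hypothetical scores for four reads.
reads = [(0.9, 0.8), (0.8, 0.4), (0.5, 0.6), (0.3, 0.2)]

strict_single = unclassified_rate(reads, 0.75)
strict_double = unclassified_rate(reads, 0.75, both_ends=True)
loose_double = unclassified_rate(reads, 0.45, both_ends=True)
```

With these made-up numbers the double-ended check at the stricter threshold leaves the most reads unclassified, and lowering the threshold brings the rate back down.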

gideonav commented 4 months ago

So, to confirm, the dorado basecaller output files are filtered? The answer to this question said they were not, which didn't make sense to me, since the model descriptions say they have a minimum cutoff score.

tijyojwad commented 4 months ago

dorado standalone (i.e. what you download from this GitHub repo) does not filter anything based on Q score by default.

However when run through MinKNOW a Q score filter is applied to the reads depending on the model used.

gideonav commented 4 months ago

but don't you choose a model when running with dorado: hac, sup, or fast? sorry i am just very confused by what the difference is

VahidJavaran commented 4 months ago

@gideonav Hi, we select which model should be used for basecalling in MinKNOW. But there is no Qscore option to filter low quality reads in MinKNOW. In the standalone version, we have the option to filter reads with "--min-qscore". The Qscore is not directly related to the basecalling models. You have to set the Qscore separately.

0x55555555 commented 4 months ago

Hi @VahidJavaran,

But there is no Qscore option to filter low quality reads in MinKNOW.

Just to jump in here, the MinKNOW filtering options are displayed during run output setup here: [image]

And can be configured by using the cog button on the right: [image]

As @tijyojwad said, the qscore filter chosen is model dependent: 8 for FAST, 9 for HAC, 10 for SUP.

Hope that helps,
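
As a rough sketch of how the MinKNOW defaults quoted above could be reproduced in standalone dorado, the snippet below maps each model tier to its threshold and assembles a command line that passes it via --min-qscore. The helper name and the pod5 path are hypothetical, and the command is only built here, not run.

```python
# MinKNOW default Q score filters as quoted in this thread.
MINKNOW_MIN_QSCORE = {"fast": 8, "hac": 9, "sup": 10}


def dorado_command(model, pod5_dir):
    """Build a dorado basecaller argv list with the matching --min-qscore.

    `model` is one of "fast", "hac" or "sup" (case-insensitive);
    `pod5_dir` is a placeholder path to the input data.
    """
    tier = model.lower()
    if tier not in MINKNOW_MIN_QSCORE:
        raise ValueError(f"unknown model tier: {model!r}")
    return [
        "dorado", "basecaller", tier, pod5_dir,
        "--min-qscore", str(MINKNOW_MIN_QSCORE[tier]),
    ]


command_line = " ".join(dorado_command("HAC", "pod5/"))
```

Keeping the thresholds in one table makes it easy to adjust them if the defaults for a given MinKNOW release turn out to differ.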

tijyojwad commented 4 months ago

@gideonav

but don't you choose a model when running with dorado: hac, sup, or fast? sorry i am just very confused by what the difference is

in standalone dorado, no default Q score filtering is applied regardless of which model is selected.

VahidJavaran commented 4 months ago

I think these options can be selected for sequencing with basecalling activated. Can these options be applied just for basecalling analysis?

selmapichot commented 2 months ago

Hi, regarding the Qscore in the standalone version, what value of Qscore reflects high-confidence basecalling? Many thanks

eesiribloom commented 2 months ago

Hi, regarding the Qscore in the standalone version, what value of Qscore reflects high-confidence basecalling? Many thanks

@0x55555555 said "8 for FAST, 9 for HAC, 10 for SUP"

HAC and SUP are the high accuracy and super accuracy models, respectively.

tijyojwad commented 2 months ago

Sorry for the late reply @selmapichot - @eesiribloom answered it correctly. I would also add that only HAC and SUP should be used if accuracy is important.

@VahidJavaran

can these options be applied just for basecalling analysis?

Yes I believe these can also be configured for post-run basecalling.

selmapichot commented 2 months ago

Hi, many thanks for your reply. Is it possible to do the filtering while basecalling and aligning with dorado? If not, could you please advise what would be the best method to filter out low-quality reads, before or after alignment?

tijyojwad commented 2 months ago

Yes it's possible. If you add --min-qscore X to the dorado basecaller cmdline it will filter out any reads with mean qscore < X and align all remaining reads.

dorado basecaller model pod5 --min-qscore X --reference ref.fasta
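
For intuition about what "mean qscore < X" means, here is a minimal sketch that assumes the mean is taken over per-base error probabilities rather than over the raw Q values. Dorado's exact calculation may differ (for example in which bases it includes), and the function names are illustrative only.

```python
import math


def mean_qscore(quals):
    """Mean Q score of a read, averaged in error-probability space.

    `quals` is a list of per-base Phred scores. Each Q is converted to an
    error probability 10^(-Q/10), the probabilities are averaged, and the
    average is converted back to the Phred scale.
    """
    if not quals:
        return 0.0
    mean_error = sum(10 ** (-q / 10) for q in quals) / len(quals)
    return -10 * math.log10(mean_error)


def passes_filter(quals, min_qscore):
    """True if the read would be kept at the given threshold."""
    return mean_qscore(quals) >= min_qscore


# A read with one Q10 base and one Q20 base: the low-quality base dominates,
# so the result is well below the arithmetic mean of 15.
example = mean_qscore([10, 20])
```

Averaging in probability space means a few poor bases pull the read's score down more than an arithmetic mean of Q values would suggest.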