nanoporetech / dorado

Oxford Nanopore's Basecaller
https://nanoporetech.com/

Why was 260 bps deprecated? #1109

Closed: hjarnek closed this issue 2 weeks ago

hjarnek commented 3 weeks ago

Hi,

I'm just curious why the 260 bps translocation speed was deprecated when it provided higher accuracy. I think Flongle flow cells could be a cheap and valuable complement for many people still doing Sanger sequencing today, but those people don't care about sequencing depth, only accuracy.

StephDC commented 3 weeks ago

https://community.nanoporetech.com/posts/q20-kit-14-and-r10-4-1

> Through continued optimisation we can now achieve Q20 when running HAC basecalling and over Q22 when running SUP basecalling at 400 bases per second - delivering both accuracy and output.

After the upgrade to a 5 kHz sampling rate, 260 bps and 400 bps give similar accuracy. It appears that we no longer need to sacrifice yield for an insignificant accuracy gain.

And as Gabriel Wagner said in another post:

> People are working on papers and suddenly run options disappear! We need this option for consistency.

I think the right reason for still running at 260 bps, 4 kHz would be consistency, not accuracy.

hjarnek commented 3 weeks ago

@StephDC: Alright, but Q22 is still too low for some applications. It would be nice to see the data comparing 260 bps to 400 bps. I can't access the link you shared.

Besides, the ONT website advertises Q26 accuracy for the latest SUP basecalling models, so what's the deal with Q22? Is that already outdated information, or do real-world tests simply not yield the advertised accuracy?
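For context on what these Q numbers mean: a Phred score Q corresponds to an error probability of 10^(-Q/10), so every +10 is a tenfold drop in errors. The minimal Python sketch below is plain arithmetic (no ONT tooling involved; the function names are illustrative) and converts the values quoted in this thread. Headline figures like Q20, Q22 or Q26 are typically summary statistics over many reads, so the conversion describes an average, not a guarantee for every base.

```python
import math


def phred_to_error(q: float) -> float:
    """Convert a Phred quality score Q to an error probability: p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def phred_to_accuracy(q: float) -> float:
    """Accuracy is 1 minus the error probability."""
    return 1.0 - phred_to_error(q)


def accuracy_to_phred(acc: float) -> float:
    """Inverse conversion: Q = -10 * log10(1 - accuracy)."""
    return -10 * math.log10(1.0 - acc)


if __name__ == "__main__":
    # Q values mentioned in this thread
    for q in (20, 22, 25, 26, 30, 35):
        err = phred_to_error(q)
        print(f"Q{q}: accuracy {phred_to_accuracy(q):.4%}, ~1 error per {1 / err:,.0f} bases")
```

Going from Q22 to Q30, for example, means roughly 1 error per 158 bases versus 1 per 1,000.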

StephDC commented 3 weeks ago

That link was the release note for the version of MinKNOW that removed 260 bps, and the quote was taken directly from that release note. I have not checked any later updates since 260 bps 4 kHz was removed.

Before that update, we could run at 260 bps @ 4 kHz or 400 bps @ 4 kHz, and apparently 260 bps gave better results. With the shift to a 5 kHz sampling rate, ONT decided to maintain only one model, at 400 bps, as it produces Q scores on par with or better than 4 kHz 260 bps. There is no official 5 kHz 260 bps basecalling model from ONT, and I don't think they are developing such a model.
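A rough way to see why the sampling rate matters here is to count how many raw current samples are recorded per base on average, i.e. sampling rate divided by translocation speed. The sketch below assumes a constant translocation speed (real translocation is stochastic) and says nothing on its own about accuracy, which depends on the basecalling model; it only shows that 400 bps at 5 kHz gives more signal per base than 400 bps at 4 kHz, though still less than 260 bps at 4 kHz.

```python
def samples_per_base(sampling_rate_hz: float, speed_bps: float) -> float:
    """Average number of raw signal samples per base, assuming a constant
    translocation speed (an idealisation)."""
    return sampling_rate_hz / speed_bps


if __name__ == "__main__":
    conditions = [
        ("260 bps @ 4 kHz", 4000, 260),
        ("400 bps @ 4 kHz", 4000, 400),
        ("400 bps @ 5 kHz", 5000, 400),
        ("260 bps @ 5 kHz (no official model)", 5000, 260),
    ]
    for label, rate, speed in conditions:
        print(f"{label}: {samples_per_base(rate, speed):.1f} samples/base")
```

This gives about 15.4, 10.0, 12.5 and 19.2 samples per base respectively.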

Maybe if someone trained a basecalling model on 260 bps 5 kHz data, we could do a comparison?

hjarnek commented 3 weeks ago

That would indeed be interesting, I think. As I said, some people don't care at all about sequencing depth, and I think many Sanger people could be lured over with promises of top-notch accuracy, combined with the long reads and relatively low price that ONT has to offer. But the option of running at 260 bps 5 kHz would then have to become available in MinKNOW, if there is going to be any point in developing a basecalling model for it.

StephDC commented 3 weeks ago

> But the option of running at 260 bps 5 kHz would then have to become available in MinKNOW, if there is going to be any point in developing a basecalling model for it.

The order is probably going to be the opposite.

ONT would have to develop a reasonably good 260 bps 5 kHz model before it could add the option to MinKNOW so others can use it. It would also need to gather enough data to show that the option offers significantly better accuracy to compensate for the lost yield. Otherwise, anyone who set the run parameter to that value would be unable to basecall their sample with any of ONT's current models, and would end up with useless signal files at the end of an expensive (in both flow cell and sample) sequencing run.
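The "lost yield" is easy to estimate in the idealised case: if pore occupancy, run time and pore lifetime were identical, output would scale linearly with translocation speed, so a 260 bps run would produce 260/400 = 65% of the bases of a 400 bps run. The sketch below is only that simple ratio (the function names are illustrative); real yield differences could be larger or smaller.

```python
def relative_yield(slow_bps: float, fast_bps: float) -> float:
    """Fraction of bases a slower run produces relative to a faster one,
    assuming identical occupancy, run time and pore lifetime (idealised)."""
    return slow_bps / fast_bps


def bases_per_pore(speed_bps: float, active_seconds: float) -> float:
    """Bases read by one pore that is sequencing continuously for `active_seconds`."""
    return speed_bps * active_seconds


if __name__ == "__main__":
    frac = relative_yield(260, 400)
    print(f"260 bps yields {frac:.0%} of the 400 bps output ({1 - frac:.0%} less)")
    # One hour of uninterrupted sequencing on a single pore
    print(bases_per_pore(260, 3600), bases_per_pore(400, 3600))
```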

And the official basecalling models are probably not open to debate or suggestions from anyone other than ONT employees.

And forcing the sequencer to run the sample at 260 bps 5 kHz in an undocumented way would probably get a regular customer into serious legal trouble before they could ever develop a model.

And ONT does not seem very interested in developing such a basecalling model: in the release note for the version that dropped 260 bps 4 kHz, they described having customers choose between yield and accuracy as undesirable. Quote from that post:

> Whilst this allowed maximum flexibility, in practice it was difficult for users to make the choice between accuracy and output, and for us to fully support those choices with sufficient in-depth information.

Maybe, instead of a public platform like this, we as customers should raise such suggestions in the community forum, or with sales, so they can take them into consideration? Someone tried to follow up in the community forum, but there is no answer there yet. I wonder if you could also join that community forum as a customer, so they hear from you and others who want 260 bps back.

Kirk3gaard commented 2 weeks ago

@hjarnek Our latest test of basecalling can be seen here: https://github.com/Kirk3gaard/MicroBench/tree/main/analysis/zymohmw
Super accuracy basecalling at 400 bp/s was ~Q25.

hjarnek commented 2 weeks ago

> And ONT does not seem very interested in developing such a basecalling model: in the release note for the version that dropped 260 bps 4 kHz, they described having customers choose between yield and accuracy as undesirable. Quote from that post:

> Whilst this allowed maximum flexibility, in practice it was difficult for users to make the choice between accuracy and output, and for us to fully support those choices with sufficient in-depth information.

It would be interesting to hear their full reasoning; I can't make sense of this. I know people who clearly don't care about sequencing depth, and for whom this wouldn't even be a question: if they could get higher accuracy, they would simply take it. I hope ONT is not underestimating the potential demand for this. As you say, comparability across studies, or even just within a single study, is another reason to retain a slower sequencing speed option.

I'm not registered on the ONT forums, since our lab hasn't bought any ONT equipment yet. By the way, why anyone would want to run a user community forum behind locked doors is also beyond me...

> @hjarnek Our latest test of basecalling can be seen here: https://github.com/Kirk3gaard/MicroBench/tree/main/analysis/zymohmw
> Super accuracy basecalling at 400 bp/s was ~Q25.

That is very interesting. So you're getting above Q30, almost Q35, average accuracy with duplex basecalling; that is more than I've even seen ONT advertise themselves. Though I'm not sure I understand the graph correctly: how would it compare to a graph showing per-base Phred scores instead of length-adjusted read Phred scores on the x-axis?
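On the per-base versus per-read question: one common convention for a single read-level Phred score (an assumption here; the thread does not say how the MicroBench plots compute their read scores) is to average the per-base error probabilities across a read and convert that mean back to the Phred scale, rather than averaging the Q values directly. The sketch below uses a made-up read to show why the two can differ a lot: a few low-quality bases dominate the error-averaged score.

```python
import math
from typing import Sequence


def mean_read_qscore(base_qs: Sequence[float]) -> float:
    """Read-level Q: average the per-base error probabilities, then convert
    the mean error back to the Phred scale."""
    if not base_qs:
        raise ValueError("need at least one base quality")
    mean_err = sum(10 ** (-q / 10) for q in base_qs) / len(base_qs)
    return -10 * math.log10(mean_err)


def naive_mean_q(base_qs: Sequence[float]) -> float:
    """Arithmetic mean of the Phred values; this overstates read accuracy."""
    return sum(base_qs) / len(base_qs)


if __name__ == "__main__":
    # Hypothetical read: mostly very confident bases plus a few poor ones
    quals = [40] * 95 + [5] * 5
    print(f"naive mean Q: {naive_mean_q(quals):.2f}")
    print(f"error-averaged Q: {mean_read_qscore(quals):.1f}")
```

With 95 bases at Q40 and 5 at Q5, the naive mean is Q38.25 while the error-averaged score is only about Q18.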

malton-ont commented 2 weeks ago

Kit support decisions are made at a product level. As this is not a dorado issue per se, I'm going to close this ticket. If you want more information, I'd suggest raising this on the nanopore community forums.