nanoporetech / dorado

Oxford Nanopore's Basecaller
https://nanoporetech.com/

How does chimeric read splitting work and how to take advantage of additional adapters #677

Open pclavell opened 7 months ago

pclavell commented 7 months ago

Apparently, dorado v0.4 and newer versions include a chimeric read splitting feature. However, it does not reach 100% sensitivity. I was wondering how exactly the read splitting works. Assuming that it looks for ONT adapters in the middle of reads, chimera detection sensitivity could potentially be improved if a specific library has additional adapters (as in my experimental setup, where the library looks like ONT adapter - 5' linker - cDNA - 3' linker - ONT adapter). I could not find information about this in the documentation. Thanks a lot.

tijyojwad commented 6 months ago

Hi @pclavell - thanks for raising the issue.

it does not reach 100% sensitivity

What kind of sensitivity are you seeing?

The DNA read splitting code is here. Dorado searches for an adapter that's close to what looks like the start of a new read. It may be possible to integrate information about additional adapters, but so far we haven't found the need.

What model is your data called with?
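
For readers following along, the idea of searching for an adapter inside a read can be illustrated with a minimal sketch. This is not Dorado's implementation: it assumes a plain sliding-window Hamming comparison on the basecalled sequence, and the adapter string, `max_mismatches`, and `edge_margin` are hypothetical placeholders chosen only for illustration.

```python
def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_internal_adapters(read, adapter, max_mismatches=2, edge_margin=20):
    """Return start positions of adapter-like hits away from both read ends."""
    k = len(adapter)
    hits = []
    for i in range(edge_margin, len(read) - k - edge_margin + 1):
        # Skip windows that overlap the previous hit.
        if hits and i - hits[-1] < k:
            continue
        if hamming(read[i:i + k], adapter) <= max_mismatches:
            hits.append(i)
    return hits


def split_read(read, adapter, **kwargs):
    """Cut the read so that each internal adapter hit starts a new subread."""
    bounds = [0] + find_internal_adapters(read, adapter, **kwargs) + [len(read)]
    return [read[a:b] for a, b in zip(bounds, bounds[1:])]


# Placeholder adapter (assumption), not a real ONT sequence.
ADAPTER = "ACGTTGCATGCCAGTA"
chimera = "C" * 40 + ADAPTER + "G" * 40
subreads = split_read(chimera, ADAPTER)
print([len(s) for s in subreads])
```

A Hamming window ignores insertions and deletions, which are common in nanopore reads, so a real check would more likely use an edit-distance aligner. The sketch only shows where extra linker sequences could enter the search: each additional adapter would be one more pattern passed to `find_internal_adapters`.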

tijyojwad commented 6 months ago

Hi @pclavell any updates on this?

pclavell commented 6 months ago

It turns out the information I had was based on an analysis of data basecalled with Guppy, which had >15% chimeric reads (detected thanks to internal PCR primers). However, I wonder whether you have run any tests to check the efficiency of the Dorado ReadSplitter by leveraging internal adapters, because I am now sequencing new data with internal adapters that I could use to further improve the splitting. Sorry for the delay.

pclavell commented 4 months ago

I finally finished my analysis and I get around 2% concatemers/chimeric reads, some of them formed from more than 2 reads.
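
A residual chimera rate like this could be estimated roughly as follows. The sketch assumes that each library molecule carries exactly one copy of an internal linker, so a read with more than one copy is counted as chimeric. The linker sequence is a hypothetical placeholder (the real one is unpublished), matching is exact, and only one strand is checked, so this illustrates the counting logic rather than the analysis actually performed.

```python
from collections import Counter

# Hypothetical placeholder linker (assumption); the real sequence is unpublished.
FIVE_PRIME_LINKER = "ACGTTGCATGCCAGTA"


def count_linker_copies(read, linker=FIVE_PRIME_LINKER):
    """Number of non-overlapping exact linker copies in the read."""
    return read.count(linker)


def chimera_summary(reads, linker=FIVE_PRIME_LINKER):
    """Return (fraction of reads with >1 linker copy, copies-per-read histogram)."""
    copies = Counter(count_linker_copies(r, linker) for r in reads)
    total = sum(copies.values())
    chimeric = sum(n for c, n in copies.items() if c > 1)
    rate = chimeric / total if total else 0.0
    return rate, copies


L = FIVE_PRIME_LINKER
example_reads = [L + "CCCC", L + "CC" + L + "GG", L + "A" + L + "T" + L, "GGGG"]
rate, histogram = chimera_summary(example_reads)
print(rate, dict(histogram))
```

The histogram keeps the number of linker copies per read, which separates simple two-part chimeras from reads made of three or more molecules.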

tijyojwad commented 4 months ago

Thank you for the analysis and feedback! We will look into how to improve read splitting. It would be great if you could share a few of the unsplit reads in a pod5 so we can evaluate any improvements.

pclavell commented 3 months ago

I cannot at the moment because the data was generated with an unpublished protocol, but it is basically cDNA with a different 56 nt adapter at each end. The ONT library is then prepared from that material.