nanoporetech / dorado

Oxford Nanopore's Basecaller
https://nanoporetech.com/

low number of duplex reads #599

Closed maxstengl closed 6 months ago

maxstengl commented 9 months ago

When using dorado for duplex basecalling without providing a pair ID file, the duplex rate is below 1%. When provided with a pair ID file created with duplex tools, the duplex rate is above 30%. I am using the default settings for dorado and duplex tools, and the sup model. Our data consist of rather short (70 bases) DNA fragments. Is using duplex tools still the way to go, or are there parameters that can be changed to achieve better results using just dorado? Thank you.

vellamike commented 9 months ago

Duplex currently doesn't work well for such short reads. Duplex tools shouldn't be used, as it misreports the number of pairs in this scenario. We are working on ways to improve duplex for short reads, but at the moment it only works for reads >1 kb.

maxstengl commented 9 months ago

Thank you for your reply. Using dorado duplex basecalling (with pairs created by duplex tools) gives a significant bump to our Q-scores (mean up from around 12 for simplex to around 17 for duplex). As an interim solution, we are discarding the numbers given by duplex tools and using only those given by dorado. Or is there a better alternative? Thank you.

vellamike commented 9 months ago

Going from Q12 (simplex) to Q17 (duplex) is indicative of a problem, as these are very low Q-scores to begin with; duplex is generally in the ~Q30 range. I'm also a bit surprised that duplex is working at all with read lengths this short. In general I wouldn't use duplex at all here; my worry would be misidentification of pairs.
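
To put the Q-scores mentioned in this thread in perspective, here is a minimal sketch that converts them to per-base error probabilities and expected errors for a 70-base read. It assumes the standard Phred definition, Q = -10 * log10(p). The function names are illustrative and not part of dorado or duplex tools, and the way either tool computes a mean Q-score may differ from this simple per-value conversion.

```python
def phred_to_error_prob(q: float) -> float:
    """Per-base error probability for a Phred score, assuming Q = -10 * log10(p)."""
    return 10 ** (-q / 10)


def expected_errors(q: float, read_length: int) -> float:
    """Expected number of miscalled bases in a read with uniform quality q."""
    return phred_to_error_prob(q) * read_length


if __name__ == "__main__":
    read_length = 70  # fragment length reported in this thread
    for label, q in [("simplex", 12), ("duplex (reported)", 17), ("duplex (typical)", 30)]:
        p = phred_to_error_prob(q)
        n = expected_errors(q, read_length)
        print(f"{label}: Q{q} -> error rate {p:.2%}, ~{n:.2f} expected errors per {read_length} bases")
```

Under that definition, Q12 corresponds to roughly a 6% per-base error rate and Q17 to roughly 2%, whereas Q30 corresponds to 0.1%. That gap is why Q17 is still far from what duplex normally delivers.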