Open JWDebler opened 2 months ago
Please see here for similar issues under discussion: https://github.com/nanoporetech/dorado/issues/968 and https://github.com/nanoporetech/dorado/issues/962
Does the correction algorithm discard coverage anomalies like this?
Dorado Correct does not explicitly discard high-coverage reads, but keep in mind that Minimap2 (used under the hood for overlapping) does have a frequency filter for kmers. Are your entire datasets of 2000x coverage, or just the mitochondria?
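To make the frequency-filter idea concrete, here is a toy sketch of how masking the most frequent k-mers can leave reads from a very high-copy sequence with nothing to seed overlaps on. It is only an illustration, not minimap2's or Dorado's actual code: real minimap2 works on minimizers rather than all k-mers (its command-line tool exposes the threshold as `-f`), and how Dorado configures that filter internally is not stated in this thread. The function names, the sequences, and the `top_fraction` value are all made up for the example.

```python
from collections import Counter


def kmers(seq, k):
    """Return every k-mer of seq (toy version: no minimizer selection)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def frequent_kmers(reads, k, top_fraction):
    """Return the most frequent k-mers, i.e. the top `top_fraction` of
    distinct k-mers ranked by how often they occur across all reads."""
    counts = Counter(km for read in reads for km in kmers(read, k))
    n_mask = int(len(counts) * top_fraction)
    return {km for km, _ in counts.most_common(n_mask)}


def usable_kmers(read, k, masked):
    """Return the k-mers of `read` that survive the frequency filter."""
    return [km for km in kmers(read, k) if km not in masked]


# Hypothetical data: a "mitochondrial" read seen 20 times and a
# "nuclear" read seen twice, standing in for very uneven coverage.
mito_read = "AAAACCCC"
nuclear_read = "GGGGTTTT"
reads = [mito_read] * 20 + [nuclear_read] * 2

masked = frequent_kmers(reads, k=4, top_fraction=0.5)
mito_seeds = usable_kmers(mito_read, 4, masked)
nuclear_seeds = usable_kmers(nuclear_read, 4, masked)
```

In this toy setup every k-mer of the high-copy read ends up masked, so it has no seeds left, while the low-copy read keeps all of its k-mers. Whether this is what actually happens to the mitochondrial reads during correction has not been established here.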
Only the mitochondria. I aim for about 50x coverage of the genome, but since there are many copies of the mitogenome per cell I often get crazy high coverage for them.
What's the expected length of your mitogenome, and what does your read length distribution look like?
The mitogenome is 55 kb. Here are the NanoPlot read length distributions for the 'raw' simplex reads before correction for all 19 isolates. The ones on the left still contained mitogenome reads after correction; the ones on the right lost them all.
The only thing I can see is that the losers do have a very high 'short' read peak, but that is also present in some of the 'maintainers'.
Hi,
We recently sequenced a batch of 19 fungal isolates. I tried to assemble the genomes a few different ways (simplex + duplex, or corrected simplex only) in order to figure out whether there is still a need to run the longer duplex calling pipeline.
Corrected simplex reads turned out to give good assemblies, for most samples that is. In my particular case, 7 of those 19 ended up with no mitochondrial reads after correction with Dorado.
Mapping the uncorrected simplex reads onto the assembly gives a crazy coverage of over 2000x for the mitogenome, but mapping the corrected set generated from those same raw reads gives 0x. Does the correction algorithm discard coverage anomalies like this? All 19 assemblies have mitogenome reads in the duplex set and in the uncorrected simplex reads, but for 7 of them the correction step throws all of them out.
Cheers.
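For anyone wanting to run the same raw-versus-corrected comparison on their own data, a minimal sketch follows. It assumes both read sets have already been mapped to the assembly and that per-base depth was exported as three tab-separated columns (contig, position, depth), which is the layout `samtools depth -a` writes. The function names, contig names, and depth values are hypothetical and only serve as an example.

```python
from collections import defaultdict


def mean_depth_per_contig(depth_lines):
    """Average depth per contig from 'contig<TAB>pos<TAB>depth' lines.

    The mean is taken over the positions present in the input, so the
    depth file should list every position (e.g. `samtools depth -a`).
    """
    total = defaultdict(int)
    positions = defaultdict(int)
    for line in depth_lines:
        line = line.strip()
        if not line:
            continue
        contig, _pos, depth = line.split("\t")
        total[contig] += int(depth)
        positions[contig] += 1
    return {contig: total[contig] / positions[contig] for contig in total}


def lost_contigs(raw_depth, corrected_depth, min_raw=1.0):
    """Contigs covered by raw reads but with zero corrected-read depth."""
    return sorted(
        contig
        for contig, depth in raw_depth.items()
        if depth >= min_raw and corrected_depth.get(contig, 0.0) == 0.0
    )


# Made-up depth lines standing in for two real depth files.
raw_lines = ["mito\t1\t2000", "mito\t2\t2100", "chr1\t1\t50", "chr1\t2\t52"]
corrected_lines = ["mito\t1\t0", "mito\t2\t0", "chr1\t1\t48", "chr1\t2\t50"]

raw = mean_depth_per_contig(raw_lines)
corrected = mean_depth_per_contig(corrected_lines)
lost = lost_contigs(raw, corrected)
```

With real data, the lines would come from reading the two depth files instead of the inline lists, and `lost` would name the contigs (here the mitogenome) that dropped out during correction.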