marbl / canu

A single molecule sequence assembler for genomes large and small.
http://canu.readthedocs.io/

Will Canu read correction reduce true variation for strain-typing ONT reads? #2297

Closed: weishwu closed this 3 months ago

weishwu commented 3 months ago

I have a set of ~400 bp amplicon ONT reads sequenced from a mixture of 700 closely related strains of one bacterial species. I'd like to run some error correction to reduce the errors in the raw reads, but I wonder whether this may also remove true variation and hurt the detection limit. The reference sequences of these strains are highly similar and differ only at some SNVs, and the lowest strain abundance can be 1% or even lower. Thanks.

skoren commented 3 months ago

The correction doesn't try to preserve haplotypes and will definitely mix some of the similar sequences together. There are thresholds to try to avoid piling reads over the expected coverage, but that won't help the rare alleles. It will depend on how divergent the strains are: if they differ by more than about 2% or have large SVs, they should be separated. Below that, they would likely be collapsed/mixed. In general, I am not aware of any correction that would not damage the limit of detection (LOD), especially for the lower-abundance strains you're interested in.
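To make the two points in this reply concrete, here is a minimal toy sketch. It is not Canu's correction algorithm. `divergence` assumes pre-aligned, equal-length reference sequences that differ only by SNVs (as described in the question), and `likely_collapsed` applies the rough ~2% rule of thumb from the reply. That figure is an approximate threshold, not a Canu parameter. `majority_vote` is a hypothetical per-column consensus showing why a 1% allele is outvoted by any consensus-style correction. All function names are illustrative.

```python
from collections import Counter


def divergence(a: str, b: str) -> float:
    """Fraction of mismatched positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches / len(a)


def likely_collapsed(a: str, b: str, threshold: float = 0.02) -> bool:
    """Rough rule of thumb from the thread: below ~2% divergence, expect mixing."""
    return divergence(a, b) < threshold


def majority_vote(bases: list[str]) -> str:
    """Toy consensus for one pileup column (hypothetical, NOT Canu's method)."""
    return Counter(bases).most_common(1)[0][0]


if __name__ == "__main__":
    ref_a = "ACGT" * 100  # toy 400 bp amplicon
    ref_b = ref_a
    for pos in (50, 150, 250):  # introduce 3 SNVs
        ref_b = ref_b[:pos] + "T" + ref_b[pos + 1:]
    print(divergence(ref_a, ref_b))        # 3 / 400 = 0.0075
    print(likely_collapsed(ref_a, ref_b))  # True

    # 99 reads carry "A", 1 read (a 1% strain) carries "G" at this column.
    print(majority_vote(["A"] * 99 + ["G"]))  # "A": the rare SNV is erased
```

For a 400 bp amplicon, the ~2% figure corresponds to roughly 8 differences, so strains separated by a handful of SNVs fall well below it. Real correction weighs overlaps rather than voting per column, but the outcome for a 1% allele is similar.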

weishwu commented 3 months ago

@skoren Thanks!