timmocking closed this issue 4 years ago.
We haven't done extensive testing on how UNCALLED performs in the presence of variants, but I don't think small indels would reduce the mapping rate much. The algorithm is fairly tolerant of slight shifts between read and reference positions, because the signal is so noisy. Even if one seed (~10-12 bp) fails to map because of an indel, the seeds after the indel would still map correctly and would probably be clustered with the true-positive (TP) seeds before it.
Larger structural variants (SVs) may be more of an issue, and we are looking into this. Some SVs, such as large insertions or translocations, may be impossible to handle fully with a standard linear reference. In these cases you would have to rely on the flanking sequence mapping correctly, which would hopefully provide enough coverage of the SV, depending on its size.
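To illustrate the two cases above, here is a minimal sketch of position-consistent seed clustering. It is NOT UNCALLED's actual implementation: it assumes a simple "diagonal" heuristic (reference position minus read position) with a fixed tolerance, and the `Seed`, `cluster_seeds` and `best_cluster` names and all coordinates are hypothetical. A small indel shifts the diagonal by only a few bases, so seeds on both sides still cluster; a large insertion shifts it by hundreds of bases, leaving two separate flanking clusters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Seed:
    """A hypothetical seed hit: where it sits on the read and where it mapped."""
    read_pos: int  # offset of the seed on the read (bases)
    ref_pos: int   # reference coordinate the seed mapped to


def diagonal(seed: Seed) -> int:
    # Seeds from the same true alignment share roughly the same offset
    # between reference and read coordinates; random false positives do not.
    return seed.ref_pos - seed.read_pos


def cluster_seeds(seeds: list[Seed], tolerance: int = 5) -> list[list[Seed]]:
    """Group seeds whose diagonals lie within `tolerance` of the previous seed.

    An indel shifts the diagonal by its length, so if tolerance >= indel size,
    seeds on both sides of the indel end up in the same cluster.
    """
    clusters: list[list[Seed]] = []
    for seed in sorted(seeds, key=lambda s: (diagonal(s), s.read_pos)):
        if clusters and diagonal(seed) - diagonal(clusters[-1][-1]) <= tolerance:
            clusters[-1].append(seed)
        else:
            clusters.append([seed])
    return clusters


def best_cluster(seeds: list[Seed], tolerance: int = 5) -> list[Seed]:
    # Treat the largest cluster as the true-positive mapping.
    return max(cluster_seeds(seeds, tolerance), key=len)


if __name__ == "__main__":
    # 3 bp deletion in the read at position 50: the seed at 48 spans it and is
    # dropped; seeds after it sit on diagonal 1003 instead of 1000.
    small_indel = (
        [Seed(p, 1000 + p) for p in (0, 12, 24, 36)]
        + [Seed(p, 1003 + p) for p in (60, 72, 84, 96, 108)]
        + [Seed(5, 73_000), Seed(30, 412), Seed(90, 9_999)]  # random false positives
    )
    print(len(best_cluster(small_indel)))  # 9: both sides of the indel cluster together

    # 500 bp insertion absent from the reference: flanking seeds land on
    # diagonals 1000 and 500, which a small tolerance cannot join.
    large_ins = (
        [Seed(p, 1000 + p) for p in (0, 12, 24, 36)]
        + [Seed(p, 500 + p) for p in (552, 564, 576, 588)]
        + [Seed(300, 51_234)]  # a seed inside the insertion hits a random locus
    )
    print(sorted(len(c) for c in cluster_seeds(large_ins)))  # [1, 4, 4]
```

The tolerance plays the role of the "permissiveness" described above: with `tolerance=2` the 3 bp deletion would instead split the read into two clusters of 4 and 5 seeds.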
On a side note, one thing you should definitely NOT do currently is include multiple copies of the same locus with different variants. This may introduce many exact repeats into the reference, which would disrupt the indexing process. Again, we're currently working on improving this and hope to have an update focusing on variants soon.
Thanks! I will keep that in mind.
Hi! I found the following in your preprint:
"Due to the noisy nature of nanopore sequencing, UNCALLED must use very loose thresholds for event/k-mer matches, which produce many false positive seed mappings. We eliminate these false positives under the observation that they will usually map to random locations, while true positives will map to locations consistent with their position on the read."
How will this affect the use of UNCALLED for detecting indels and other structural variants, considering that seeds spanning or following such variants would map to locations inconsistent with their position on the read?