Open charliechen912ilovbash opened 1 year ago
Hello @charliechen912ilovbash,
Sorry for the late reply. It is well known that repeat sequences disturb alignment and lead to inaccurate breakpoints on reads. SV callers collect the breakpoints on each read to infer SV candidates, so treating inaccurate breakpoints as SV signatures would produce low-quality SV positions. To overcome this, cuteSV clusters all breakpoint signatures within a relatively small region to generate "consensus" SV breakpoint groups, then divides them into possible SV events by their length signatures. After that, it reports the final SV calls and the corresponding genotypes. For more details, please read our paper here. I hope this is helpful to you.
Best regards, Tao
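The two-stage idea described in the reply above (group breakpoint signatures by position, then split each group by SV length) can be sketched roughly as follows. This is only an illustration and not cuteSV's actual implementation: the function names, the `(position, length)` tuple layout, and the thresholds `max_pos_gap` and `max_len_ratio` are all assumptions made for the example.

```python
from statistics import median


def summarize(signatures):
    """Collapse a group of (position, length) signatures into one consensus call."""
    return {
        "pos": median(p for p, _ in signatures),
        "len": median(l for _, l in signatures),
        "support": len(signatures),
    }


def cluster_signatures(signatures, max_pos_gap=200, max_len_ratio=0.3):
    """Hypothetical two-stage clustering of (position, length) SV signatures.

    Stage 1 groups signatures whose positions lie close together.
    Stage 2 splits each positional group into sub-groups of similar length.
    Both thresholds are illustrative values, not cuteSV defaults.
    """
    if not signatures:
        return []

    # Stage 1: sort by position and start a new group at every large gap.
    groups = []
    for sig in sorted(signatures):
        if groups and sig[0] - groups[-1][-1][0] <= max_pos_gap:
            groups[-1].append(sig)
        else:
            groups.append([sig])

    # Stage 2: inside each positional group, sort by length and split
    # wherever the length jumps by more than max_len_ratio.
    calls = []
    for group in groups:
        by_len = sorted(group, key=lambda s: s[1])
        sub = [by_len[0]]
        for sig in by_len[1:]:
            if sig[1] - sub[-1][1] <= max_len_ratio * sub[-1][1]:
                sub.append(sig)
            else:
                calls.append(summarize(sub))
                sub = [sig]
        calls.append(summarize(sub))
    return calls


# Toy input: noisy breakpoints around position 1000 with two distinct lengths,
# plus one isolated signature far away.
example = [(1000, 50), (1010, 52), (995, 49), (1005, 300), (1020, 310), (5000, 100)]
calls = cluster_signatures(example)
```

In this toy input, the signatures near position 1000 fall into one positional group but are separated into a roughly 50 bp event and a roughly 300 bp event, which is how scattered breakpoints in a repeat region could still yield a consensus position for each event.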
Hi, Tao
But for assembly-based SV calling, does cuteSV still cluster breakpoints? Since there is only one read in the SAM file, is it possible for cuteSV to report these breakpoints?
Hello @baozg,
Thanks for pointing this out. Actually, cuteSV achieves assembly-based SV calling by converting the typical SV callsets to diploid-based SV callsets. That is, cuteSV first generates the initial SV callsets using the clustering approach mentioned above (in some places there is still more than one SV signature, even though there is only one contig per haplotype). Then cuteSV resolves the haplotype tags for each SV call to produce a phased genotype.
Tao
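The last step in the reply above, turning haplotype tags into a phased genotype, might look something like the sketch below. It assumes each supporting contig carries a haplotype tag of 1 or 2; the function name and the tag encoding are hypothetical and are not taken from cuteSV's code.

```python
def phased_genotype(supporting_haplotypes):
    """Return a phased genotype string from the haplotype tags that support an SV.

    supporting_haplotypes: iterable of tags (assumed to be 1 or 2) of the
    contigs whose signatures ended up in this SV call.
    """
    tags = set(supporting_haplotypes)
    hap1 = int(1 in tags)  # 1 if the SV is seen on haplotype 1, else 0
    hap2 = int(2 in tags)  # 1 if the SV is seen on haplotype 2, else 0
    return f"{hap1}|{hap2}"


gt_hap1_only = phased_genotype([1])
gt_both = phased_genotype([1, 2])
```

Under these assumptions, an SV supported only by the haplotype 1 contig is reported as `1|0`, and one supported by both contigs as `1|1`.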
Hi, Tao
But an inbred plant or a haploid human cell line, like A. thaliana or CHM13, has only one haplotype. Does this also need a clustering step?
Besides, as you mentioned, if I want to call variants with cuteSV from population-level assemblies, it would be better to put all the assemblies in one alignment file so that this clustering step can refine the breakpoints, right?
Zhigui
Hi, I'm wondering: if there is a repeat sequence (e.g. a simple repeat) at an SV (e.g. deletion) breakpoint, will it affect the accuracy of the SV position? If so, how does cuteSV v1.0.12 overcome this issue?