tjiangHIT / cuteSV

Long read based human genomic structural variation detection with cuteSV
MIT License

For multiple samples, why should cuteSV be run twice? #150

Open jahemker opened 1 month ago

jahemker commented 1 month ago

Hi, I am planning to call SVs in multiple samples to build a population set. I saw in #143 that the recommended workflow is to first run cuteSV on each sample, merge the calls into a population set, and then re-genotype each sample against that set. Could you explain why re-genotyping is recommended, as opposed to just taking the first merged set?

tjiangHIT commented 1 month ago

Hello @jahemker,

Thank you for using cuteSV.

Re-genotyping is recommended in the workflow of joint SV calling for several key reasons:

1) As a population-scale SV callset grows with increasing sample size, re-genotyping allows the zygosity of each SV to be determined or refined in every individual. This is important because the first-round calls may have been affected by issues such as insufficient coverage, which can lead to inaccurate zygosity determination.

2) After merging, the breakpoint or size of an SV might differ from the one reported for a sample in the first round of cuteSV. Re-genotyping in the second round lets cuteSV determine the SV characteristics in each individual based on the new, merged SV targets.
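To make the second point concrete, here is a toy Python sketch of how merging can shift a breakpoint. The `merge_calls` function, its averaging rule, and the 1000 bp distance threshold are all hypothetical and chosen for illustration only. This is not how cuteSV or any particular merge tool works, and it assumes all calls share the same chromosome and SV type.

```python
def merge_calls(calls, max_dist=1000):
    """Toy merge (hypothetical): cluster calls whose start lies within
    max_dist of the first call in the cluster, then report the mean
    start and mean length of each cluster."""
    clusters = []
    for call in sorted(calls, key=lambda c: c["start"]):
        if clusters and call["start"] - clusters[-1][0]["start"] <= max_dist:
            clusters[-1].append(call)
        else:
            clusters.append([call])

    merged = []
    for cluster in clusters:
        merged.append({
            "start": round(sum(c["start"] for c in cluster) / len(cluster)),
            "length": round(sum(c["length"] for c in cluster) / len(cluster)),
            "samples": sorted(c["sample"] for c in cluster),
        })
    return merged


# Two samples report the same insertion with slightly different coordinates.
calls = [
    {"sample": "A", "start": 10000, "length": 300},
    {"sample": "B", "start": 10040, "length": 320},
]
merged = merge_calls(calls)
print(merged)
```

With these made-up inputs the merged record sits at position 10020 with length 310, which matches neither sample's first-round call. The genotypes from the first round were computed for slightly different targets, which is why each sample is evaluated again against the merged record.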

Together, these steps keep the zygosities and SV characteristics in the callset accurate and consistent with the latest merged set of SVs.
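The two-round workflow can be sketched as a small Python helper that only builds the command lines and does not run anything. The positional argument order (`BAM`, reference, output VCF, working directory) and the `-Ivcf` option for force calling reflect my reading of the cuteSV README, so please verify them with `cuteSV --help` for your installed version. The function name, file names, and directory layout are hypothetical, and the merge step is left to whichever merge tool you use.

```python
def build_commands(samples, reference, merged_vcf="merged.vcf"):
    """Return (first_round, second_round) command lists for a two-round
    cuteSV workflow. Nothing is executed here; file names are placeholders."""
    first_round = []
    second_round = []
    for name, bam in samples.items():
        # Round 1: per-sample SV discovery.
        first_round.append(
            ["cuteSV", bam, reference, f"{name}.round1.vcf", f"work/{name}.round1"]
        )
        # Round 2: re-genotype the same sample against the merged population set
        # (assumes -Ivcf enables force calling; check your cuteSV version).
        second_round.append(
            ["cuteSV", bam, reference, f"{name}.round2.vcf", f"work/{name}.round2",
             "-Ivcf", merged_vcf]
        )
    return first_round, second_round


first_round, second_round = build_commands(
    {"sample1": "sample1.bam", "sample2": "sample2.bam"}, "ref.fa"
)
# Between the two rounds, merge the *.round1.vcf files into merged.vcf
# with a merge tool of your choice.
for cmd in first_round + second_round:
    print(" ".join(cmd))
```

Each list entry can be passed to `subprocess.run` once the working directories exist and the merged VCF has been produced between the two rounds.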

Hope this helps you!

Best regards,

Tao