nhansen / SVanalyzer

Tools for the analysis of structural variation in genomes
http://svanalyzer.readthedocs.io/
Other

How are the SVs merged? #2

Closed: jmonlong closed this issue 5 years ago

jmonlong commented 5 years ago

This tool looks very useful. I was wondering how the SVs are merged once they are clustered, or how the representative/unique SV is picked. I tried to read the code, but I'm not very familiar with Perl.

It seems like the representative SV for a cluster is randomly selected from the largest subcluster of exactly matched variants. And if there are multiple such subclusters of the same maximal size, one is randomly selected?

That would make sense as it favors breakpoints that are supported by multiple calls. I just wanted some confirmation.

If that's the case, it could be worth adding this to the documentation. I couldn't find it in the recent GIAB preprint either, and a few people might be interested.

eldariont commented 5 years ago

Hi Jean, it seems a clarification was added in commit 7f1bd0e0ce1a66d296139f384d2e3bc903defcda, although I only discovered it today:

The program then reports clusters of variants, and prints a VCF file of "unique" variants, where the variant reported in the VCF record is a randomly-chosen representative from the largest cluster (or a randomly selected largest cluster, in the case of a tie among cluster sizes) of exactly matching variants.

I think that means you were right. Best, David
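For readers who, like Jean, don't read Perl, here is a minimal Python sketch of the selection rule quoted from the documentation. The sketch groups a cluster into subclusters of exactly matching variants, keeps the largest ones, breaks ties at random, and returns a random member. It is not SVanalyzer's code. The `Variant` tuple, `exact_key`, and `pick_representative` are hypothetical names, and "exact match" is assumed here to mean identical chromosome, position, REF, and ALT.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical variant record: (chrom, pos, ref, alt, caller).
Variant = Tuple[str, int, str, str, str]


def exact_key(v: Variant) -> Tuple[str, int, str, str]:
    """Assumed exact-match key: identical chrom, pos, REF and ALT."""
    return (v[0], v[1], v[2], v[3])


def pick_representative(
    cluster: Sequence[Variant], rng: Optional[random.Random] = None
) -> Variant:
    """Return one representative variant for a cluster.

    The largest subcluster of exactly matching variants wins; ties between
    equally large subclusters are broken at random, and the representative
    is then drawn at random from the winning subcluster.
    """
    if not cluster:
        raise ValueError("cluster must contain at least one variant")
    if rng is None:
        rng = random.Random()

    # Group the cluster's calls into subclusters of exact matches.
    groups: Dict[Tuple[str, int, str, str], List[Variant]] = defaultdict(list)
    for v in cluster:
        groups[exact_key(v)].append(v)

    # Keep only the subclusters of maximal size.
    max_size = max(len(g) for g in groups.values())
    largest = [g for g in groups.values() if len(g) == max_size]

    # Random tie-break among the largest subclusters, then a random member.
    return rng.choice(rng.choice(largest))


if __name__ == "__main__":
    cluster = [
        ("chr1", 1000, "A", "AGGT", "callerA"),
        ("chr1", 1000, "A", "AGGT", "callerB"),
        ("chr1", 1002, "G", "GGTA", "callerC"),
    ]
    rep = pick_representative(cluster, random.Random(0))
    print(exact_key(rep))  # prints ('chr1', 1000, 'A', 'AGGT')
```

Seeding `random.Random` here only makes the example reproducible. The thread doesn't say whether SVmerge seeds its random choice.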

nhansen commented 5 years ago

Thanks, David. You are correct. I have updated the documentation at https://svanalyzer.readthedocs.io/en/latest/svmerge.html to attempt to make this clearer.