nhansen / SVanalyzer

Tools for the analysis of structural variation in genomes
http://svanalyzer.readthedocs.io/
Other

SVmerge batches #12

Closed. malmarri closed this issue 3 years ago.

malmarri commented 3 years ago

Hi,

Thanks for the useful tool.

I am using SVmerge to merge ~500 VCFs of SVs identified by SVrefine, but computing the distance file is taking a long time (over a week so far). Is it possible to split the process into smaller batches (e.g. run SVmerge on 50 samples at a time) and then run SVmerge again on the 10 resulting merged VCFs to save time?

Thanks.

nhansen commented 3 years ago

A few thoughts. First, if you aren't doing so already, I would split by chromosome, or even by region, and run the pieces on multiple CPUs if you have that option. Reducing the "maxdist" option can also help a lot, although you might miss merging some similar SVs in long repetitive regions.
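Splitting by chromosome can be done before SVmerge ever sees the files. Below is a minimal sketch that writes one VCF per `CHROM` value, copying the full header into each piece. The function names and the `<sample>.<chrom>.vcf` output naming are illustrative choices, not part of SVanalyzer, and the sketch assumes (as the VCF format requires) that all header lines come before the records.

```python
import gzip
import os
from typing import Dict, List, TextIO


def open_vcf(path: str) -> TextIO:
    """Open a plain or gzip-compressed VCF as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def split_vcf_by_chrom(path: str, outdir: str) -> Dict[str, str]:
    """Write one VCF per CHROM value, each carrying the full original header.

    Returns a mapping from chromosome name to the output file path.
    Chromosomes with no records get no output file.
    """
    os.makedirs(outdir, exist_ok=True)
    base = os.path.basename(path)
    for suffix in (".gz", ".vcf"):  # "s1.vcf.gz" -> "s1"
        if base.endswith(suffix):
            base = base[: -len(suffix)]

    header: List[str] = []
    handles: Dict[str, TextIO] = {}
    outputs: Dict[str, str] = {}
    try:
        with open_vcf(path) as fh:
            for line in fh:
                if line.startswith("#"):
                    header.append(line)  # "##" meta lines and the "#CHROM" column line
                    continue
                chrom = line.split("\t", 1)[0]
                if chrom not in handles:
                    out_path = os.path.join(outdir, f"{base}.{chrom}.vcf")
                    handles[chrom] = open(out_path, "w")
                    handles[chrom].writelines(header)  # every piece stays a valid VCF
                    outputs[chrom] = out_path
                handles[chrom].write(line)
    finally:
        for h in handles.values():
            h.close()
    return outputs
```

Run this on each sample's VCF, then launch one SVmerge job per chromosome (each given that chromosome's files from all samples), with as many jobs in parallel as you have CPUs.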

Splitting by samples should work, except that the annotations of how many SVs are in each cluster, and of the maximum distance between those SVs, will be incorrect.
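To see why those annotations go wrong, here is a toy model: SVs reduced to 1-D positions on one chromosome, grouped by single-linkage clustering with a `maxdist` threshold, and each batch cluster collapsed to its first member before the second merging round. This is a deliberate simplification (SVmerge's own distance is alignment-based), and the function names are hypothetical.

```python
from typing import List, Tuple


def cluster_positions(positions: List[int], maxdist: int) -> List[List[int]]:
    """Single-linkage clustering of 1-D positions: neighbours within maxdist join."""
    clusters: List[List[int]] = []
    for pos in sorted(positions):
        # clusters[-1][-1] is the largest position so far, since input is sorted
        if clusters and pos - clusters[-1][-1] <= maxdist:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    return clusters


def annotate(clusters: List[List[int]]) -> List[Tuple[int, int]]:
    """(number of SVs, maximum pairwise distance) for each cluster."""
    return [(len(c), max(c) - min(c)) for c in clusters]


def merge_in_batches(batches: List[List[int]], maxdist: int) -> List[List[int]]:
    """Merge each batch, keep one representative per cluster, then merge again."""
    reps: List[int] = []
    for batch in batches:
        for c in cluster_positions(batch, maxdist):
            reps.append(c[0])  # the other members are not visible to round two
    return cluster_positions(reps, maxdist)


batches = [[100, 150], [120, 400]]
direct = annotate(cluster_positions([p for b in batches for p in b], 100))
batched = annotate(merge_in_batches(batches, 100))
print(direct)   # [(3, 50), (1, 0)]
print(batched)  # [(2, 20), (1, 0)]
```

The clusters themselves line up, but the second round counts representatives rather than original SVs, so the first cluster's size drops from 3 to 2 and its maximum distance from 50 to 20.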

Finally, if you've already run for a week and know that your distance file contains all SVs for a given chromosome, you can pass that distance file (so don't delete it!) as an option to a new run of SVmerge, which will keep it from redoing all that alignment.
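Reusing the distance file is essentially memoisation: any pair already measured on disk skips the expensive alignment. The sketch below shows only that idea. The three-column tab-separated layout (`id1`, `id2`, `distance`) is a hypothetical format, since SVmerge's actual distance file layout isn't described in this thread, and `compute` stands in for the alignment step.

```python
import csv
from typing import Callable, Dict, FrozenSet, Iterable, Tuple

Pair = FrozenSet[str]  # unordered: (a, b) and (b, a) share one entry


def load_distances(path: str) -> Dict[Pair, float]:
    """Read a HYPOTHETICAL tab-separated file of (id1, id2, distance) rows."""
    cache: Dict[Pair, float] = {}
    with open(path, newline="") as fh:
        for id1, id2, dist in csv.reader(fh, delimiter="\t"):
            cache[frozenset((id1, id2))] = float(dist)
    return cache


def distances_with_cache(
    pairs: Iterable[Tuple[str, str]],
    cache: Dict[Pair, float],
    compute: Callable[[str, str], float],
) -> Tuple[Dict[Pair, float], int]:
    """Return a distance for every pair, calling compute() only on cache misses."""
    result: Dict[Pair, float] = {}
    misses = 0
    for a, b in pairs:
        key = frozenset((a, b))
        if key not in cache:
            cache[key] = compute(a, b)  # the expensive step, done once per pair
            misses += 1
        result[key] = cache[key]
    return result, misses
```

On a restarted run, only the pairs missing from the saved file cost any alignment time, which is why deleting the file after a week of work would be expensive.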

Let me know how it goes, and if there's anything else I can do to be of help.

malmarri commented 3 years ago

Thank you for the suggestions!