hall-lab / svtyper

Bayesian genotyper for structural variants
MIT License

Multi-core functionality missing for multisample projects #92

Closed: dvanderleest closed this issue 6 years ago

dvanderleest commented 6 years ago

svtyper is an amazing tool that does what it is supposed to in most cases. Unfortunately, however, multi-core functionality is missing for projects involving multiple samples.

ernfrid commented 6 years ago

We'd recommend running svtyper individually for each sample and then combining the results after the fact. See the svtools tutorial for an example.
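As a rough illustration of the per-sample approach, here is a minimal Python sketch that launches one independent job per sample and caps how many run at once. `run_per_sample` and `svtyper_job` are hypothetical helpers, the file names are placeholders, and the svtyper flags follow the README example (`-i` input VCF, `-B` BAM, output on stdout); verify them against your installed version.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Sequence


def run_per_sample(
    samples: Sequence[str],
    genotype_one: Callable[[str], int],
    max_workers: int = 4,
) -> Dict[str, int]:
    """Run genotype_one(sample) for every sample, max_workers at a time.

    Returns a mapping of sample -> exit code (0 means success).
    """
    # Threads suffice here: each job mostly waits on an external child process.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {s: pool.submit(genotype_one, s) for s in samples}
        return {s: f.result() for s, f in futures.items()}


def svtyper_job(sample: str) -> int:
    """Hypothetical per-sample job: genotype a shared site VCF against one BAM.

    File names are placeholders; the flags mirror the svtyper README example
    (-i input VCF, -B BAM, genotyped VCF written to stdout).
    """
    with open(f"{sample}.gt.vcf", "w") as out:
        argv = ["svtyper", "-i", "sites.vcf", "-B", f"{sample}.bam"]
        return subprocess.run(argv, stdout=out).returncode
```

Usage would look like `codes = run_per_sample(["sampleA", "sampleB"], svtyper_job, max_workers=8)`, followed by checking for any nonzero exit codes. The per-sample outputs still need to be combined afterwards.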

You may also be interested in Brent Pedersen's smoove, which supports running many samples in parallel on the same machine (and also improves the underlying SV calling with Lumpy through filtering).

dvanderleest commented 6 years ago

Thank you very much for the recommendations. I will certainly investigate them. Could you please elaborate a bit on why this is recommended, though?

Does svtyper have trouble discriminating calls from different samples?

ernfrid commented 6 years ago

My recollection (although now fuzzy) is that the existing multi-sample code simply processes the samples one after another, in serial. There is no information sharing across samples, so I don't think you'd gain much by passing multiple samples to a single svtyper run.
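To see why feeding several samples to one run buys little, consider a toy genotyper in which each call depends only on that sample's own ref/alt read counts. This is a simplified binomial model with a flat prior and made-up allele fractions, not svtyper's actual likelihood or priors; it only shows that serial and parallel evaluation must agree when nothing is shared between samples.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Tuple

# Assumed expected alt-read fraction per genotype (illustrative, not svtyper's).
ALT_FRACTION = {"0/0": 0.01, "0/1": 0.5, "1/1": 0.99}


def call_genotype(counts: Tuple[int, int]) -> str:
    """Pick the most likely genotype from this sample's (ref_reads, alt_reads).

    Binomial likelihood with a flat prior, so the posterior argmax equals the
    likelihood argmax. No other sample's data is consulted.
    """
    ref, alt = counts

    def log_lik(p: float) -> float:
        # The binomial coefficient is the same for every genotype, so omit it.
        return alt * math.log(p) + ref * math.log(1.0 - p)

    return max(ALT_FRACTION, key=lambda g: log_lik(ALT_FRACTION[g]))


def genotype_serial(samples: Dict[str, Tuple[int, int]]) -> Dict[str, str]:
    # One sample after another, like a multi-sample loop.
    return {name: call_genotype(c) for name, c in samples.items()}


def genotype_parallel(samples: Dict[str, Tuple[int, int]]) -> Dict[str, str]:
    # Threads only demonstrate order-independence here; real speedup for
    # CPU-bound Python work would need separate processes.
    with ThreadPoolExecutor() as pool:
        calls = pool.map(call_genotype, samples.values())
        return dict(zip(samples.keys(), calls))
```

Because each call is a pure function of one sample's counts, splitting samples across separate runs changes only scheduling, never the calls themselves.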

The workflow we run regularly, and thus can recommend with some confidence, runs svtyper individually on each sample and then combines the results. It is a more complex workflow, though, and may not be terribly convenient if you have tens of samples rather than hundreds, thousands, or tens of thousands.
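The combining step can be sketched as pasting the sample columns of the per-sample VCFs side by side. This minimal Python sketch assumes every per-sample VCF was genotyped from the same input site VCF (so data lines match one-to-one), that inputs are uncompressed text, and that site-level columns and meta-information lines can be taken from the first file. `paste_vcfs` is a hypothetical helper, not the svtools implementation.

```python
from typing import List, Sequence


def paste_vcfs(vcf_texts: Sequence[str]) -> str:
    """Merge single-sample VCFs (given as text) into one multi-sample VCF."""
    parsed = []
    for text in vcf_texts:
        lines = text.rstrip("\n").split("\n")
        meta = [ln for ln in lines if ln.startswith("##")]
        header = next(ln for ln in lines if ln.startswith("#CHROM"))
        data = [ln.split("\t") for ln in lines if ln and not ln.startswith("#")]
        parsed.append((meta, header.split("\t"), data))

    meta0, header0, data0 = parsed[0]
    if any(len(data) != len(data0) for _, _, data in parsed):
        raise ValueError("inputs have different numbers of data lines")

    # Columns 0-8 are CHROM..FORMAT; sample columns start at index 9.
    out_header = header0[:9] + [s for _, h, _ in parsed for s in h[9:]]
    out: List[str] = meta0 + ["\t".join(out_header)]

    for i, row0 in enumerate(data0):
        merged = row0[:9]
        for _, _, data in parsed:
            row = data[i]
            # Require the same site (CHROM, POS, ID) and the same FORMAT.
            if row[:3] != row0[:3] or row[8] != row0[8]:
                raise ValueError(f"site mismatch at data line {i + 1}")
            merged += row[9:]
        out.append("\t".join(merged))
    return "\n".join(out) + "\n"
```

For real data, the svtools tutorial's own combining step (`svtools vcfpaste`) is the tested route; the sketch only shows the idea.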

If you have a medium number of samples, then smoove is a well-supported and convenient wrapper for parallelizing svtyper and combining its results on a single machine.