nf-core / exoseq

Please consider using/contributing to https://github.com/nf-core/sarek
http://nf-co.re
MIT License

Parallelization for Haplotypecaller #23

veeravalli closed 6 years ago

veeravalli commented 6 years ago

To speed up the HaplotypeCaller step, parallelization could be introduced (e.g. per chromosome), with the resulting gVCFs merged afterwards. This would be particularly useful for reducing the time needed to analyze WGS samples.
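
The scatter/gather idea suggested here can be sketched roughly as follows. This is only an illustration, not what ExoSeq does: it assumes a GATK4-style command line (`gatk HaplotypeCaller` with `-L` and `-ERC GVCF`, then `gatk MergeVcfs` to join the per-chromosome shards of one sample), and the function names and the `sample.<chrom>.g.vcf.gz` file naming are hypothetical. The commands are only built, not executed.

```python
from typing import List


def scatter_commands(bam: str, reference: str, chromosomes: List[str]) -> List[List[str]]:
    """Build one HaplotypeCaller command per chromosome (nothing is run here)."""
    commands = []
    for chrom in chromosomes:
        commands.append([
            "gatk", "HaplotypeCaller",
            "-R", reference,            # reference FASTA
            "-I", bam,                  # aligned reads for one sample
            "-L", chrom,                # restrict calling to this chromosome
            "-ERC", "GVCF",             # emit a gVCF rather than a plain VCF
            "-O", f"sample.{chrom}.g.vcf.gz",  # hypothetical shard name
        ])
    return commands


def gather_command(chromosomes: List[str], output: str) -> List[str]:
    """Build a single command that merges the per-chromosome gVCF shards."""
    cmd = ["gatk", "MergeVcfs"]
    for chrom in chromosomes:
        cmd += ["-I", f"sample.{chrom}.g.vcf.gz"]
    cmd += ["-O", output]
    return cmd


if __name__ == "__main__":
    chroms = ["chr1", "chr2", "chr3"]
    for command in scatter_commands("sample.bam", "ref.fasta", chroms):
        print(" ".join(command))
    print(" ".join(gather_command(chroms, "sample.g.vcf.gz")))
```

In a workflow engine, each scatter command would become an independent task and the gather step would wait for all of them.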

marchoeppner commented 6 years ago

Hi, WGS will not be part of ExoSeq at this time, as we decided that certain design elements (especially for maximum parallelization) would likely differ. The Sarek pipeline currently offers robust WGS calling, so ExoSeq will (likely) do what it says on the box: exomes.

However, we will look into speed improvements as suggested. We are currently passing the exome kit targets as an interval list to HaplotypeCaller (HC). I am not sure that splitting this file into chunks would really result in vast speed improvements (since the targets are usually pretty small anyway), but it is worth checking.

ewels commented 6 years ago

Sarek: https://github.com/SciLifeLab/Sarek

apeltzer commented 6 years ago

Speedups for exome capture kits are negligible, so this is not worth the effort. To gain any speed at all, we would need to merge intervals into larger chunks to work on, as otherwise the I/O required to query each interval eats up the benefits. Closing this for that reason!
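
For illustration, grouping many small targets into larger chunks could look something like the sketch below. It is a minimal greedy approach under stated assumptions: intervals are BED-like `(chrom, start, end)` tuples with half-open coordinates, and `chunk_intervals` and `target_bases` are hypothetical names, not part of ExoSeq or GATK.

```python
from typing import List, Tuple

# (chrom, start, end), half-open coordinates as in BED files
Interval = Tuple[str, int, int]


def chunk_intervals(intervals: List[Interval], target_bases: int) -> List[List[Interval]]:
    """Greedily group consecutive intervals until a chunk covers at least target_bases."""
    chunks: List[List[Interval]] = []
    current: List[Interval] = []
    current_bases = 0
    for chrom, start, end in intervals:
        current.append((chrom, start, end))
        current_bases += end - start
        if current_bases >= target_bases:
            # Chunk is large enough: close it and start a new one
            chunks.append(current)
            current, current_bases = [], 0
    if current:
        # Remaining intervals form a final, possibly smaller chunk
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    targets = [("chr1", 100, 300), ("chr1", 500, 600), ("chr2", 50, 400), ("chr2", 900, 1000)]
    for i, chunk in enumerate(chunk_intervals(targets, target_bases=300)):
        print(i, chunk)
```

Each chunk would then be handed to one variant-calling job, so the per-interval query overhead is spread over more bases per job. Whether this pays off for small exome targets is exactly the doubt raised above.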