Closed: marcelm closed this issue 5 years ago
Original comment by Tobias Marschall (Bitbucket: tobiasmarschall, GitHub: tobiasmarschall).
We also usually parallelize by chromosome. This is something that we definitely should teach WhatsHap to do natively. In fact, there's Issue #31 on this, sitting there since 2014. It would be easy to add an option to only output the requested chromosome. On the other hand, one can just pipe the output of WhatsHap directly into `bcftools view` to subset the variants to a chromosome, so I would prioritize working on a parallelized WhatsHap.
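To illustrate what subsetting the output to one chromosome amounts to, here is a minimal pure-Python sketch of the filtering step. It is a stand-in for the `bcftools view` step mentioned above, not WhatsHap or bcftools code, and `keep_chromosome` is a hypothetical helper name. It assumes uncompressed, tab-separated VCF text with the chromosome name in the first column.

```python
def keep_chromosome(vcf_lines, chromosome):
    """Yield VCF header lines plus the records located on `chromosome`.

    Hypothetical helper for illustration only; assumes plain-text VCF
    lines whose first tab-separated column is the chromosome name.
    """
    for line in vcf_lines:
        if line.startswith("#"):
            # Header and meta lines are passed through unchanged.
            yield line
        elif line.split("\t", 1)[0] == chromosome:
            # Exact match on the CHROM column, so "1" does not match "11".
            yield line
```

In a pipeline, a script could read the phased VCF from standard input and write the result with `sys.stdout.writelines(keep_chromosome(sys.stdin, "1"))`.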
The basic principle at the moment is that WhatsHap will take the input VCF, augment it with phasing information where it can and output it otherwise unchanged. I don’t think we should change this, and I also don’t think we should add an option for the requested feature.
I agree we should prioritize issue #31 (parallelization) instead.
Original comment by Fong Chun Chan (Bitbucket: [Fong Chun Chan](https://bitbucket.org/Fong Chun Chan)).
Thanks both. This makes sense. Looking forward to the resolution of the parallelization issue.
Original report by Fong Chun Chan (Bitbucket: [Fong Chun Chan](https://bitbucket.org/Fong Chun Chan)).
When using the `--chromosome` argument, `whatshap phase` will still return the variants on other chromosomes. Is there a way to have whatshap only return the positions it interrogated, as specified by `--chromosome`? For instance, if I had the following `input.vcf` file:
Running:
Produces this phased VCF:
It only phases chromosome 1, but still returns the variants from chromosome 2. The reason I am interested in this is so that I can parallelize across chromosomes. Concatenating the results afterwards will be a lot easier if I knew that each run didn't contain all the original results.
Thanks in advance for your help.
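The concatenation step described in the report can be sketched as follows. This is only an illustration under stated assumptions: each per-chromosome result has already been subset to its own chromosome (for example with `bcftools view`, as suggested in the thread), all parts share the same header, and the parts are supplied in the desired chromosome order. `concat_parts` is a hypothetical helper, not part of WhatsHap.

```python
def concat_parts(parts):
    """Concatenate per-chromosome VCF line sequences into one stream.

    Hypothetical helper for illustration only. The header is taken from
    the first part; header lines of later parts are dropped, and records
    are emitted in the order the parts are given.
    """
    first = True
    for part in parts:
        for line in part:
            if line.startswith("#"):
                if first:
                    # Keep the header only once, from the first part.
                    yield line
            else:
                yield line
        first = False
```

For real data, an established tool such as `bcftools concat` is likely the more robust choice; the sketch only shows why results that each contain a single chromosome are easy to merge.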