Open ekirving opened 1 year ago
An even better solution would be to support a --ploidy-file
parameter, like bcftools
does, to specify sex-dependent ploidy levels, both across and within chromosomes.
Having now tried phasing the PAR and non-PAR regions of chrX separately, I can confirm that this approach does not work. I had assumed that the whatshap.vcf.PloidyError
was because the ploidy changed along the chromosome for male samples, however, whatshap
will throw this error if any of the samples in the input VCF have a different ploidy level. As such, there does not appear to be a way to trio phase chrX, as the father is haploid and the mother is diploid in the non-PAR regions. Is this expected behaviour, or am I missing something?
Hi, thanks for reporting. I haven’t thought this through, but would just forcing the genotype to 0/0 for the haploid regions be a workaround?
WhatsHap is indeed currently not capable of working with multiple ploidies in the same file. I haven’t worked on that part of the code in a long time, but I believe that changing the assumption that everything is diploid (for whatshap phase
) requires quite some work. Perhaps not so many code changes are required, but understanding the existing code and finding all the places where the assumption is made would be the biggest part. (I wish we had some funding for projects like this.)
--ploidy-file
sounds like a great feature, thanks for pointing that out.
Hi Marcel, thanks for your suggestion. That sounds like it might work for trio phasing a female child, but it would not work for a male child (because they don't inherit their chrX from their father). In practice, this workaround would be quite fiddly to implement, as my pedigree also contains some multigenerational families, and there would be loss of phase information for the mother if all male children were excluded.
Thanks for reporting and sorry I had missed this. I can add that we just encountered a use case where we'd also like whatshap haplotag
to handle the PARs properly. So I think working in this would be good, but I agree it's probably a bit more involved to do it right. CC @samarendra-pani
I have a callset containing human X chromosomes, in which the males have been called with a ploidy of 2 inside the pseudoautosomal regions (PAR1 and PAR2) and ploidy of 1 for the rest of chrX.
Calling
whatshap phase
on these samples results in the following error:Is there a way to turn off this error in
whatshap
, as the region specific ploidy is both correct and expected in chrX for males?My planned workaround is to extract the PAR regions into separate files, phase with
whatshap
, then merge the VCFs back together. But it would be much simpler and easier if I could set a flag to tellwhatshap
not to throw an error when encountering haploid genotypes.