ksamuk / pixy

Software for painlessly estimating average nucleotide diversity within and between populations
https://pixy.readthedocs.io/
MIT License
115 stars 14 forks

ability to handle sex chromosomes #79

Open jtweir opened 1 year ago

jtweir commented 1 year ago

Is your feature request related to a problem? Please describe.

The problem is that pixy seems unable to handle sex chromosomes. For example, the X chromosome is hemizygous, and thus effectively haploid, in males (the heterogametic sex in XY systems). VCF files in which genotypes have been properly called and coded for the X will have haploid genotype calls for males and diploid calls for females. pixy does not currently accept such VCF inputs. Likewise, the Y chromosome cannot be handled because it is haploid.

Describe the solution you'd like

The ability to calculate Fst, pi, and dxy for sex chromosomes in samples containing a mixture of males and females. Currently, the only workaround I am aware of is to incorrectly call diploid genotypes for both males and females on the X, and for males on the Y, which then results in incorrectly calculated Fst, pi, and dxy (though probably not biased in any way).
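To make the request concrete, here is a minimal Python sketch of how pi could be computed from a mix of haploid and diploid calls by counting alleles rather than individuals. This is not pixy's implementation: the function names (`allele_counts`, `site_differences`, `pi`) and the input layout (one list of VCF-style GT strings per site) are assumptions made for illustration. It sums pairwise differences and pairwise comparisons across sites and divides the totals, skipping missing alleles.

```python
import re
from collections import Counter


def allele_counts(genotypes):
    """Count called alleles at one site, whatever each sample's ploidy.

    `genotypes` is a list of VCF-style GT strings, e.g. "0" (haploid),
    "0/1" or "1|1" (diploid), "." or "./." (missing).
    """
    counts = Counter()
    for gt in genotypes:
        # Alleles are separated by "/" (unphased) or "|" (phased);
        # a haploid call is a single allele with no separator.
        for allele in re.split(r"[/|]", gt):
            if allele != ".":  # skip missing alleles
                counts[allele] += 1
    return counts


def site_differences(genotypes):
    """Return (pairwise differences, pairwise comparisons) at one site."""
    counts = allele_counts(genotypes)
    n = sum(counts.values())
    comparisons = n * (n - 1) // 2
    # Pairs of identical alleles contribute no difference.
    identical = sum(c * (c - 1) // 2 for c in counts.values())
    return comparisons - identical, comparisons


def pi(sites):
    """Ratio of summed differences to summed comparisons over all sites."""
    diffs = comps = 0
    for genotypes in sites:
        d, c = site_differences(genotypes)
        diffs += d
        comps += c
    return diffs / comps if comps else float("nan")


# Hypothetical X-linked data: two males (haploid), two females (diploid).
x_sites = [
    ["0", "1", "0/1", "0/0"],  # variant site
    ["0", "0", "0/0", "./."],  # invariant site, one female missing
]
print(pi(x_sites))
```

Because every comparison is between alleles, a haploid male simply contributes one allele and a diploid female two, so no genotype has to be recoded to a ploidy it does not have.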

ksamuk commented 1 year ago

Hi @jtweir, the correct handling of sex chromosomes is in progress and will be addressed in the next update (along with general support for non-diploid data). I actually need this for my own work, so it will be a high priority for me. As always, the rate-limiting step will be validating the new feature with simulations. I will reference this issue in the release when it is added.

aersoares81 commented 7 months ago

I just wanted to say this will be great for people working on haplodiploid species (like bees) that have haploid and diploid individuals in the same population. There's no other software that cares about them…