ksamuk / pixy

Software for painlessly estimating average nucleotide diversity within and between populations
https://pixy.readthedocs.io/
MIT License

Add support for arbitrary ploidy level (at least warn about incompatibility with polyploids) #97

Open taprs opened 7 months ago

taprs commented 7 months ago

Thank you for this tool! The idea is pretty elegant, and we have needed an "official" script for these simple stats for so long...

I understand that this was likely addressed in #79, but I want to say it explicitly: we would like to see support for arbitrary ploidy levels! From our test runs, it seems that pixy silently takes only the first two alleles of each polyploid genotype and thus dramatically lowers the pi estimate for polyploids.
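A minimal sketch of why truncation lowers the estimate, assuming per-site pi is computed as the number of pairwise differences divided by the number of pairwise comparisons among all allele copies. This is not pixy's actual code: the function `site_pi` and the toy genotypes are hypothetical and only illustrate the effect.

```python
from itertools import combinations

def site_pi(genotypes):
    """Per-site pi: differing allele pairs / all compared allele pairs.

    `genotypes` is a list of tuples, one tuple of allele codes per sample.
    Missing alleles are encoded as None and skipped.
    """
    alleles = [a for gt in genotypes for a in gt if a is not None]
    n = len(alleles)
    if n < 2:
        return float("nan")  # no comparisons possible
    diffs = sum(1 for x, y in combinations(alleles, 2) if x != y)
    comps = n * (n - 1) // 2
    return diffs / comps

# Hypothetical site with three tetraploid samples.
tetraploid = [(0, 0, 0, 1), (0, 0, 1, 1), (0, 0, 0, 0)]

# What silently keeping only the first two alleles would do.
truncated = [gt[:2] for gt in tetraploid]

full_pi = site_pi(tetraploid)
truncated_pi = site_pi(truncated)
print(f"all alleles: {full_pi:.3f}, first two only: {truncated_pi:.3f}")
```

In this toy case every alternate allele sits in the third or fourth position, so the truncated genotypes look monomorphic and the site contributes no diversity at all.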

As a first quick fix, it would be good to add a warning (or maybe even an error?) if any genotype in the input VCF has a ploidy other than 2. Beyond that, I have a dream of being able to use pixy with arbitrary ploidy levels, including cases where different samples or genomic positions have different ploidy levels...
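The warning could look roughly like the sketch below. It is only an illustration under stated assumptions: it reads uncompressed VCF text lines, assumes GT is the first FORMAT key, and uses the hypothetical helpers `gt_ploidy` and `check_ploidy`, which are not part of pixy.

```python
import re
import warnings

def gt_ploidy(sample_field):
    """Count alleles in the GT subfield, e.g. '0/0/1/1:35' -> 4.

    Assumes GT is the first FORMAT key. A bare '.' is treated as missing
    and returns None, since its ploidy cannot be determined.
    """
    gt = sample_field.split(":", 1)[0]
    if gt == ".":
        return None
    return len(re.split(r"[/|]", gt))

def check_ploidy(vcf_lines, expected=2):
    """Return (chrom, pos, sample_index, ploidy) for each unexpected genotype."""
    offending = []
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos = fields[0], fields[1]
        for idx, sample in enumerate(fields[9:]):  # sample columns start at 10
            ploidy = gt_ploidy(sample)
            if ploidy is not None and ploidy != expected:
                offending.append((chrom, pos, idx, ploidy))
    if offending:
        warnings.warn(
            f"{len(offending)} genotype(s) have ploidy other than {expected}; "
            f"first at {offending[0][0]}:{offending[0][1]}"
        )
    return offending

# Hypothetical record: diploid, tetraploid, and missing diploid samples.
example = [
    "##fileformat=VCFv4.2",
    "chr1\t100\t.\tA\tT\t.\tPASS\t.\tGT\t0/1\t0/0/0/1\t./.",
]
flagged = check_ploidy(example)
```

Turning the warning into an error would only require raising an exception instead of calling `warnings.warn`.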

Best wishes, Nikita

taprs commented 3 weeks ago

In my fork, https://github.com/taprs/pixy, I drafted a commit that outputs correct pi and dxy values for arbitrary ploidy levels, provided that the maximum number of alleles per site is given as the --ploidy argument.

So far it messes up the n_missing counts, but I can improve it further if it can later be merged into pixy. I do not want to maintain my own fork 😈