ksamuk / pixy

Software for painlessly estimating average nucleotide diversity within and between populations
https://pixy.readthedocs.io/
MIT License
115 stars 14 forks

Overestimation of number of differences? #88

lisagrigoreva closed 10 months ago

lisagrigoreva commented 11 months ago

Hi, I was trying to play with files and reproduce the logic from the paper. I calculated pi per site for a 10 bp window. I have 3 individuals with genotypes 0/0, 0/0, 1/1. The total number of differences is 8 and the number of comparisons is 15 (6 choose 2). The only way that you can get 8 is (2 differences × 2 alleles) + (2 differences × 2 alleles) = 8. However, in the VCF file the number of alleles is already taken into account. Could you please clarify why pixy calculates 8, not 4?

Thank you!
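The counts in the question can be checked by enumerating every pair of allele copies at the site. The following is a minimal sketch, assuming differences are tallied over all unordered pairs of the six allele copies (which matches the 15 comparisons reported above). It is not pixy's actual implementation, and the function name `count_pairwise` is hypothetical.

```python
from itertools import combinations


def count_pairwise(genotypes):
    """Count differences and comparisons over all pairs of allele copies.

    Assumption: every unordered pair of allele copies at the site is
    compared once, including the two copies within one individual.
    """
    # Flatten diploid genotypes into individual allele copies
    alleles = [allele for genotype in genotypes for allele in genotype]
    pairs = list(combinations(alleles, 2))
    n_differences = sum(a != b for a, b in pairs)
    n_comparisons = len(pairs)
    return n_differences, n_comparisons


# Genotypes from the question: 0/0, 0/0, 1/1
n_differences, n_comparisons = count_pairwise([(0, 0), (0, 0), (1, 1)])
print(n_differences, n_comparisons)  # 8 15
print(n_differences / n_comparisons)  # per-site pi under this assumption
```

Under this assumption, each of the four reference copies differs from each of the two alternate copies, giving 4 × 2 = 8 differences out of 15 comparisons.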

ksamuk commented 11 months ago

Hi there,

In order to look into this, I will need a reproducible example (VCF + pixy command) of the bug.

Cheers,

Kieran