ksamuk / pixy

Software for painlessly estimating average nucleotide diversity within and between populations
https://pixy.readthedocs.io/
MIT License

Tajima's D #38

Open vlrieg opened 3 years ago

vlrieg commented 3 years ago

Hi Kieran,

Inspired by this previous feature request for Hudson's FST, I'd love to see Tajima's D implemented in Pixy so I can do all my summary stat calculations with the same (excellent) tool. Looks like scikit-allel has this worked out for windowed & single region calculations too: https://scikit-allel.readthedocs.io/en/stable/stats/diversity.html?#allel.tajima_d

Thanks for considering!

Val

ksamuk commented 3 years ago

Hi Val! Thanks for the suggestion. I'd love to implement Tajima's D at some point. Consider it "on the list". The issue with the scikit-allel function is that it likely uses its native implementation of pi (which is a component of the numerator of D), which we know can be inaccurate in many cases (see pixy paper). So we'd probably have to come up with something on our own, and then validate it using theory/sims. This might be a good starter project for a bioinformatically inclined student, actually. I'll leave this issue open as a reminder 👍
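For readers following along, here is a minimal sketch of the classical Tajima's D statistic (Tajima 1989), showing where pi enters the numerator. It is not pixy's or scikit-allel's code, and it assumes complete data: every site is genotyped in all `n` sampled sequences. The function name `tajimas_d` and its parameters are hypothetical, chosen for illustration only.

```python
import math


def tajimas_d(pi_total: float, seg_sites: int, n: int) -> float:
    """Classical Tajima's D (no missing-data correction).

    pi_total  -- mean number of pairwise differences, summed over the
                 region (not divided by the number of sites)
    seg_sites -- number of segregating sites (S) in the region
    n         -- number of sampled sequences (haploid sample size)
    """
    if n < 2:
        raise ValueError("need at least two sequences")
    if seg_sites == 0:
        # D is undefined when nothing is segregating
        return float("nan")

    # Harmonic-like sums over 1..n-1
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))

    # Constants for the variance of (pi - S/a1)
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n**2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    # Numerator: pi minus Watterson's estimator, both as per-region totals
    numerator = pi_total - seg_sites / a1
    denominator = math.sqrt(e1 * seg_sites + e2 * seg_sites * (seg_sites - 1))
    return numerator / denominator
```

Because `pi_total` feeds straight into the numerator, any bias in pi from missing genotypes carries over into D, which is why a missing-data-aware version would need its own derivation and validation.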

stsmall commented 8 months ago

Hi @ksamuk, I implemented a version of Tajima's D that accounts for missing data. I tested it with the same scripts you used for the pixy publication, so hopefully it is a robust demonstration. The method uses the equations of Ferretti et al. 2012 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3416018/). I have not integrated this into the pixy scripts, so there is no PR at the moment. If you think the method is OK, I can work on that. Third_pass_tajimas_d.pdf

MilesLuca commented 3 months ago

Hi,

Was wondering if there had been any update to this? Any plans to include Tajima's D in future releases?

stsmall commented 3 months ago

Hi @MilesLuca, I haven't made an effort to integrate the code into the pixy codebase. I am waiting to hear from @ksamuk whether this addition meets the standard and is wanted.