ksamuk / pixy

Software for painlessly estimating average nucleotide diversity within and between populations
https://pixy.readthedocs.io/
MIT License

LD pruning before calculating Fst/dxy? #49

Closed by nitinra 3 years ago

nitinra commented 3 years ago

Hello,

I am planning to calculate Fst using pixy. Would you recommend LD pruning before calculating Fst? I understand from the paper and the docs that LD pruning is not recommended when calculating nucleotide diversity (pi).

Thank you! Nitin

ksamuk commented 3 years ago

Hi Nitin!

It really depends on what you are doing with the FST estimates. If you are just plotting FST across the genome and not doing any formal inference, then generally no pruning is necessary. If you are looking for FST outliers by comparing individual values of FST against the genome-wide distribution of FST, you'll want to first remove regions with very high LD (e.g. segregating inversions). Otherwise, I think the current thinking is to use pruning sparingly, as it can introduce biases (e.g. undersampling regions of low recombination).
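
As a rough illustration of the outlier case, here is a minimal Python sketch that masks known high-LD regions before ranking windows by FST. It is not part of pixy. The field names (`chromosome`, `window_pos_1`, `window_pos_2`, `avg_wc_fst`) are assumed to match the columns of pixy's FST output, the helper names `overlaps` and `fst_outliers` are hypothetical, the inversion coordinates and FST values are made up, and the top-fraction cutoff is an arbitrary empirical threshold rather than a formal test.

```python
import math

def overlaps(window, region):
    """True if a window shares at least one base with a masked region."""
    chrom, start, end = region
    return (window["chromosome"] == chrom
            and window["window_pos_1"] <= end
            and window["window_pos_2"] >= start)

def fst_outliers(windows, masked_regions, top_fraction=0.01):
    """Return the highest-FST windows after dropping masked regions.

    windows: list of dicts, one per window (assumed pixy-style field names).
    masked_regions: list of (chromosome, start, end) tuples, e.g. inversions.
    top_fraction: share of the remaining windows to report (arbitrary cutoff).
    """
    # Drop windows with missing FST and windows inside high-LD regions, so
    # they do not distort the genome-wide distribution used for comparison.
    kept = [w for w in windows
            if w["avg_wc_fst"] is not None
            and not math.isnan(w["avg_wc_fst"])
            and not any(overlaps(w, r) for r in masked_regions)]
    if not kept:
        return []
    # Rank the remaining windows and keep the top fraction (at least one).
    ranked = sorted(kept, key=lambda w: w["avg_wc_fst"], reverse=True)
    n_out = max(1, round(top_fraction * len(ranked)))
    return ranked[:n_out]

# Made-up example: ten 10 kb windows on chr1, with a hypothetical
# segregating inversion spanning positions 30001-50000.
fst_values = [0.02, 0.05, 0.03, 0.90, 0.85, 0.04, 0.06, 0.30, 0.01, 0.05]
windows = [{"chromosome": "chr1",
            "window_pos_1": i * 10000 + 1,
            "window_pos_2": (i + 1) * 10000,
            "avg_wc_fst": fst}
           for i, fst in enumerate(fst_values)]
inversions = [("chr1", 30001, 50000)]

hits = fst_outliers(windows, inversions, top_fraction=0.1)
for w in hits:
    print(w["chromosome"], w["window_pos_1"], w["window_pos_2"], w["avg_wc_fst"])
```

Without the mask, the top window in this toy data falls inside the inversion. With the mask, the highest-ranked window is the one starting at 70001, which shows how a single high-LD region can dominate an outlier scan.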

Hope that helps,

Kieran

nitinra commented 3 years ago

Thanks a lot, Kieran! This helps tremendously. I am planning to do the latter, i.e. identify Fst outliers.

Regards, Nitin