ksamuk / pixy

Software for painlessly estimating average nucleotide diversity within and between populations
https://pixy.readthedocs.io/
MIT License

Best practice for estimating πN/πS in pixy #43

Closed tshalev closed 3 years ago

tshalev commented 3 years ago

When estimating the ratio of nonsynonymous to synonymous π (πN/πS), I would normally estimate π separately for the nonsynonymous and the synonymous SNP sets and take the ratio. I would generally consider invariant sites to be synonymous (unless I'm mistaken), so how would one account for them when estimating the ratio in pixy?

ksamuk commented 3 years ago

Hi Tal, I'll have to think about this more, but the synonymous/nonsynonymous distinction usually requires some kind of polymorphism at the site to begin with. That is, you need at least two alleles at a site that produce different codons, which then either code for the same amino acid (synonymous) or do not (nonsynonymous). An invariant site has only a single codon state, so the syn/nonsyn classification doesn't really make sense (to me). There is a related quantity, 4-fold degenerate π, where you classify sites by their degeneracy rather than by an observed change; for that you do want to include invariant sites, but it is not the same thing as πS. Before proceeding, I'd confirm whether the πS you are trying to calculate requires sites to be polymorphic.
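
To make the point about invariant sites concrete, here is a rough sketch (not pixy's actual code) of a per-site π for a class of sites such as 4-fold degenerate positions. It assumes π is taken as total pairwise differences divided by total pairwise comparisons, summed over sites; the function names and the toy genotypes are made up for illustration.

```python
from itertools import combinations

def site_diffs_and_comparisons(alleles):
    """Count pairwise differences and comparisons at one site.

    `alleles` holds one allele call per sampled chromosome;
    None marks a missing call and is skipped.
    """
    called = [a for a in alleles if a is not None]
    comparisons = len(called) * (len(called) - 1) // 2
    diffs = sum(1 for a, b in combinations(called, 2) if a != b)
    return diffs, comparisons

def pi(sites):
    """Per-site pi: summed differences over summed comparisons."""
    total_diffs = 0
    total_comps = 0
    for alleles in sites:
        d, c = site_diffs_and_comparisons(alleles)
        total_diffs += d
        total_comps += c
    return total_diffs / total_comps if total_comps else float("nan")

# Toy data (hypothetical): one polymorphic and three invariant sites,
# four sampled chromosomes each.
variant = ["A", "A", "G", "G"]
invariant = ["C", "C", "C", "C"]

pi_variants_only = pi([variant])
pi_all_sites = pi([variant, invariant, invariant, invariant])
print(pi_variants_only, pi_all_sites)
```

Invariant sites add comparisons but no differences, so leaving them out inflates the per-site estimate. This is why they belong in the denominator when sites are classified in advance (e.g. by degeneracy) rather than by an observed substitution.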

tshalev commented 3 years ago

Yes, you're right; I suppose without any codon change the SNP can't truly be synonymous. I will look into zero-/four-fold degenerate sites. Thanks!
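
For reference, a minimal sketch of how codon positions could be classified as 0-fold or 4-fold degenerate. It assumes the standard genetic code and a known reading frame; `degeneracy` is a hypothetical helper, not part of pixy. Because the classification depends only on the reference codon, it applies to invariant sites as well as polymorphic ones.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

def degeneracy(codon, pos):
    """Degeneracy of position `pos` (0, 1 or 2) within `codon`.

    Counts how many of the four possible bases at `pos` keep the
    amino acid unchanged. By convention a position where every
    change alters the amino acid is reported as 0-fold.
    """
    aa = CODON_TABLE[codon]
    same = sum(
        1 for b in BASES
        if CODON_TABLE[codon[:pos] + b + codon[pos + 1:]] == aa
    )
    return 0 if same == 1 else same

# Glycine codon GGT: third position is 4-fold, second is 0-fold.
print(degeneracy("GGT", 2), degeneracy("GGT", 1))
```

Sites labelled 0 and 4 by a helper like this could then be written to separate site lists, with π estimated for each list including its invariant positions.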