About downsampling and incomplete SFS

paula-tataru / polyDFE

predicting DFE and alpha from polymorphism data

GNU General Public License v3.0


Yes, adding zeros to the SFS will bias the results. You have to down-sample the data, as described in the tutorial:

Alternatively, projection methods can be used to down-sample the SNP data to build a complete SFS with a reduced number of samples [9, 10].

Nielsen R, Bustamante C, Clark AG, Glanowski S, Sackton TB, Hubisz MJ, Fledel-Alon A, Tanenbaum DM, Civello D, White TJ, Sninsky JJ, Adams MD, Cargill M (2005) A scan for positively selected genes in the genomes of humans and chimpanzees. PLoS Biology 3(6):e170.

James JE, Piganeau G, Eyre-Walker A (2016) The rate of adaptive evolution in animal mitochondria. Molecular Ecology 25(1):67–78.

There doesn't seem to be a full consensus on how to do this "properly" beyond those two references. The so-called hypergeometric projection is the standard approach.
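To make the idea concrete, here is a rough sketch of a hypergeometric projection in plain Python. It is not taken from polyDFE or from either reference; the function names `project_site` and `project_sites` and the `(derived_count, called_chromosomes)` input layout are assumptions made for illustration. For a site with `i` derived alleles among `n` called chromosomes, the probability of seeing `j` derived alleles in a random subsample of `m` chromosomes is C(i, j) · C(n − i, m − j) / C(n, m), and the projected SFS is the sum of these probabilities over sites.

```python
from math import comb  # Python 3.8+


def project_site(i, n, m):
    """Hypothetical helper: P(j derived alleles in m | i derived in n), j = 0..m."""
    total = comb(n, m)
    # comb() returns 0 when the lower index exceeds the upper one,
    # so impossible values of j get probability 0 automatically.
    return [comb(i, j) * comb(n - i, m - j) / total for j in range(m + 1)]


def project_sites(sites, m):
    """Hypothetical helper: project per-site counts down to m chromosomes.

    sites: iterable of (derived_count, called_chromosomes) pairs, so each
    site may have a different sample size because of missing data.
    Returns expected counts for 0..m derived alleles.
    """
    sfs = [0.0] * (m + 1)
    for i, n in sites:
        if n < m:
            continue  # too few called chromosomes to project; site is dropped
        for j, p in enumerate(project_site(i, n, m)):
            sfs[j] += p
    return sfs


# Toy input: three sites with different amounts of missing data.
sites = [(1, 4), (2, 3), (1, 1)]
projected = project_sites(sites, m=2)
```

Entries 0 and `m` of the result hold the sites that become monomorphic in the subsample, so they are not polymorphic classes. Choosing `m` is a trade-off: a smaller value keeps more sites but gives a coarser SFS.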

You might also find some useful pointers on how to do this in Python here: https://speciationgenomics.github.io/easysfs/

About downsampling and incomplete SFS #5