halasadi / ancient-structure

w/ Kushal

Re-do analysis with less missingness #27

Open halasadi opened 8 years ago

halasadi commented 8 years ago

Filtering with the plink option --geno 0.1 leaves 8% missingness, and 315,199 SNPs (of the original 354,212) remain. Using --geno 0.5 results in only 7,500 SNPs.
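For reference, here is a minimal Python sketch of what a per-SNP missingness filter does. It assumes the plink behaviour that --geno drops SNPs whose missing call rate exceeds the threshold. The function names and the list-of-lists layout (one row per individual, `None` for a missing call) are made up for illustration; this is not plink's implementation.

```python
def snp_missing_rates(genotypes):
    """Fraction of individuals with a missing call (None) at each SNP."""
    n_ind = len(genotypes)
    n_snp = len(genotypes[0])
    return [
        sum(1 for ind in genotypes if ind[j] is None) / n_ind
        for j in range(n_snp)
    ]


def filter_snps(genotypes, max_missing):
    """Keep SNPs whose missing rate does not exceed max_missing.

    Returns the kept column indices and the filtered matrix.
    """
    rates = snp_missing_rates(genotypes)
    keep = [j for j, rate in enumerate(rates) if rate <= max_missing]
    filtered = [[ind[j] for j in keep] for ind in genotypes]
    return keep, filtered


def overall_missingness(genotypes):
    """Fraction of all genotype calls that are missing."""
    calls = [g for ind in genotypes for g in ind]
    return sum(1 for g in calls if g is None) / len(calls)
```

Calling `overall_missingness` on the filtered matrix gives the kind of "remaining missingness" figure quoted above.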

So, with the new lower-missingness data, we need to re-do our structure analysis on the pooled and the modern-only samples (for K=3,5,7,10) and re-do the scree plots.

We want to know if the ancients still drive the variation in the data after this.

(Edit: Also check missingness per individual)
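A rough sketch of the per-individual check, in Python. It assumes genotypes are held as a dict from sample name to a list of calls with `None` for missing, and that each sample carries a group label such as "ancient" or "modern". Both the layout and the labels are hypothetical. In practice plink's --missing report gives the per-sample rates directly.

```python
def individual_missing_rates(genotypes):
    """Fraction of SNPs with a missing call (None) for each individual."""
    return {
        name: sum(1 for g in calls if g is None) / len(calls)
        for name, calls in genotypes.items()
    }


def mean_rate_by_group(rates, groups):
    """Average the per-individual missing rates within each group label."""
    by_group = {}
    for name, rate in rates.items():
        by_group.setdefault(groups[name], []).append(rate)
    return {grp: sum(vals) / len(vals) for grp, vals in by_group.items()}
```

Comparing the group means shows whether the remaining missingness is concentrated in the ancient samples.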

halasadi commented 8 years ago

Presumably, STRUCTURE works similarly to ADMIXTURE. The STRUCTURE manual says:

The program ignores missing genotype data when updating Q and P. This approach is correct when the probability of having missing data at a particular locus is independent of what allele the individual has there. While estimates of Q for individuals with missing data are less accurate, there is no particular reason to exclude such individuals from the analysis, unless they have very little data at all.
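To make "ignores missing genotype data" concrete, here is a small Python sketch of an admixture-style log-likelihood for one individual that skips missing loci. It assumes the usual binomial model, where the genotype at SNP j is a draw of 2 alleles with frequency sum_k q[k] * p[k][j]. This is an illustration only, not the code of either program (STRUCTURE samples Q and P by MCMC rather than maximising a likelihood), and the function and argument names are made up.

```python
import math


def log_likelihood(genotype, q, p):
    """Log-likelihood (up to an additive constant) of one individual's genotypes.

    genotype: list of allele counts (0, 1 or 2), with None for missing calls
    q:        list of K ancestry proportions for this individual
    p:        K lists of allele frequencies, one list per population
    Assumes every mixed frequency lies strictly between 0 and 1.
    """
    total = 0.0
    for j, g in enumerate(genotype):
        if g is None:
            # A missing call contributes nothing to the update.
            continue
        f = sum(q_k * p_k[j] for q_k, p_k in zip(q, p))
        total += g * math.log(f) + (2 - g) * math.log(1.0 - f)
    return total
```

An individual with few observed loci contributes few terms, so its Q estimate is noisier, which matches the manual's remark about accuracy.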

halasadi commented 8 years ago

I think what we'll have to do is either

(1) Impute missing genotypes using HapMap or 1000 Genomes.
(2) Remove individuals with high missingness.
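Option (2) could look roughly like the Python sketch below. It assumes genotypes are stored as a dict from sample name to a list of calls with `None` for missing (a hypothetical layout), and it drops samples whose missing rate exceeds a cutoff, in the same spirit as plink's --mind filter. The cutoff itself is still to be decided.

```python
def filter_individuals(genotypes, max_missing):
    """Split samples into those kept and those removed for high missingness.

    A sample is removed when its fraction of missing calls (None)
    exceeds max_missing.
    """
    kept = {}
    removed = []
    for name, calls in genotypes.items():
        rate = sum(1 for g in calls if g is None) / len(calls)
        if rate > max_missing:
            removed.append(name)
        else:
            kept[name] = calls
    return kept, removed
```

Keeping the list of removed samples makes it easy to see how many ancients survive at a given cutoff.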