Open halasadi opened 8 years ago
Presumably, STRUCTURE works similarly to ADMIXTURE. The STRUCTURE manual says:
The program ignores missing genotype data when updating Q and P. This approach is correct when the probability of having missing data at a particular locus is independent of what allele the individual has there. While estimates of Q for individuals with missing data are less accurate, there is no particular reason to exclude such individuals from the analysis, unless they have very little data at all.
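To make "ignores missing genotype data" concrete, here is a minimal sketch of an ADMIXTURE-style binomial log-likelihood that skips missing entries. It is not STRUCTURE's actual MCMC update, just an illustration under assumptions: genotypes are coded as 0/1/2 allele counts, `None` marks a missing call, and the function and argument names are hypothetical.

```python
import math

def log_likelihood(genotypes, Q, P):
    """Log-likelihood of genotypes given Q and P, skipping missing calls.

    genotypes[i][j]: allele count (0, 1, 2) or None if missing (assumed coding).
    Q[i][k]: ancestry fraction of individual i in population k.
    P[k][j]: allele frequency at SNP j in population k (strictly between 0 and 1).
    The constant binomial coefficient is omitted.
    """
    total = 0.0
    for i, row in enumerate(genotypes):
        for j, g in enumerate(row):
            if g is None:
                # Missing genotype: contributes nothing to the likelihood.
                continue
            # Individual-specific allele frequency at SNP j.
            p = sum(Q[i][k] * P[k][j] for k in range(len(P)))
            total += g * math.log(p) + (2 - g) * math.log(1 - p)
    return total

# Toy example: one individual, two SNPs, second genotype missing.
example_genotypes = [[1, None]]
example_Q = [[0.5, 0.5]]
example_P = [[0.2, 0.9], [0.6, 0.1]]
example_ll = log_likelihood(example_genotypes, example_Q, example_P)
```

Because the missing entry is skipped, changing P at that locus does not change this individual's likelihood, which is why Q estimates for individuals with lots of missing data rest on fewer SNPs and are less accurate.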
I think what we'll have to do is either
(1) impute missing genotypes using HapMap or 1000 Genomes, or (2) remove individuals with high missingness.
Filtering with the plink option --geno 0.1 leaves 8% overall missingness, and 315,199 SNPs (of the original 354,212) remain. Using --geno 0.5 results in only 7,500 SNPs.
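For reference, a rough sketch of what a per-SNP missingness filter like --geno does: drop every SNP whose missing call rate exceeds the threshold, then look at the overall missingness that is left. The genotype matrix, the `None` missing code, and the function names below are assumptions for illustration, not plink's implementation.

```python
def snp_missing_rates(genotypes):
    """Fraction of individuals with a missing call, per SNP.

    genotypes: list of rows (one per individual), each a list over SNPs,
    with None marking a missing call (assumed coding).
    """
    n_ind = len(genotypes)
    n_snp = len(genotypes[0]) if n_ind else 0
    return [
        sum(1 for row in genotypes if row[j] is None) / n_ind
        for j in range(n_snp)
    ]

def filter_snps(genotypes, max_missing):
    """Keep SNPs whose missing rate does not exceed max_missing."""
    rates = snp_missing_rates(genotypes)
    keep = [j for j, rate in enumerate(rates) if rate <= max_missing]
    filtered = [[row[j] for j in keep] for row in genotypes]
    return filtered, keep

def overall_missingness(genotypes):
    """Fraction of all genotype calls that are missing."""
    total = sum(len(row) for row in genotypes)
    missing = sum(1 for row in genotypes for g in row if g is None)
    return missing / total if total else 0.0

# Toy data: 4 individuals x 3 SNPs.
toy = [
    [0, None, 2],
    [1, None, None],
    [2, 1, 0],
    [0, None, 1],
]
filtered_toy, kept_snps = filter_snps(toy, max_missing=0.5)
```

On the toy data the second SNP (75% missing) is dropped at a 0.5 threshold, and the overall missingness falls from 4/12 to 1/8.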
So, with the new data set that has less missing data, we need to redo our STRUCTURE analysis with the pooled and the modern-only samples (for K=3,5,7,10) and redo the scree plots.
We want to know if the ancients still drive the variation in the data after this.
(Edit: Also check missingness per individual)
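A minimal sketch of the per-individual check, assuming the same kind of genotype matrix with `None` for missing calls. The sample IDs, threshold, and function names are hypothetical; this mirrors the idea of a per-sample missingness filter, not any particular tool's implementation.

```python
def individual_missing_rates(genotypes):
    """Fraction of SNPs with a missing call, per individual."""
    return [
        sum(1 for g in row if g is None) / len(row) if row else 0.0
        for row in genotypes
    ]

def drop_high_missing_individuals(ids, genotypes, max_missing):
    """Remove individuals whose missing rate exceeds max_missing."""
    rates = individual_missing_rates(genotypes)
    kept = [
        (sample_id, row)
        for sample_id, row, rate in zip(ids, genotypes, rates)
        if rate <= max_missing
    ]
    kept_ids = [sample_id for sample_id, _ in kept]
    kept_rows = [row for _, row in kept]
    return kept_ids, kept_rows

# Toy data: hypothetical sample IDs, 4 individuals x 3 SNPs.
sample_ids = ["ind1", "ind2", "ind3", "ind4"]
calls = [
    [0, None, 2],
    [1, None, None],
    [2, 1, 0],
    [0, None, 1],
]
kept_ids, kept_rows = drop_high_missing_individuals(sample_ids, calls, 0.5)
```

Here `ind2` is missing two of three calls and is removed at a 0.5 threshold. Comparing these rates between the ancient and modern samples would show whether missingness is concentrated in the ancients.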