ropensci / rsnps

Wrapper to a number of SNP web APIs
https://docs.ropensci.org/rsnps

Missing genotype for some SNPs in some users in openSNP #127

Closed nokimchen closed 3 years ago

nokimchen commented 3 years ago

Hi all, this question is not really about the package. I am new to SNP studies, so kindly pardon my newbie questions.

I would like to identify the frequency of genotypes of various SNPs in a gene. For example, when I request the genotype of 10 users for one SNP, I get genotypes for all 10 users:

library(rsnps)
genotypes('rs664517', userid='1,6,8,10,11,13,14,16,17,19', df=TRUE)
   snp_name snp_chromosome snp_position  user_name user_id genotype_id genotype
1  rs664517              5    111952247 xxxxxxxxxx       1           9       AA
2  rs664517              5    111952247 xxxxxxxxxx       6           5       AA
3  rs664517              5    111952247 xxxxxxxxxx       8           2       AA
4  rs664517              5    111952247 xxxxxxxxxx      10           3       AA
5  rs664517              5    111952247 xxxxxxxxxx      11         176       AA
6  rs664517              5    111952247 xxxxxxxxxx      13           4       AA
7  rs664517              5    111952247 xxxxxxxxxx      14           6       AA
8  rs664517              5    111952247 xxxxxxxxxx      16        3446       AA
9  rs664517              5    111952247 xxxxxxxxxx      17         143       AA
10 rs664517              5    111952247 xxxxxxxxxx      19           7       AG

But when I request the same 10 users for a different SNP rs ID, I get genotypes for only 7 users:

genotypes('rs11952607', userid='1,6,8,10,11,13,14,16,17,19', df=TRUE)

1 rs11952607              5    111846501 xxxxxxxxxx       1           9       GG
2 rs11952607              5    111846501 xxxxxxxxxx       6           5       GG
3 rs11952607              5    111846501 xxxxxxxxxx       8           2       GG
4 rs11952607              5    111846501 xxxxxxxxxx      10           3       GG
5 rs11952607              5    111846501 xxxxxxxxxx      11         176       GG
6 rs11952607              5    111846501 xxxxxxxxxx      13           4       AG
7 rs11952607              5    111846501 xxxxxxxxxx      19           7       AG

My question is: why am I not getting genotype information for the remaining 3 users for rs11952607? Is it because the rs11952607 SNP was not detected in their genomes? If so, is it safe to say that, for rs11952607, 5 individuals have the GG genotype, 2 have AG, and the remaining 3 have no SNP at rs11952607? And, based on these 10 users, is rs664517 more common (found in all 10 individuals) while rs11952607 is less common (found in only 7)?

NOTE: xxxxxxxxxx represents the usernames.

sinarueeger commented 3 years ago

Hi @nokimchen, it's probably like you say: not all users have information about the same genotypes. If you look at https://opensnp.org/genotypes, where the data comes from, you'll see that the data comes from different platforms (e.g. 23andMe, Ancestry), and therefore likely from different genotyping chips. This means that we have no information for individuals for whom the SNP was neither genotyped nor imputed.
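As a rough sketch of how to handle this in practice, the snippet below rebuilds the rs11952607 result by hand from the output posted above (it does not call the API), lists which of the requested users have no call, and computes genotype and allele counts only among users who do. The variable names (`requested`, `returned`, `missing_ids`, and so on) are made up for illustration, and treating absent users as missing data is an assumption that follows from the chip explanation above.

```r
# User IDs requested in the genotypes() calls above
requested <- c(1, 6, 8, 10, 11, 13, 14, 16, 17, 19)

# Rows returned for rs11952607, copied by hand from the posted output
returned <- data.frame(
  user_id  = c(1, 6, 8, 10, 11, 13, 19),
  genotype = c("GG", "GG", "GG", "GG", "GG", "AG", "AG"),
  stringsAsFactors = FALSE
)

# Users with no row for this SNP: treated here as missing data,
# not as a separate genotype class
missing_ids <- setdiff(requested, returned$user_id)

# Genotype counts and proportions among users that do have a call
genotype_counts <- table(returned$genotype)
genotype_freq <- prop.table(genotype_counts)

# Allele counts: split each two-letter genotype into its two alleles
alleles <- unlist(strsplit(returned$genotype, ""))
allele_counts <- table(alleles)

print(missing_ids)
print(genotype_counts)
print(round(genotype_freq, 2))
print(allele_counts)
```

With only 7 genotyped users these proportions are very noisy, so they are best read as a description of this small sample rather than an estimate of population frequency.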

More generally, if you want more accurate information about allele frequency, I'd recommend using a resource such as gnomAD (https://gnomad.broadinstitute.org/variant/5-111182198-G-A?dataset=gnomad_r2_1), which summarizes data from thousands of individuals and groups them by population.