ropensci / rsnps

Wrapper to a number of SNP web APIs
https://docs.ropensci.org/rsnps

Incorrect information on ncbi_snp_query #59

Closed mxw010 closed 3 years ago

mxw010 commented 6 years ago

I was investigating a particular SNP on chr6 and ran the following:

ncbi_snp_query("rs1610720")
ncbi_snp_query2("rs1610720")

What I got, on two Macs:

> ncbi_snp_query("rs1610720")
      Query Chromosome    Marker Class Gene Alleles Major Minor    MAF       BP AncestralAllele
1 rs1610720          6 rs1610720   snp HCG4     G/T     C     A 0.3848 29793285            <NA>
> ncbi_snp_query2("rs1610720")
<dbsnp>
   SNPs: summary, data
   Summary:
      query    marker     organism chromosome assembly alleles minor    maf bp
1 rs1610720 rs1610720 Homo sapiens         NA       NA     C/T     G 0.3848 NA

Note that the alleles reported by the two queries are different. If you look up the SNP on dbSNP, the alleles are A/G. So where is C/A coming from? That also raises the question of whether other SNPs are affected in the same way.
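One thing that can be checked without touching the API is whether the differing allele pairs are simply complements of each other, which would point to a strand-orientation difference rather than wrong data. This is only a possible explanation and is not confirmed anywhere in this thread. The helper names below (`complement_alleles`, `same_alleles_either_strand`) are hypothetical and not part of rsnps; this is a minimal base-R sketch:

```r
# Hypothetical helpers (not part of rsnps): compare allele strings across strands.

# Complement each base in an allele string such as "G/T"; "/" is left untouched.
complement_alleles <- function(alleles) {
  chartr("ACGT", "TGCA", toupper(alleles))
}

# TRUE if two allele strings hold the same unordered set of bases,
# either as given or after complementing the first one.
same_alleles_either_strand <- function(a, b) {
  split_set <- function(x) sort(strsplit(x, "/", fixed = TRUE)[[1]])
  identical(split_set(a), split_set(b)) ||
    identical(split_set(complement_alleles(a)), split_set(b))
}

complement_alleles("G/T")                 # "C/A"
same_alleles_either_strand("C/T", "A/G")  # TRUE
same_alleles_either_strand("G/T", "A/G")  # FALSE
```

With the values above, the G/T in the Alleles column is the complement of the C/A in the Major/Minor columns, and the C/T from ncbi_snp_query2 is the complement of the A/G shown on dbSNP. G/T and A/G do not match on either strand, so strand alone would not account for every difference.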

Thanks,

sckott commented 6 years ago

thanks for the report @cherrywang1006

can you share your sessionInfo() please

mxw010 commented 6 years ago

Hi,

I replicated this issue at home on a Windows 10 machine.

ncbi_snp_query("rs1610720")
      Query Chromosome    Marker Class Gene Alleles Major Minor    MAF       BP AncestralAllele
1 rs1610720          6 rs1610720   snp HCG4     G/T     C     A 0.3848 29793285            <NA>

And this is my session info:

> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17134)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rsnps_0.2.0

loaded via a namespace (and not attached):
 [1] httr_1.3.1     compiler_3.5.0 magrittr_1.5   plyr_1.8.4     R6_2.2.2       tools_3.5.0    curl_3.2      
 [8] Rcpp_0.12.17   stringi_1.1.7  stringr_1.3.1  XML_3.98-1.11 

And I found another SNP with the same problem on a different chromosome:

ncbi_snp_query("rs2233691")
      Query Chromosome    Marker Class    Gene Alleles Major Minor    MAF       BP AncestralAllele
1 rs2233691          1 rs2233691   snp PLA2G2E     C/T     C     T 0.3141 19923843     C,C,C,C,C,C

On dbSNP, C is listed as the minor allele, backed up by 1000 Genomes, but your package lists it as the major allele.

sckott commented 6 years ago

thanks @cherrywang1006 -

here's the raw data that comes from NCBI when we use ncbi_snp_query https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=snp&mode=xml&id=rs1610720

and same for ncbi_snp_query2 https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=snp&retmode=flt&rettype=flt&id=rs1610720

genes/SNPs aren't really my area, so I'm not sure if I'm doing something wrong here. Any thoughts?

sckott commented 5 years ago

@cherrywang1006 any thoughts?

complexgenome commented 5 years ago

Hi @sckott

For https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=snp&mode=xml&id=rs1610720, fetching data from the most recent build (Update build="151" date="2018-01-23 20:15", i.e. buildId="151") would be great. The alleles (A/G) are consistent with this buildId. Also, we may want to filter on the "handle" (please see below for more on "handle").

For the second link, https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=snp&retmode=flt&rettype=flt&id=rs1610720, fetching from a specific handle (for example, handle="TOPMED") and parsing chr_location_Allele1_Allele2 should be good.

mxw010 commented 5 years ago

Have you given https://api.ncbi.nlm.nih.gov/variation/v0/ a look? I think the JSON format is much more accessible and reliable than extracting the data from XML.
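For anyone who wants to try that API from R, here is a rough sketch. It assumes the service exposes one record per SNP under a `refsnp/<numeric id>` path below the base URL above, and that the jsonlite package is installed. The function names `refsnp_url` and `fetch_refsnp` are hypothetical and not part of rsnps, and the layout of the returned JSON is not described in this thread, so the sketch only inspects the top level rather than extracting alleles:

```r
# Hypothetical helpers (not part of rsnps). The "refsnp/<id>" path is an
# assumption about the Variation Services API, not something stated in this thread.
refsnp_url <- function(rsid,
                       base = "https://api.ncbi.nlm.nih.gov/variation/v0/refsnp/") {
  # Drop the "rs" prefix, since the assumed endpoint takes the numeric part only.
  id <- sub("^rs", "", rsid, ignore.case = TRUE)
  if (!grepl("^[0-9]+$", id)) {
    stop("expected an rs ID such as 'rs1610720'")
  }
  paste0(base, id)
}

# Fetch and parse the JSON record as a nested list (needs network access).
fetch_refsnp <- function(rsid) {
  jsonlite::fromJSON(refsnp_url(rsid), simplifyVector = FALSE)
}

# Example (not run here):
# rec <- fetch_refsnp("rs1610720")
# str(rec, max.level = 1)  # look at the top-level fields before relying on any
```

Keeping the URL construction separate from the request makes the ID handling easy to check offline, and leaves the choice of HTTP client open.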


sckott commented 5 years ago

I haven't. I don't think I was aware of it. Will look at it

jooolia commented 3 years ago

Hi @mxw010, there is a new version of {rsnps} on CRAN that fixes this issue. I will now close this issue, but thank you for bringing it to our attention. Cheers, Julia