ropensci / rentrez

talk with NCBI entrez using R
https://docs.ropensci.org/rentrez
How to use efetch with the new query format? #155

Closed anleopa closed 3 years ago

anleopa commented 3 years ago

Since the query format to access some data in eutils has changed, how could this query be accomplished using the efetch function?

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=clinvar&rettype=vcv&is_variationid&id=14206,41472&from_esearch=true

The current usage, rentrez::entrez_fetch(db="clinvar", id=c(14206,41472), rettype="vcv"), doesn't work because the records returned do not match the IDs given in the id argument. How can the options is_variationid and from_esearch=true be added to the query?

dwinter commented 3 years ago

Hi @anleopa, I can look into this tomorrow. In the meantime, do you have a link or some documentation about NCBI's change in the query format?

anleopa commented 3 years ago

Thanks! Here you can find information about these queries: https://www.ncbi.nlm.nih.gov/clinvar/docs/maintenance_use/#api https://academic.oup.com/nar/article/48/D1/D835/5645007

dwinter commented 3 years ago

Hi @anleopa, I think adding the extra arguments to the entrez_fetch call works.

Can you check that the second command gives the expected output?

Alt IDs with default params:

library(rentrez)
library(XML)
# Fetch the two records without the extra query parameters
default_raw <- entrez_fetch(db="clinvar", id=c(14206,41472), rettype="vcv")
default <- xmlTreeParse(default_raw, useInternalNodes=TRUE)
[[1]]
<XRefList>
  <XRef ID="P06213#VAR_004093" DB="UniProtKB"/>
  <XRef Type="Allelic variant" ID="147670.0001" DB="OMIM"/>
  <XRef Type="rs" ID="121913135" DB="dbSNP"/>
</XRefList> 

[[2]]
<XRefList>
  <XRef DB="OMIM" ID="147670.0001" Type="Allelic variant"/>
</XRefList> 

[[3]]
<XRefList>
  <XRef ID="nssv580395" DB="dbVar"/>
  <XRef ID="nsv530689" DB="dbVar"/>
  <XRef ID="nssv580394" DB="dbVar"/>
</XRefList> 

[[4]]
<XRefList>
  <XRef DB="dbVar" ID="nssv580394" Type="dbVarVariantCallId"/>
  <XRef DB="dbVar" ID="nssv580395" Type="dbVarVariantCallId"/>
  <XRef DB="dbVar" ID="nsv530689" Type="dbVarVariantRegionId"/>
</XRefList> 
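
The node lists printed here look like the result of an XPath query for XRefList elements on the parsed document. The exact call is not shown in this comment, so the following is only a minimal sketch of one way to get that kind of output, assuming the XML package and a tiny made-up document (toy_xml is hypothetical, not a real ClinVar response):

```r
library(XML)

# Hypothetical miniature document standing in for a ClinVar VCV response
toy_xml <- paste0(
  "<Root>",
  "<Rec><XRefList><XRef ID=\"1\" DB=\"OMIM\"/></XRefList></Rec>",
  "<Rec><XRefList><XRef ID=\"2\" DB=\"dbSNP\"/></XRefList></Rec>",
  "</Root>"
)

# Parse the string into an internal document so XPath can be used on it
doc <- xmlParse(toy_xml, asText = TRUE)

# All XRefList nodes, wherever they sit in the tree (an XMLNodeSet)
xref_lists <- getNodeSet(doc, "//XRefList")

# The DB attribute of every XRef entry, as a character vector
dbs <- xpathSApply(doc, "//XRefList/XRef", xmlGetAttr, "DB")
```

Comparing the DB and ID attributes between two fetches is a quick way to see whether the same records came back.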

And with the other args added, you get different records:

with_extra_raw <- entrez_fetch(db="clinvar", id=c(14206,41472), rettype="vcv",
                               is_variationid=TRUE,
                               from_esearch=TRUE)
with_extra <- xmlTreeParse(with_extra_raw, useInternalNodes=TRUE)
[[1]]
<XRefList>
  <XRef Type="Allelic variant" ID="158105.0002" DB="OMIM"/>
</XRefList> 

[[2]]
<XRefList>
  <XRef DB="OMIM" ID="158105.0002" Type="Allelic variant"/>
</XRefList> 

[[3]]
<XRefList>
  <XRef ID="CA130857" DB="ClinGen"/>
  <XRef ID="O15455#VAR_021976" DB="UniProtKB"/>
  <XRef Type="Allelic variant" ID="603029.0002" DB="OMIM"/>
  <XRef Type="rs" ID="3775291" DB="dbSNP"/>
</XRefList> 

[[4]]
<XRefList>
  <XRef DB="OMIM" ID="603029.0002" Type="Allelic variant"/>
</XRefList> 

attr(,"class")
[1] "XMLNodeSet"
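
As far as I understand it, this works because entrez_fetch takes additional named arguments through ... and adds them to the request it sends to NCBI. The sketch below is only an illustration of that idea in base R: build_efetch_url is a hypothetical helper, not a rentrez function, and it is not how rentrez builds its requests internally.

```r
# Hypothetical helper (not part of rentrez): turns named arguments into an
# E-utilities efetch URL, to show how extra options end up in the query.
build_efetch_url <- function(db, id, ...) {
  base <- "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
  # Several IDs are sent as one comma-separated value
  params <- list(db = db, id = paste(id, collapse = ","), ...)
  # Assumption: logical flags are written in lower case, matching
  # from_esearch=true in the URL from the question
  values <- vapply(params, function(p) {
    if (is.logical(p)) tolower(as.character(p)) else as.character(p)
  }, character(1))
  paste0(base, "?", paste(names(values), values, sep = "=", collapse = "&"))
}

url <- build_efetch_url("clinvar", c(14206, 41472), rettype = "vcv",
                        is_variationid = TRUE, from_esearch = TRUE)
print(url)
```

The resulting URL carries the same db, id, rettype, is_variationid and from_esearch parameters as the one in the original question, although the parameter order and the value written for is_variationid may differ.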

anleopa commented 3 years ago

Hi @dwinter ,

Yes, adding the extra arguments works. Thanks!!

dwinter commented 3 years ago

great!