ropensci / rentrez

talk with NCBI entrez using R
https://docs.ropensci.org/rentrez

entrez_search returning "DISCONTINUED" genes #95

Closed bastiensadoul closed 7 years ago

bastiensadoul commented 7 years ago

I am searching for a gene ID using a description. Here are the commands:

gene_search <- entrez_search(db="gene", term="\"proteasome subunit beta type-6\" AND 'Homo sapiens'[porgn:__txid9606]", retmax = 1, sort="relevance")
geneId <- gene_search$ids

It returns:

geneId
[1] "147061"

That is different from what I get using the following equivalent link: https://www.ncbi.nlm.nih.gov/gene/?term=(%22proteasome+subunit+beta+type-6%22)+AND+%22Homo+sapiens%22%5Bporgn%3A__txid9606%5D

The reason is given by NCBI: the gene ID that entrez_search returns is a "Discontinued Item", as opposed to a "Current item". If you click on "See also 2 discontinued or replaced items." in the link I provided, you will see that the first result for my query in entrez_search does indeed have a gene ID of 147061.

So I was wondering whether there is a way to run my entrez_search on the gene database using only "Current" items.

Thank you for your help!

Cheers

dwinter commented 7 years ago

Hi @bastiensadoul, thanks for filing this issue.

This looks like a case where you need to use NCBI's "Filter" parameter. Unfortunately, you can't access the filterable terms from the Entrez API, but you can use the web advanced search to pick them from a drop-down. In this case I think current only[filter] does the trick:

q <- "(proteasome subunit beta type-6) AND (Homo sapiens[ORGN])"
q_filter <- "(proteasome subunit beta type-6) AND (Homo sapiens[ORGN]) AND (current only[Filter])"
all_hits <- entrez_search(db="gene", term=q)
filt_hits <- entrez_search(db="gene", term=q_filter)
all_hits$ids %in% filt_hits$ids
 [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE

So the filter removes the final two IDs. And sure enough, one of those is your discontinued gene:

tail(all_hits$ids, 2)
[1] "147061" "95505"
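To reuse this for other queries, the filter can be appended with a small wrapper. This is only a sketch: `current_only()` and `search_current_genes()` are hypothetical helper names, not part of rentrez, and it assumes that `current only[Filter]` behaves for other gene queries as it does here. Extra arguments such as `retmax` and `sort` are simply passed through to `entrez_search()`.

```r
# Hypothetical helper (not part of rentrez): wrap a query and append
# the "current only" filter so discontinued records are excluded.
current_only <- function(term) {
  paste0("(", term, ") AND (current only[Filter])")
}

# Hypothetical wrapper: search the gene database for current records only.
# Requires the rentrez package and network access when it is called.
search_current_genes <- function(term, ...) {
  rentrez::entrez_search(db = "gene", term = current_only(term), ...)
}

# Build the filtered query string (no network access needed for this step)
q <- "(proteasome subunit beta type-6) AND (Homo sapiens[ORGN])"
q_current <- current_only(q)
print(q_current)
```

With the original settings, the search would then be run as `search_current_genes(q, retmax = 1, sort = "relevance")$ids`.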

Closing this issue, but will open another one to document how to find the Filter options :)