ialbert / bio

Making bioinformatics fun again
MIT License

unable to fetch all the documents from the api #24

Closed jmohit13 closed 1 year ago

jmohit13 commented 2 years ago

Hi, I am using Bio version 1.3.7 to retrieve documents from the PubMed database. I noticed a mismatch between the number of results returned by a PubMed web search and the number returned by the Bio API.

from Bio import Entrez

DB = "pubmed"
QUERY = "((cSCC) OR (Cutaneous squamous cell carcinoma)) AND ((relapse) OR (relapse rate) OR (treatment progression))"

Entrez.email = "test@gmail.com"
Entrez.api_key = "<API_KEY>"

# First search: only used to read the total hit count
handle = Entrez.esearch(db=DB, term=QUERY, rettype="medline")
record = Entrez.read(handle)
count = int(record["Count"])

# Second search: request all hits at once via retmax
handle = Entrez.esearch(db=DB, term=QUERY, retmax=count, rettype="medline")
record = Entrez.read(handle)
id_list = record["IdList"]

No. of results from Bio API = 1552
No. of results from PubMed search = 1749

For a few other queries, the difference was considerably larger. Could you please look into this? Thanks.

ialbert commented 2 years ago

This issue occasionally pops up on Biostar.

Note that the problem comes from NCBI, not from bio or Entrez Direct in general. The NCBI website returns a different number of results depending on whether you connect from the command line or via the web.
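One way to see where the numbers diverge is to page through the API results and compare the count the server reports with the number of IDs actually collected. The following is a minimal sketch, not part of bio or Biopython: `fetch_all_ids`, `entrez_search`, and the batch size of 500 are hypothetical names and values chosen for illustration. It assumes the search result behaves like the record returned by `Entrez.read`, with `Count` and `IdList` keys.

```python
from typing import Callable, Dict, List, Tuple


def fetch_all_ids(search: Callable[..., Dict], term: str,
                  batch_size: int = 500) -> Tuple[int, List[str]]:
    """Page through search results; return (reported count, collected IDs).

    `search` is any callable accepting term, retstart and retmax and
    returning a mapping with "Count" and "IdList" (hypothetical interface).
    """
    ids: List[str] = []
    reported = 0
    start = 0
    while True:
        record = search(term=term, retstart=start, retmax=batch_size)
        reported = int(record["Count"])
        page = list(record["IdList"])
        if not page:
            # The server has nothing more to give, whatever Count claims.
            break
        ids.extend(page)
        start += len(page)
        if start >= reported:
            break
    return reported, ids


def entrez_search(term: str, retstart: int, retmax: int, db: str = "pubmed") -> Dict:
    """Thin wrapper around Biopython's Entrez.esearch (needs network access).

    Set Entrez.email (and optionally Entrez.api_key) before calling.
    """
    # Imported lazily so the paging logic above has no hard dependency.
    from Bio import Entrez

    handle = Entrez.esearch(db=db, term=term, retstart=retstart, retmax=retmax)
    try:
        return Entrez.read(handle)
    finally:
        handle.close()
```

Calling `reported, ids = fetch_all_ids(entrez_search, QUERY)` and comparing `reported` with `len(ids)` shows whether IDs are being lost on the client side. If the two agree with each other but still differ from the website, the discrepancy originates at NCBI, as described above.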