Tracing the issue through the metapub source code (see https://github.com/metapub/metapub/blob/master/metapub/pubmedfetcher.py#L91):
result = self.qs.efetch({'db': 'pubmed', 'id': pmid})
qs is defined by
self.qs = get_eutils_client(self._cache_path)
which is a function defined here: https://github.com/metapub/metapub/blob/master/metapub/eutils_common.py#L15
which simply constructs an eutils QueryService, as defined here: https://github.com/biocommons/eutils/blob/main/src/biocommons/eutils/_internal/queryservice.py#L47
I am going to attempt a direct call to the eutils QueryService, using PMIDs that are confirmed to exist on PubMed, to try to recreate the errors.
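A minimal sketch of what that direct call might look like, assuming eutils exposes QueryService at the package top level (as metapub's get_eutils_client uses it) and using a placeholder PMID:

```python
# Sketch: call the eutils QueryService directly, bypassing metapub's
# PubMedFetcher wrapper. '123456' is a placeholder; substitute a PMID
# confirmed to exist on PubMed.
import eutils

qs = eutils.QueryService()
result = qs.efetch({'db': 'pubmed', 'id': '123456'})
print(result[:200])  # raw E-utilities XML response
```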
I have a vague recollection that PubMed can be very sensitive to many rapid requests from the same IP. You might consider adding or increasing the pause between queries, as in the sketch below.
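A crude throttle over the PMID loop might look like this; the pmids list and the 1-second pause are illustrative, not values from the actual script:

```python
# Illustrative throttle: pause between successive E-utilities calls
# so requests from one IP stay well under NCBI's rate limit.
import time
import eutils

qs = eutils.QueryService()
pmids = ['123456', '234567']  # placeholder PMIDs
for pmid in pmids:
    result = qs.efetch({'db': 'pubmed', 'id': pmid})
    time.sleep(1)  # pause between queries
```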
According to https://www.ncbi.nlm.nih.gov/books/NBK25497/:
API Keys
Since December 1, 2018, NCBI has provided API keys that offer enhanced levels of supported access to the E-utilities. Without an API key, any site (IP address) posting more than 3 requests per second to the E-utilities will receive an error message. By including an API key, a site can post up to 10 requests per second by default. Higher rates are available by request (eutilities@ncbi.nlm.nih.gov). Users can obtain an API key now from the Settings page of their NCBI account (to create an account, visit https://www.ncbi.nlm.nih.gov/account/). After creating the key, users should include it in each E-utility request by assigning it to the api_key parameter.
Example request including an API key:
esummary.fcgi?db=pubmed&id=123456&api_key=ABCDE12345
Example error message if rates are exceeded:
{"error":"API rate limit exceeded","count":"11"}
Only one API key is allowed per NCBI account; however, a user may request a new key at any time. Such a request will invalidate any existing API key associated with that NCBI account.
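For reference, a sketch of the documented request in Python, using requests against the esummary endpoint; the key value is the docs' placeholder, not a real key:

```python
# Sketch of the documented esummary request with an api_key parameter.
# 'ABCDE12345' is the placeholder key from the NCBI docs.
import requests

resp = requests.get(
    'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi',
    params={'db': 'pubmed', 'id': '123456', 'api_key': 'ABCDE12345'},
)
print(resp.status_code)
print(resp.text[:200])
```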
I wasn't getting the "API rate limit exceeded" error, but when I retried some of the PMIDs that threw the error the first time, using eutils directly, I got a result. I'll try putting in a sleep(1) and see what happens.
After playing with delaying the API call in the PMIDs for loop, I found very little difference in the number of eutils.EutilsNCBIError errors. Instead, a limited retry strategy worked well, with most PMIDs needing only two or three retries to succeed. Updated code was pushed in 1b51da97e7496b98dc1688001e7d34f664d8bdda; a sketch of the approach follows.
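A minimal sketch of that retry strategy; the function name and parameter defaults are illustrative, not the exact code in the commit:

```python
# Limited-retry wrapper around efetch; most failing PMIDs succeeded
# on the second or third attempt. Names and defaults are illustrative.
import time
import eutils

qs = eutils.QueryService()

def fetch_with_retry(pmid, max_retries=3, delay=1.0):
    for attempt in range(1, max_retries + 1):
        try:
            return qs.efetch({'db': 'pubmed', 'id': pmid})
        except eutils.EutilsNCBIError:
            if attempt == max_retries:
                raise  # give up after the final attempt
            time.sleep(delay)  # brief pause before retrying
```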
The recently scraped PMIDs are causing several hundred errors when calling the code at line 31 of get_pmids_articles.py; see https://github.com/nimh-dsst/sharestats-leo-notfork/blob/dev_2024/scripts/get_pmids_articles.py#L31
The vast majority of errors logged in get_pmids_articles_error.txt look alike, though a few take other forms.