Closed dmboyd closed 3 years ago
thanks for the issue.
not done this in python before. this gives a clue: https://gist.github.com/rsperl/085679536bc991e919d628be4fe8e838#max-retries (the HTTPAdapter class in requests), but we may need to dip into urllib3.Retry for more control.
it'd be nice if there was an out of the box solution for this in a package on pypi or so, know of anything?
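For reference, the HTTPAdapter approach from the gist can be sketched roughly as below. This is a minimal sketch, not what the library currently does: `build_session` is a hypothetical helper name, and the retry count, backoff factor, and status codes are assumptions chosen for illustration.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def build_session(total_retries: int = 5, backoff_factor: float = 1.0) -> requests.Session:
    """Return a Session that retries failed requests with exponential backoff.

    Hypothetical helper; the defaults here are illustrative assumptions.
    """
    retry = Retry(
        total=total_retries,            # maximum number of retries per request
        backoff_factor=backoff_factor,  # scales the sleep between attempts
        status_forcelist=[429, 500, 502, 503, 504],  # assumed statuses worth retrying
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    # Mount the adapter for both schemes so every request through the session uses it.
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session
```

Calls would then go through `session.get(...)` instead of `requests.get(...)`. By default, Retry only retries idempotent methods such as GET, so POST requests would need extra configuration.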
I've submitted a fairly simple PR to pause on API depletion using the out of the box method in requests.
Longer term, perhaps it makes sense to wrap the ncbi methods within a class for requests.Session reuse and throttling, and/or follow the HTTP POST API pattern described within link, using the library to bank UIDs for larger queries (which exceed HTTP URL length limits when placed within params).
@kmeiklej perhaps a relatively easy area to target. Translating the sample Entrez code from Perl to Python should be straightforward.
thanks, having a look at the PR.
one option is biopython, which has an entrez module: https://biopython.org/docs/1.75/api/Bio.Entrez.html It seems to handle rate limiting out of the box. Though when I installed it, it immediately threw some curl errors, so that doesn't give me hope it would be something to depend on.
Biopython's Entrez module is certainly hitting that API in a more efficient way (dynamically switching between POST and GET). But the model implementation (in Perl) has some API changes that don't appear to have been backported to Biopython, and you're right: requests is a better approach than using curl/urllib. Closing this specific issue, but will leave it for @kmeiklej to raise a new issue to address this.
Issue:
I'm getting HTTP errors when running multiple Entrez queries within a loop. This is caused by exceeding the Entrez API rate limits.
The HTTP responses from the NCBI Entrez API include a helpful response header, 'X-RateLimit-Remaining', with a count of how many API requests remain within the applicable rate limit.
It'd be nice if the library respected the API limits automatically.
Potential solution:
When a response shows 'X-RateLimit-Remaining' <= 1, wait for ~1 second before returning the HTTP response to the caller, to allow the API limit to reset.
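The check described above could look something like the following. This is only a sketch under assumptions: `pause_if_depleted` is a hypothetical helper name, not an existing function in the library, and the injectable `sleep` parameter exists only so the logic can be exercised without actually waiting.

```python
import time
from typing import Callable, Mapping


def pause_if_depleted(
    headers: Mapping[str, str],
    threshold: int = 1,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Sleep for `delay` seconds when the remaining quota is at or below `threshold`.

    Returns True if a pause happened, False otherwise. Hypothetical helper.
    """
    raw = headers.get("X-RateLimit-Remaining")
    if raw is None:
        return False  # header absent: nothing to act on
    try:
        remaining = int(raw)
    except ValueError:
        return False  # unparseable value: do not guess
    if remaining <= threshold:
        sleep(delay)  # give the rate-limit window time to reset
        return True
    return False
```

With requests, this would be called as `pause_if_depleted(response.headers)` right after each request and before handing the response back. `response.headers` is case-insensitive, whereas a plain dict passed in its place must use the exact header spelling.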