ropensci / rentrez

talk with NCBI entrez using R
https://docs.ropensci.org/rentrez

API rate limiting for concurrent processes #161

Open epruesse opened 3 years ago

epruesse commented 3 years ago

#156 was not completely fixed.

The static timing works as long as there is only one R process per IP address or per API key. If multiple users run rentrez without an API key from the same external IP, or multiple jobs run the same script with the same API key (e.g. on a cluster), the static timing can easily fail.

NCBI sends x-ratelimit-remaining and x-ratelimit-limit headers with each response. When the rate limit is exceeded, response code 429 is sent along with a retry-after header.

A simple, robust fix might be to handle the 429 by honoring the retry-after header and resending the request after sleeping for the indicated number of seconds. I don't know if it's always 2, or if NCBI increases the wait time dynamically.
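A minimal sketch of that retry idea in base R, not taken from rentrez itself. The helper name `retry_on_429`, its parameters, and the shape of the response (a list with `status` and lower-case `headers`) are assumptions made for illustration. The request is passed in as a function so the retry logic stays independent of the HTTP client.

```r
# Hypothetical helper, not part of rentrez: resend a request while the
# server answers 429, sleeping for the number of seconds in retry-after.
# `do_request` is any zero-argument function returning a list with
# `status` (integer) and `headers` (named list, lower-case names).
retry_on_429 <- function(do_request, max_tries = 5, default_wait = 2,
                         sleep = Sys.sleep) {
  for (attempt in seq_len(max_tries)) {
    resp <- do_request()
    if (resp$status != 429) {
      return(resp)
    }
    # retry-after may be missing or given as an HTTP date rather than
    # seconds; in either case fall back to `default_wait` (an assumption).
    wait <- suppressWarnings(as.numeric(resp$headers[["retry-after"]]))
    if (length(wait) != 1 || is.na(wait) || wait < 0) {
      wait <- default_wait
    }
    # No point sleeping after the final attempt.
    if (attempt < max_tries) sleep(wait)
  }
  stop("still rate limited after ", max_tries, " attempts")
}
```

With an httr-style client this could be called roughly as `retry_on_429(function() { r <- httr::GET(url); list(status = httr::status_code(r), headers = httr::headers(r)) })`, where `url` is a placeholder. Capping the number of attempts keeps a job from hanging forever if the limit never clears.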

A more complete solution could additionally calculate the wait time dynamically from the x-ratelimit-remaining and x-ratelimit-limit values. This can get complex quickly, though, if you want to predict accurately when the next request is likely to be accepted.
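One simple way to turn the two headers into a wait time is sketched below, as an illustration only. It assumes the limit applies to a window of about one second and scales the delay linearly with the share of the quota already used. The function name `pacing_delay`, the `window` and `fallback` parameters, and the linear rule are all assumptions; NCBI may count requests differently.

```r
# Hypothetical helper, not part of rentrez: derive a delay (in seconds)
# before the next request from the rate limit headers of the last response.
# `headers` is a named list with lower-case names.
pacing_delay <- function(headers, window = 1, fallback = window) {
  limit <- suppressWarnings(as.numeric(headers[["x-ratelimit-limit"]]))
  remaining <- suppressWarnings(as.numeric(headers[["x-ratelimit-remaining"]]))
  usable <- length(limit) == 1 && length(remaining) == 1 &&
    !is.na(limit) && !is.na(remaining) && limit > 0
  if (!usable) {
    # Headers missing or malformed: wait the conservative fallback.
    return(fallback)
  }
  # Fraction of the quota already consumed, clamped to [0, 1].
  used <- min(max(1 - remaining / limit, 0), 1)
  # Full quota left: no wait. Quota exhausted: wait the whole window.
  window * used
}
```

Because several processes share the same quota, a delay computed this way is only a guess; another process may use up the remaining requests first, so handling the 429 response is still needed.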

dwinter commented 3 years ago

Thanks @epruesse. Last time I looked at this, NCBI did not send the rate limit headers, so we were a bit stuck on how to deal with multiple processes/multithreading.

Will take this up for the next release.