gtalarico / pyairtable

Python Api Client for Airtable
https://pyairtable.readthedocs.io
MIT License
765 stars 138 forks source link

Rate limit API requests without forcing retries #313

Closed simsong closed 10 months ago

simsong commented 10 months ago

Does pyairtable have provisions for slowing the rate of API requests to one every 0.25 seconds, to avoid the rate limiting that happens if more than 5 API requests are sent in a second? I've reviewed the code and can't find this. This would be relatively straightforward to implement and it would address some of the complaints I've seen here.

mesozoic commented 10 months ago

It does! In https://github.com/gtalarico/pyairtable/pull/272 (part of the 2.0 release) we introduced a default retry strategy for 429 errors (QPS limits exceeded) that will retry up to five times with exponential backoff. This somewhat mirrors the behavior of the official Airtable JS library, though we'll give up after ~3 seconds (and it's not clear to me whether their implementation will ever stop retrying).

Are you running into any issues with the default retry strategy?

simsong commented 10 months ago

Thank you for the response! I have reviewed the pyairtable code and reviewed Airtable's statement on rate limiting.

Airtable states: "Airtable enforces a rate limit of 5 requests per second to ensure optimal user performance across all pricing tiers. If you exceed this rate, you will receive a 429 status code and must wait 30 seconds before subsequent requests will succeed."

This may not be what Airtable is actually doing, but what it says is that if you issue 6 requests in a second, you need to wait 30 seconds before the next request is successful.

The request method in the Api class implements fallback, but it does not appear to implement throughput limiting: https://github.com/gtalarico/pyairtable/blob/308c269058e57ec67fea713134b0b54abb9ddecb/pyairtable/api/api.py#L109-L166

This means that if 6 requests are issued in a second, it will start to retry with progressively longer timeouts until 30 seconds have elapsed, at which point it will continue.

I suggest that it would be more in line with what Airtable is specifying if this class implemented a singleton for LastAPIRequestTime which tracks globally when the last API request was sent to a given Base (which seems to be the unit of throttling). Because Airtable allows 5 request per second, the Api class should enforce a wait of at least 0.2 clock seconds between requests. This should be configurable, because Airtable also states elsewhere that upsert calls have more stringent rate limiting.

It may be that Airtable is not throwing clients into a penalty box for 30 seconds when the rate limit is exceeded, but this is what the documentation clearly states that it is doing.

What is your experience?

mesozoic commented 10 months ago

time.sleep is both problematic and insufficient. It can cause unnecessary slowdowns in applications that rely on asynchronous event loops or which perform their own time-intensive processing in between requests, and it also doesn't account for multiple processes (perhaps on different servers) accessing the same base at the same time. The QPS limit is per base, not per connection or API key, so there's really no way for the client library to predict whether a particular request will get a 429.

My experience has been that the current retry strategy can gracefully handle Airtable returning a 429 status code under load. We tested this using the code you can see in the linked PR, and found that (1) it was more reliable than the prior implementation, and (2) it led to an overall reduction in runtime.

I don't believe Airtable enforces the 30 second delay they refer to in their documentation. As far as I can tell, Airtable's official JavaScript client library imposes no such cooldown period when it encounters 429s.

I'm going to close this issue because I don't think there's a reproducible problem here. If you're encountering a real world problem with the current retry strategy, please file a bug with sample code that reproduces it, and we'll look more closely.

simsong commented 10 months ago

Thank you for the discussion. I think that it would be useful to add a summary of the above to the comments in the code; if you wish, I can write up a PR that does this. When I evaluated pyairtable for a project recently, I specifically rejected it from consideration because it did not implement throttling in line with the Airtable documentation. That's what led me to post the issue.