Rate Limiting Not Working

jamesgoodhouse commented 4 years ago

We are currently making use of the terraform-provider-ns1 to manage our users. The provider is configured to either set the RateLimitFunc to RateLimitStrategyConcurrent or RateLimitStrategySleep (see https://github.com/terraform-providers/terraform-provider-ns1/blob/master/ns1/config.go#L63-L67) for the ns1-go lib. Unfortunately, during large refreshes of users we are still receiving 429 Rate limit exceeded errors. After closer inspection of the code, I may not understand how rate limiting is implemented, but I don't believe it is working as expected.

When looking at the Do method, I see that the request is first executed, and then the RateLimitFunc is executed. This seems backwards. If the client is configured to utilize the RateLimitStrategySleep, the request will only sleep after it has been executed, making it essentially useless. I was thinking that maybe it was done this way because a retry would occur once the sleep happened, but I'm not seeing anywhere in the code where a retry might occur.

Apologies if I'm completely mis-reading the code, but ultimately, we're still having errors occur due to rate limits which I feel should be working correctly with this lib.

rupa commented 4 years ago

Hi, thanks for reporting this.

There is definitely room for improvement with the rate limiting strategies, and we have plans. The "sleep after request" behavior is not ideal, but in practice it is still useful. Because of the way rate-limiting is implemented in the API, we really can't know for sure if we are going to be rate-limited until we've made the request, so the "strategies" are necessarily about "avoidance" and could certainly hit edge cases. And we can't know about all other activity that may be hitting the API and consuming requests.

For terraform, the "concurrent" strategy is probably best, with a "parallelism" number roughly matching the parallelism used for terraform. The strategy does try to "saturate" the rate-limiter, it doesn't leave any room for anything else.

To start, I'd like to ask what you are setting parallelism to (to make sure it's in the ballpark of what we're expecting). Assuming it's about what we'd expect, upping the parallelism number a bit will cause workers to wait a little longer, and leave a little buffer for other activity to not cause 429s. If the parallelism is too high, workers will wait longer than they have to after requests, but on the next request, we burst through any "extras" without waiting, so overall, it won't be that much slower.

If we rule out a bit more of a "buffer" being all that's needed, we can discuss next steps.

ThiagoYU commented 3 years ago

Hello @jamesgoodhouse, is this issue solved? If not, your parallelism is set to how much? These errors continued after following the advice in the documentation? Link here

eravin-ns1 commented 1 year ago

Closing due to inactivity. @jamesgoodhouse if you are still experiencing problems not addressed by the doc linked in the above comment, please re-open or file a ticket with NS1 support.

ns1 / ns1-go

Rate Limiting Not Working #99