Open prashantmital opened 4 years ago
Note that we poll the Atlas API for many purposes in `astrolabe`. The most common endpoint that is polled is https://docs.atlas.mongodb.com/reference/api/clusters-get-one/ and we use the output to determine cluster status (specifically, we glean provisioning status, maintenance status, etc. using `clusterState`).
Atlas API resources are rate-limited on a per-project basis. Since every Evergreen build of this project uses the same Atlas project, it is possible to run into API rate limits when multiple builds are running simultaneously.
Without backoff/retry logic, hitting the rate limit causes the entire test run to fail with a message like:
We should improve astrolabe to account for this failure mode and appropriately wait/backoff when such errors are encountered.
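One possible shape for this is a small retry wrapper with exponential backoff and jitter, so that concurrent builds do not all retry at the same moment. The sketch below is only an illustration: `RateLimitError` and `call_with_backoff` are hypothetical names that do not exist in astrolabe, and it assumes the API client can be made to raise a distinct exception when Atlas reports that the rate limit was exceeded.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the error raised on a rate-limited response."""


def call_with_backoff(func, *, max_attempts=5, base_delay=1.0,
                      max_delay=60.0, sleep=time.sleep):
    """Call func(), retrying with exponential backoff when rate-limited.

    The wait before retry number n is drawn uniformly from
    [0, min(max_delay, base_delay * 2**n)] ("full jitter").
    The last RateLimitError is re-raised once max_attempts is reached.
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the failure to the caller.
            upper_bound = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, upper_bound))
```

A polling call could then be wrapped as `call_with_backoff(lambda: get_cluster_state(name))`, where `get_cluster_state` stands for whatever function currently queries the cluster endpoint (again a hypothetical name). The `sleep` parameter is injectable so the wrapper can be tested without real delays.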