Starting a fleet with 30+ instances does not go well

YouGina commented 2 years ago

Online there are people using many instances at once. When I try to start 30+ instances of a fleet at the same time, I quickly run into issues with DigitalOcean's API. Is there something I can do about it? Maybe throttle the requests or something?

To solve this problem I now start my full fleet by using axiom-fleet three times, which is not so user-friendly. Since there are people running hundreds of instances I assume there must be a way.

Output I get is something like this:

Error: GET https://api.digitalocean.com/v2/account/keys?page=1&per_page=200: 429 (request "c80a60ce-f7fe-4749-ab49-df90ba087799") Too many requests
Error: GET https://api.digitalocean.com/v2/account/keys?page=1&per_page=200: 429 (request "d3d71d28-4c20-4903-84ec-dfb47fe55416") Too many requests
jq: error (at <stdin>:1): Cannot index array with string "name"
jq: error (at <stdin>:1): Cannot index array with string "name"
Error: GET https://api.digitalocean.com/v2/account/keys?page=1&per_page=200: 429 (request "552fb80c-5d2b-4272-9e8d-3296f7abac4a") Too many requests
Error: GET https://api.digitalocean.com/v2/account/keys?page=1&per_page=200: 429 (request "8a7ca25f-dbc2-4c0e-8e97-9a42292f421e") Too many requests
jq: error (at <stdin>:1): Cannot index array with string "name"
jq: error (at <stdin>:1): Cannot index array with string "name"
An image has not been found in this region. Do you need to run 'axiom-build'?
jq: error (at <stdin>:5884): Cannot iterate over null (null)

If I start, say, 50 instances at once, the result is a lot of errors like the above, and only 30 to 40 instances have started by the time it finishes.

I hope someone can help me solve this problem.

0xtavian commented 2 years ago

Spin up in batches of 10-15 to avoid API rate limits on droplet creation :)

YouGina commented 2 years ago

That's what I'm doing now. Is that how people who spin up 1000 instances do it too? I can't imagine what a pain that must be.

0xtavian commented 2 years ago

The issue is, every cloud provider has their own rate limits. So it's not really a one-size-fits-all for spinning up in batches. We could add logic to spin up X instances at a time for Y provider, we just haven't gotten around to doing that yet. What I do is: for i in $(seq 1 55); do axiom-fleet myfleet -i 9 -r [comma-separated list of regions]; done. I know others have made small spin-up scripts to avoid this problem. Some examples are somewhere in the issues. @YouGina
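For anyone who wants the batching idea as a reusable helper, here is a minimal sketch. The function names batch_sizes and spin_up_in_batches are hypothetical (they are not part of axiom), and the pause between batches is an assumption you would tune per provider. Only the -i flag from the loop above is used.

```shell
# Sketch only: helper names are hypothetical, not part of axiom.

# batch_sizes TOTAL BATCH
# Prints one batch size per line until TOTAL instances are covered.
batch_sizes() {
  total=$1
  batch=$2
  while [ "$total" -gt 0 ]; do
    if [ "$total" -lt "$batch" ]; then n=$total; else n=$batch; fi
    echo "$n"
    total=$(( total - n ))
  done
}

# spin_up_in_batches TOTAL BATCH PAUSE CMD...
# Runs CMD with "-i <n>" appended for each batch, then sleeps PAUSE seconds
# so the provider's rate limit has time to recover (PAUSE is a guess to tune).
spin_up_in_batches() {
  sb_total=$1
  sb_batch=$2
  sb_pause=$3
  shift 3
  for sb_n in $(batch_sizes "$sb_total" "$sb_batch"); do
    "$@" -i "$sb_n"
    sleep "$sb_pause"
  done
}
```

Something like spin_up_in_batches 50 10 60 axiom-fleet myfleet would then issue five calls of 10 instances each, a minute apart. Passing echo as the command first gives a dry run that prints the calls it would make.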

YouGina commented 2 years ago

I have been able to adjust axiom-fleet a bit to make it work for DigitalOcean. I'm not sure whether that will work for Linode or other providers too. It now creates a fleet of 50 droplets in about 180 seconds. It still gives errors while it is running, but it works.

fail-open commented 2 years ago

I adjusted the sleep per init call in axiom-fleet to 4 seconds (~3 seems to be the rate limit) and it let me spin up 50 without getting flagged. DO appears to have a rolling 1-hour quota of 5k requests, which would make an issue hard to spot consistently if you are going in and out of that quota.

You can see the limit/remaining requests in the response headers if you do a direct call to the DO API.
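A rough sketch of checking those headers from the command line. It assumes the headers are named ratelimit-limit, ratelimit-remaining and ratelimit-reset (matched case-insensitively here), and that DO_TOKEN is an environment variable holding your API token. The show_ratelimit helper is a hypothetical name.

```shell
# Sketch only: show_ratelimit is a hypothetical helper name.

# Reads raw HTTP response headers on stdin and prints only the
# rate-limit ones, with carriage returns stripped.
show_ratelimit() {
  tr -d '\r' | grep -i '^ratelimit-'
}

# Usage (assumes DO_TOKEN is set; the URL is the one from the errors above):
#   curl -s -o /dev/null -D - \
#     -H "Authorization: Bearer $DO_TOKEN" \
#     "https://api.digitalocean.com/v2/account/keys?page=1&per_page=200" \
#     | show_ratelimit
```

Running this before and after a spin-up shows how much of the quota a fleet start consumes.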

@0xtavian Instead of figuring out each provider's rate limit and adding logic to handle it in the code, a middle ground could be for axiom-fleet to accept a flag for the wait per init call. That would make it easier for users to fine-tune without modifying the code like I did or looping over the script multiple times.

0xtavian commented 2 years ago

@fail-open That sounds like it should work! We could simply set the sleep time based on the provider. https://github.com/pry0cc/axiom/blob/2a808087ae4232748b917c628c6180a0a7a3fd9f/interact/axiom-fleet#L43
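A minimal sketch of how the two ideas could combine: a per-provider default sleep with an optional user override. The init_delay function, the "do" provider identifier, and the fallback value are all assumptions for illustration. Only the 4-second figure for DigitalOcean comes from this thread, and none of this is the actual axiom-fleet code.

```shell
# Sketch only: init_delay and the provider identifiers are hypothetical.

# init_delay PROVIDER [OVERRIDE]
# Prints the number of seconds to sleep between init calls.
init_delay() {
  provider=$1
  override=${2:-}
  # A user-supplied value (e.g. from a future command-line flag) wins.
  if [ -n "$override" ]; then
    echo "$override"
    return
  fi
  case "$provider" in
    do) echo 4 ;;  # from this thread: 4s per init call worked for 50 droplets
    *)  echo 1 ;;  # placeholder default, not a measured value
  esac
}
```

The init loop would then call sleep "$(init_delay "$provider" "$user_delay")" in place of a hard-coded sleep, where both variable names are again placeholders.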

pry0cc / axiom

Starting a fleet with 30+ instances does not go well #566