janeczku / docker-machine-vultr

Docker Machine driver for Vultr Cloud
MIT License

Vultr API Rate limiting problem #12

Closed jmatsushita closed 8 years ago

jmatsushita commented 8 years ago

Hi there,

Thanks for the docker machine driver!

I've run into rate-limiting problems as soon as I try to provision more than one machine. Some of this can perhaps be controlled from Rancher (which I'm using to drive Docker Machine; I've posted a feature request about that in rancher/rancher#4464), but I've also contacted Vultr support to ask about their rate-limiting policy, and I thought you might find their answer useful:

Our recommendations here are to cache the responses of API calls (for example, our plans list rarely changes), and not use your API key unless a method requires authentication.

Unfortunately, we've seen some very poorly coded API clients before, which hammered our API with tons of requests per minute. Due to issues like that, whitelisting hosts is not possible.

We do have customers successfully managing hundreds of servers via the API, so this is definitely something that is doable.

I did take a quick look at that, and it seems to poll every 2 seconds while a server is pending. This is likely the source of the issue you're seeing.

Hope this advice helps with managing the rate limit so that several machines can be provisioned simultaneously!

Thanks,

Jun

janeczku commented 8 years ago

@jmatsushita This is a known issue. Unfortunately Docker Machine's fan-out plugin architecture makes it impossible for the driver to control the global rate of API calls. Operational errors caused by exceeding API rate limits therefore have to be addressed in Docker Machine first and foremost.

It's unfortunate that Vultr does not see the need for action on their part. In fact, they should change their API rate-limit mechanism to a sane per-minute averaged model instead of imposing a ridiculous 2 req/second limit that already gets tripped by synchronously sending 4 API calls in a row.

That being said, the good news is that I am already working on a retry mechanism to deal with rate-exceeded API errors.

vincent99 commented 8 years ago

Not a very good answer from them... Checking on a resource that is in a transitioning state every 2 seconds is not an unreasonable use-case. If the queries are that expensive, they should have caching on their API layer, not pass the problem down to the clients.

Limiting individual expensive or cacheable calls is one thing, but limiting the entire API to 2 requests/second, particularly with what sounds like a very high-resolution measurement ("2 requests in this second" vs. an average like "120 requests in the last 60 seconds"), is frankly ridiculous. They're just making it difficult for people to make requests to give them money.

(Source: Worked on public APIs for a much larger hosting company in a previous life)

Anyway, I changed the Rancher UI to create one machine at a time, which will probably be good enough to fix it. https://github.com/rancher/ui/pull/605

Also added a createDelayMs option that custom driver UIs can use, though someone would have to make one for Vultr to take advantage of it (see https://github.com/rancher/ui-driver-skel).
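The one-machine-at-a-time change with a configurable delay amounts to serializing the create calls, roughly like the sketch below. The function name is illustrative and this is not the Rancher or driver code:

```go
package main

import (
	"fmt"
	"time"
)

// createSequentially provisions machines one at a time with a fixed
// pause between requests, spreading API calls out so a burst of
// simultaneous creates doesn't trip the provider's rate limit.
func createSequentially(names []string, delay time.Duration, create func(string) error) error {
	for i, name := range names {
		if i > 0 {
			time.Sleep(delay) // e.g. the createDelayMs setting
		}
		if err := create(name); err != nil {
			return err // stop on the first failure
		}
	}
	return nil
}

func main() {
	var order []string
	createSequentially([]string{"node1", "node2", "node3"}, 0, func(n string) error {
		order = append(order, n)
		return nil
	})
	fmt.Println(order)
}
```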

janeczku commented 8 years ago

@vincent99 😄

janeczku commented 8 years ago

> Anyway, I changed the Rancher UI to create one machine at a time, which will probably be good enough to fix it. rancher/ui#605 Also added a createDelayMs option that custom driver UIs can use, though someone would have to make one for Vultr to take advantage of it (see https://github.com/rancher/ui-driver-skel).

Awesome! I think the one-machine-at-a-time change will probably be enough to solve this. If not, I will take a look at the driver UI skeleton and make one for Vultr.

janeczku commented 8 years ago

Release v1.0.7 implements retry logic for API rate-limit errors.