Closed: ashwinnair14 closed this issue 2 months ago
Review of four possibilities I found for access limiting with Ray Serve:
Config only:
1. Throttling: using declared deployment/machine resources. Doesn't work; it only affects the number of possible deployments, not how they handle load.
2. Throttling: setting target_num_ongoing_concurrent_requests. Doesn't work; it limits concurrent executions, but excess requests are queued instead of returning a 429.
Code solutions:
3. Rate limiting: add a decorator to the deployment inference function that implements rate limiting. Doesn't work: the rate-limited calls still wait for the earlier, non-rate-limited ones to complete before erroring.
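For reference, here is a minimal sketch of the decorator idea from possibility 3, standalone and independent of Ray Serve. The names (`rate_limited`, `RateLimitExceeded`) are hypothetical; in a real deployment the exception would be mapped to an HTTP 429. As noted above, wrapping the inference function this way doesn't solve the problem, because calls already queued by Serve still run before the limiter can reject them.

```python
import time
import threading
from functools import wraps

class RateLimitExceeded(Exception):
    """Raised when the call budget is exhausted; a server would map this to HTTP 429."""

def rate_limited(max_calls: int, per_seconds: float):
    """Reject calls beyond max_calls within a sliding window of per_seconds."""
    lock = threading.Lock()
    calls = []  # timestamps of recently accepted calls

    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            now = time.monotonic()
            with lock:
                # Drop timestamps that have fallen out of the window.
                while calls and now - calls[0] > per_seconds:
                    calls.pop(0)
                if len(calls) >= max_calls:
                    raise RateLimitExceeded()
                calls.append(now)
            return fn(*args, **kwargs)
        return wrapper
    return decorator

@rate_limited(max_calls=2, per_seconds=1.0)
def infer(x):
    # Stand-in for the deployment's inference function.
    return x * 2
```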
How do we decide when we have to refuse requests?
The ideal scenario would be to decide based on runtime characteristics (something like: given X models, Y GPUs, and an expected execution time Z, limit to Y/X requests per Z time), or even adjust the rate limits while running, but for now we'll just use manually configured values.
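The heuristic above can be sketched as a small sizing helper. This is purely illustrative of the arithmetic in the comment (the function name and the floor-to-one fallback are my assumptions, not anything in Ray Serve):

```python
def requests_per_window(num_models: int, num_gpus: int, exec_time_s: float):
    """Sizing heuristic from the discussion: with X models sharing Y GPUs
    and an expected execution time of Z seconds, allow roughly Y/X requests
    per window of Z seconds (floored to at least 1 request)."""
    max_calls = max(1, num_gpus // num_models)
    return max_calls, exec_time_s

# e.g. 2 models sharing 4 GPUs at 0.5 s per request
# -> allow 2 requests per 0.5 s window
limit, window = requests_per_window(num_models=2, num_gpus=4, exec_time_s=0.5)
```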
@evanderiel This is done, right? Can we close the issue?
More could always be done, but yes