pytorch / serve

Serve, optimize and scale PyTorch models in production
https://pytorch.org/serve/
Apache License 2.0

Return 429 instead of 503 when worker job queue is full #2764

Open alazareva opened 10 months ago

alazareva commented 10 months ago

🚀 The feature

Currently, when using the REST API, the prediction endpoint returns a 503 when the number of concurrent requests exceeds a worker's job queue capacity. It would be great if it returned a 429 instead, so that we know the failure is due to high request load; a sketch of reproducing the current behavior follows.
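For illustration, here is a minimal sketch of how one might observe the current behavior, assuming TorchServe is running locally with a hypothetical model named `my_model` registered and a small `job_queue_size` configured, so that concurrent requests overflow the queue:

```python
# Sketch: observe the status code returned when the worker job queue overflows.
# Assumes a local TorchServe instance with a model named "my_model" registered
# and a small job_queue_size, so concurrent requests exceed queue capacity.
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "http://localhost:8080/predictions/my_model"  # hypothetical model name

def call(_):
    # Payload format depends on the model's handler; raw bytes shown for illustration.
    return requests.post(URL, data=b"example input", timeout=30).status_code

with ThreadPoolExecutor(max_workers=64) as pool:
    codes = list(pool.map(call, range(200)))

# Today overflow requests surface as 503; this issue proposes returning 429 instead.
print({code: codes.count(code) for code in set(codes)})
```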

Motivation, pitch

We'd like to be able to distinguish errors caused by too many requests from other transient 503s (whether on the server or the service mesh side). Having the server return 429 would allow clients to handle retries differently under high load, as in the sketch below.
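A minimal sketch of the client-side handling this would enable, assuming the server returned 429 on queue overflow; the function name and backoff values are illustrative, not part of any TorchServe API:

```python
# Sketch: retry policy that treats load shedding (429) differently from other
# transient failures (503). Names and backoff values are illustrative only.
import time

import requests

def predict_with_retries(url, payload, max_attempts=5):
    for attempt in range(max_attempts):
        resp = requests.post(url, data=payload, timeout=30)
        if resp.status_code == 429:
            # Server is overloaded: back off exponentially before retrying.
            time.sleep(2 ** attempt)
        elif resp.status_code == 503:
            # Other transient failure (server or service mesh): short, fixed delay.
            time.sleep(0.5)
        else:
            return resp
    return resp
```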

Alternatives

No response

Additional context

No response

BudhirajaChinmay commented 9 months ago

Hi, I would like to work on this feature!