simonkurtz-MSFT / python-openai-loadbalancer

Smart Python OpenAI Load Balancer using priority endpoints and request retries. | Python package at link below:
https://pypi.org/project/openai-priority-loadbalancer
MIT License

Use lowest retryAfter when no backends are available #2

Closed simonkurtz-MSFT closed 6 months ago

simonkurtz-MSFT commented 6 months ago

We keep track of the LastRetryAfter value in load_balanced.py. When no backends are available and the last attempted backend happens to have a high retryAfter value, we wait longer than necessary, because other backends may become available again before that.

Instead, I believe we should return the nearest upcoming retry_after datetime across all entries in the backends collection. The HTTP spec allows the Retry-After header to carry either an HTTP date or a number of seconds. I expect the httpx client to honor both.
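A minimal sketch of that idea, not the package's actual code: the `Backend` dataclass and the helper names below are hypothetical stand-ins, and only the `retry_after` datetime property comes from the description above. It assumes every `retry_after` is a timezone-aware UTC datetime. The sketch picks the earliest `retry_after` that is still in the future and renders it in both forms Retry-After allows.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime
from math import ceil
from typing import Iterable, Optional


@dataclass
class Backend:
    # Hypothetical stand-in for the package's backend type.
    host: str
    # UTC time at which this backend may be tried again; None means not throttled.
    retry_after: Optional[datetime] = None


def soonest_retry_after(backends: Iterable[Backend], now: datetime) -> Optional[datetime]:
    """Return the earliest retry_after that is still in the future, or None."""
    pending = [
        b.retry_after
        for b in backends
        if b.retry_after is not None and b.retry_after > now
    ]
    return min(pending, default=None)


def retry_after_seconds(backends: Iterable[Backend], now: datetime) -> int:
    """Delay-seconds form: whole seconds until the soonest backend is back."""
    soonest = soonest_retry_after(backends, now)
    if soonest is None:
        return 0  # nothing is throttled, so there is no reason to wait
    # Round up so the caller never retries a moment too early.
    return max(1, ceil((soonest - now).total_seconds()))


def retry_after_http_date(backends: Iterable[Backend], now: datetime) -> str:
    """HTTP-date form of the same value, e.g. 'Mon, 01 Jan 2024 12:00:05 GMT'."""
    soonest = soonest_retry_after(backends, now)
    return format_datetime(soonest or now, usegmt=True)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    backends = [
        Backend("a.example.com", now + timedelta(seconds=60)),
        Backend("b.example.com", now + timedelta(seconds=5)),
    ]
    print(retry_after_seconds(backends, now))
    print(retry_after_http_date(backends, now))
```

With this approach the wait is bounded by the backend that recovers first, regardless of which backend happened to be attempted last.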

simonkurtz-MSFT commented 6 months ago

Fixed