simonkurtz-MSFT / python-openai-loadbalancer

Smart Python OpenAI Load Balancer using priority endpoints and request retries. | Python package at link below:
https://pypi.org/project/openai-priority-loadbalancer
MIT License

Add Support for Multiple Models #35

Open simonkurtz-MSFT opened 5 months ago

simonkurtz-MSFT commented 5 months ago

Presently, the backends are model-agnostic. That means every model used by the implementer of this code must be deployed on every Azure OpenAI instance defined in the backend pool. This can be limiting because it forces a lowest-common-denominator set of deployments. Take these backends, for example:

Today, the backend pool can only use backends 2 and 5.

If the backend list could take model into consideration, the following would apply per model:

I am interested in hearing whether there is value in being able to specify backends per model, or whether this is a potential solution in search of a problem.
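One way the per-model idea could look is a sketch like the following. Everything here is hypothetical: the `models` field, the `backends_for_model` helper, and the hostnames are illustrations, not the library's actual API. An empty `models` set means "model-agnostic," which would preserve today's behavior for existing configurations.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Backend:
    host: str
    priority: int
    # Hypothetical extension: the models deployed on this backend.
    # An empty set means the backend is model-agnostic (today's behavior).
    models: frozenset = frozenset()

def backends_for_model(backends: list[Backend], model: str) -> list[Backend]:
    """Return the backends eligible for the given model, ordered by priority."""
    eligible = [b for b in backends if not b.models or model in b.models]
    return sorted(eligible, key=lambda b: b.priority)

# Illustrative hosts only.
backends = [
    Backend("oai-eastus.openai.azure.com", 1, frozenset({"gpt-4o"})),
    Backend("oai-westus.openai.azure.com", 1, frozenset({"gpt-4o", "gpt-35-turbo"})),
    Backend("oai-swedencentral.openai.azure.com", 2),  # model-agnostic
]

# Only backends that host gpt-35-turbo (or are agnostic) are returned,
# in priority order.
hosts = [b.host for b in backends_for_model(backends, "gpt-35-turbo")]
```

The load balancer's priority/retry logic would then operate on the filtered list rather than the full pool, so a model missing from one instance no longer constrains the whole backend set.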