Open zhbery opened 1 year ago
Interesting. Do you know how Immersive Translation does the load balancing? I clicked the link you provided but didn't find anything related to multiple API keys.
@logancyang @zhbery
I'm the maintainer of LiteLLM. We provide an open-source proxy for load balancing across Azure + OpenAI. It can process 500+ requests/second.
(I'd love feedback if you're trying to do this.)
Doc: https://docs.litellm.ai/docs/simple_proxy#load-balancing---multiple-instances-of-1-model
model_list:
  - model_name: gpt-4
    litellm_params:
      model: azure/chatgpt-v-2
      api_base: https://openai-gpt-4-test-v-1.openai.azure.com/
      api_version: "2023-05-15"
      api_key:
  - model_name: gpt-4
    litellm_params:
      model: azure/gpt-4
      api_key:
      api_base: https://openai-gpt-4-test-v-2.openai.azure.com/
  - model_name: gpt-4
    litellm_params:
      model: azure/gpt-4
      api_key:
      api_base: https://openai-gpt-4-test-v-2.openai.azure.com/
litellm --config /path/to/config.yaml
curl --location 'http://0.0.0.0:8000/chat/completions' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "gpt-4",
    "messages": [
      {
        "role": "user",
        "content": "what llm are you"
      }
    ]
  }'
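For anyone calling the proxy from a script rather than curl, here is a rough Python sketch of the same request. It assumes the proxy is listening on the address used in the curl command above; the helper name `build_chat_request` is made up for illustration and is not part of LiteLLM.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, content: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the proxy's /chat/completions route.

    `build_chat_request` is a hypothetical helper, not a LiteLLM API.
    """
    payload = {
        "model": model,  # the shared model_name from the config, e.g. "gpt-4"
        "messages": [{"role": "user", "content": content}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed address: the same host and port as the curl example.
req = build_chat_request("http://0.0.0.0:8000", "gpt-4", "what llm are you")

# Sending requires a running proxy, so it is left commented out here:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

Because `json.dumps` produces the body, it is always valid JSON, which avoids hand-editing mistakes such as a stray trailing comma.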
Great work!
I hope multiple API keys can be supported, so that load can be balanced across them and response capacity improved. For example, multiple API keys could be entered separated by commas. Other tools, such as Immersive Translate, already support this feature.
Open AI (Azure OpenAI) | Immersive Translation
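To make the comma-separated idea from the request concrete, here is a minimal Python sketch of one possible approach: split the setting on commas and hand out keys in simple round-robin order. This is only an assumption about how such a feature could behave, not how Immersive Translate or LiteLLM actually implements it; the names `parse_keys` and `key_rotation` are hypothetical.

```python
from itertools import cycle
from typing import Iterator, List


def parse_keys(raw: str) -> List[str]:
    """Split a comma-separated setting into keys, dropping blanks and whitespace."""
    keys = [k.strip() for k in raw.split(",") if k.strip()]
    if not keys:
        raise ValueError("no API keys provided")
    return keys


def key_rotation(raw: str) -> Iterator[str]:
    """Yield the keys forever in round-robin order (hypothetical helper)."""
    return cycle(parse_keys(raw))


# Placeholder keys for illustration only.
rotation = key_rotation("key-a, key-b,key-c")
picked = [next(rotation) for _ in range(4)]
print(picked)  # ['key-a', 'key-b', 'key-c', 'key-a']
```

Round-robin spreads requests evenly but ignores per-key rate limits and failures; a real implementation would probably also need to skip keys that are throttled or invalid.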