jawa0 opened this issue 1 year ago
@jawa0 I'm the maintainer of LiteLLM. We let you create a Router that maximizes throughput by load balancing across deployments.
I'd love your feedback on whether this solves your issue.
Here's the quick start:
import asyncio
import os

from litellm import Router

model_list = [{  # list of model deployments
    "model_name": "gpt-3.5-turbo",  # model alias
    "litellm_params": {  # params for litellm completion/embedding call
        "model": "azure/chatgpt-v-2",  # actual model name
        "api_key": os.getenv("AZURE_API_KEY"),
        "api_version": os.getenv("AZURE_API_VERSION"),
        "api_base": os.getenv("AZURE_API_BASE")
    }
}, {
    "model_name": "gpt-3.5-turbo",
    "litellm_params": {
        "model": "azure/chatgpt-functioncalling",
        "api_key": os.getenv("AZURE_API_KEY"),
        "api_version": os.getenv("AZURE_API_VERSION"),
        "api_base": os.getenv("AZURE_API_BASE")
    }
}, {
    "model_name": "gpt-3.5-turbo",
    "litellm_params": {
        "model": "gpt-3.5-turbo",
        "api_key": os.getenv("OPENAI_API_KEY"),
    }
}]

router = Router(model_list=model_list)

async def main():
    # openai.ChatCompletion.create replacement
    response = await router.acompletion(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Hey, how's it going?"}],
    )
    print(response)

asyncio.run(main())
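If you're calling it from synchronous code, the Router also exposes a blocking completion method that mirrors acompletion; a minimal sketch, reusing the model_list above:

response = router.completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hey, how's it going?"}],
)
print(response)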
@ishaan-jaff Just seeing this now! I've been meaning to try LiteLLM for a while. Today may be the day.
The app shouldn't crash; it should log and report the error, e.g.:

    raise self._make_status_error_from_response(err.response) from None
openai.RateLimitError: Error code: 429 - {'error': {'message': 'Rate limit reached for gpt-4-1106-preview in organization org-VjjlNbk7U9a6VJMyCoA0LQt0 on tokens per min. Limit: 40000 / min. Please try again in 1ms. Visit https://platform.openai.com/account/rate-limits to learn more.', 'type': 'tokens', 'param': None, 'code': 'rate_limit_exceeded'}}
Illegal instruction: 4
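In the meantime you can catch the 429 yourself and log/retry instead of letting it propagate. A minimal sketch, assuming the openai v1 Python SDK (complete_with_retry, the retry count, and the backoff schedule are illustrative, not library API):

import logging
import time

import openai

logger = logging.getLogger(__name__)
client = openai.OpenAI()  # reads OPENAI_API_KEY from the environment

def complete_with_retry(max_retries=3, **kwargs):
    # Log and back off on 429s instead of letting the app crash.
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(**kwargs)
        except openai.RateLimitError as err:
            logger.warning("Rate limited (attempt %d/%d): %s",
                           attempt + 1, max_retries, err)
            if attempt == max_retries - 1:
                raise  # surface the error after logging rather than dying silently
            time.sleep(2 ** attempt)  # illustrative exponential backoff

response = complete_with_retry(
    model="gpt-4-1106-preview",
    messages=[{"role": "user", "content": "Hey"}],
)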