Use exponential backoff and handle 429 status codes

AaronFriel commented 1 year ago

The OpenAI backend may rate limit clients by API key or organization, returning a 429 status code.

Currently this bubbles up as an error in the CLI.

krrishdholakia commented 11 months ago

hey @AaronFriel, I'm the maintainer of LiteLLM we allow you to create a Router to maximize throughput by load balancing + queuing (beta).

I'd love to get your feedback if this solves your issue

Here's the quick start

from litellm import Router

model_list = [{ # list of model deployments 
    "model_name": "gpt-3.5-turbo", # model alias 
    "litellm_params": { # params for litellm completion/embedding call 
        "model": "azure/chatgpt-v-2", # actual model name
        "api_key": os.getenv("AZURE_API_KEY"),
        "api_version": os.getenv("AZURE_API_VERSION"),
        "api_base": os.getenv("AZURE_API_BASE")
    }
}, {
    "model_name": "gpt-3.5-turbo", 
    "litellm_params": { # params for litellm completion/embedding call 
        "model": "azure/chatgpt-functioncalling", 
        "api_key": os.getenv("AZURE_API_KEY"),
        "api_version": os.getenv("AZURE_API_VERSION"),
        "api_base": os.getenv("AZURE_API_BASE")
    }
}, {
    "model_name": "gpt-3.5-turbo", 
    "litellm_params": { # params for litellm completion/embedding call 
        "model": "gpt-3.5-turbo", 
        "api_key": os.getenv("OPENAI_API_KEY"),
    }
}]

router = Router(model_list=model_list)

# openai.ChatCompletion.create replacement
response = await router.acompletion(model="gpt-3.5-turbo", 
                messages=[{"role": "user", "content": "Hey, how's it going?"}])

print(response)

AaronFriel commented 11 months ago

Hey @krrishdholakia! Appreciate the suggestion here, our backend is written in Node so I don't think we can use your solution as is, but great work on a useful product!

pulumi / pulumi-ai

Use exponential backoff and handle 429 status codes #58