hookla opened 1 year ago
Added retry logic with a decorator in gpt_client. I still need to decrease the delay to 0.1 seconds, though - that should be enough for the 10k/min rate limit.
Alternatively, I could add a 0.1-second delay before every request, but that would be less elegant. Or set up a timer and a request counter that resets every second - but that feels like overkill. A sketch of the decorator approach follows.
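For reference, a minimal sketch of that decorator approach (retry_on_rate_limit and ask are hypothetical names; it assumes the pre-1.0 openai client, which raises openai.error.RateLimitError as in the traceback below):

import functools
import time

import openai

def retry_on_rate_limit(max_retries=5, delay=0.1):
    """Retry the wrapped call after a short sleep when the API rate-limits us."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except openai.error.RateLimitError:
                    if attempt == max_retries - 1:
                        raise  # out of retries, surface the error
                    time.sleep(delay)
        return wrapper
    return decorator

@retry_on_rate_limit(max_retries=5, delay=0.1)
def ask(messages):
    return openai.ChatCompletion.create(model="gpt-4", messages=messages)

With a 0.1s sleep this stays under a 10k/min request rate, but a fixed delay won't adapt if the limit is hit repeatedly.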
@lynxrv21 @hookla I'm the maintainer of LiteLLM - I believe we can help with this problem. I'd love your feedback if LiteLLM is missing something.
Here's the quick start docs: https://docs.litellm.ai/docs/routing
import asyncio
import os

from litellm import Router
model_list = [{ # list of model deployments
"model_name": "gpt-3.5-turbo", # model alias
"litellm_params": { # params for litellm completion/embedding call
"model": "azure/chatgpt-v-2", # actual model name
"api_key": os.getenv("AZURE_API_KEY"),
"api_version": os.getenv("AZURE_API_VERSION"),
"api_base": os.getenv("AZURE_API_BASE")
}
}, {
"model_name": "gpt-3.5-turbo",
"litellm_params": { # params for litellm completion/embedding call
"model": "azure/chatgpt-functioncalling",
"api_key": os.getenv("AZURE_API_KEY"),
"api_version": os.getenv("AZURE_API_VERSION"),
"api_base": os.getenv("AZURE_API_BASE")
}
}, {
"model_name": "gpt-3.5-turbo",
"litellm_params": { # params for litellm completion/embedding call
"model": "gpt-3.5-turbo",
"api_key": os.getenv("OPENAI_API_KEY"),
}
}]
router = Router(model_list=model_list)

# openai.ChatCompletion.create replacement; acompletion must be awaited
async def main():
    response = await router.acompletion(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Hey, how's it going?"}])
    print(response)

asyncio.run(main())
openai.error.RateLimitError: Rate limit reached for gpt-4 in organization org-lDdTak03uNZ02kmY5m6ginja on tokens per min. Limit: 10000 / min. Please try again in 6ms. Contact us through our help center at help.openai.com if you continue to have issues.
So: wait and retry whenever we see this error.
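A hedged sketch of that wait-and-retry idea, using exponential backoff rather than a fixed delay (call_with_backoff is a hypothetical helper; the exception type matches the traceback above):

import random
import time

import openai

def call_with_backoff(fn, max_retries=6, base_delay=0.1):
    """Call fn(), sleeping 0.1s, 0.2s, 0.4s, ... on rate-limit errors."""
    for attempt in range(max_retries):
        try:
            return fn()
        except openai.error.RateLimitError:
            if attempt == max_retries - 1:
                raise  # give up after the final attempt
            # exponential backoff plus a little jitter to avoid retry bursts
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, 0.05))

response = call_with_backoff(
    lambda: openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "hi"}]))

Since the error above says "Please try again in 6ms", even the first backoff step is more than enough headroom, and doubling handles sustained pressure better than a fixed sleep.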