Morsey187 opened 11 months ago
We'd need to investigate if we can catch those in the AI backend implementation.
It looks like those would need to be implemented in https://github.com/simonw/llm directly; then we could catch the "llm" package's exceptions, or, if an HTTP response is returned, use its status code to detect rate limiting.
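For the status-code route, a minimal sketch of the kind of check that could work, assuming the caught exception exposes the underlying HTTP response (the response/status_code attributes here are assumptions rather than a documented llm or wagtail-ai API):

def is_rate_limit_error(exc: Exception) -> bool:
    # Hypothetical: assumes the provider exception carries the HTTP response.
    response = getattr(exc, "response", None)
    status_code = getattr(response, "status_code", None)
    return status_code == 429  # 429 Too Many Requests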
We can't guarantee that our local environment will have all the optional dependencies installed.
Another way might be something like the following, which is still not ideal, but a good trade-off if the user experience matters.
from typing import Generator, Type


def get_rate_limiting_exceptions() -> Generator[Type[Exception], None, None]:
    # Only yield the rate limit exception classes for providers that are
    # actually installed; optional dependencies may be missing locally.
    try:
        import openai
    except ImportError:
        pass
    else:
        yield openai.RateLimitError

    try:
        import another_package
    except ImportError:
        pass
    else:
        yield another_package.RateLimitException


def handle(prompt, context):
    try:
        return backend.prompt_with_context(prompt, context)
    except Exception as e:
        # Re-raise provider-specific rate limit errors as our own exception so
        # callers only need to handle WagtailAiRateLimitError.
        rate_limit_exception_classes = tuple(get_rate_limiting_exceptions())
        if rate_limit_exception_classes and isinstance(e, rate_limit_exception_classes):
            raise WagtailAiRateLimitError from e
        raise
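For illustration, a hypothetical example of how a calling view could then surface the proposed exception to editors; the view and response shape are assumptions, not existing wagtail-ai code:

from django.http import JsonResponse


def prompt_view(request):
    try:
        result = handle(request.POST["prompt"], request.POST.get("context", ""))
    except WagtailAiRateLimitError:
        # Give editors a clear, actionable message instead of a generic 500.
        return JsonResponse(
            {"error": "The AI service is rate limited right now. Please try again shortly."},
            status=429,
        )
    return JsonResponse({"message": result})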
@Morsey187 @tm-kn
I'm the maintainer of LiteLLM. We provide an open-source proxy for load balancing Azure + OpenAI + any LiteLLM-supported LLM, and it can process 500+ requests/second.
From this thread it looks like you're trying to handle rate limits and load balance between OpenAI instances - I hope our solution makes it easier for you. (I'd love feedback if you're trying to do this.)
Doc: https://docs.litellm.ai/docs/simple_proxy#load-balancing---multiple-instances-of-1-model
model_list:
  - model_name: gpt-4
    litellm_params:
      model: azure/chatgpt-v-2
      api_base: https://openai-gpt-4-test-v-1.openai.azure.com/
      api_version: "2023-05-15"
      api_key:
  - model_name: gpt-4
    litellm_params:
      model: azure/gpt-4
      api_key:
      api_base: https://openai-gpt-4-test-v-2.openai.azure.com/
  - model_name: gpt-4
    litellm_params:
      model: azure/gpt-4
      api_key:
      api_base: https://openai-gpt-4-test-v-2.openai.azure.com/
litellm --config /path/to/config.yaml
curl --location 'http://0.0.0.0:8000/chat/completions' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "gpt-4",
    "messages": [
      {
        "role": "user",
        "content": "what llm are you"
      }
    ]
  }'
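For reference, a minimal sketch of making the same call from Python instead of curl, assuming the openai v1 SDK pointed at the proxy (the base_url and api_key values are placeholders):

from openai import OpenAI

# Point the standard OpenAI client at the LiteLLM proxy.
client = OpenAI(base_url="http://0.0.0.0:8000", api_key="anything")

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "what llm are you"}],
)
print(response.choices[0].message.content)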
Add support for raising custom Wagtail AI rate limit exceptions.
I'm not aware of any existing support for rate limiting within Wagtail, and I'm unsure what library would be preferable to use here, so I can't suggest an approach. However, I'd imagine we'd want to support limiting not only requests but also tokens per user account, allowing developers to configure the package so that individual editors' activity doesn't affect one another, e.g. editor 1 reaching the usage limit for the whole organisation account and thereby preventing editor 2 from using AI tools.
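To illustrate the per-editor budgets described above, a hypothetical sketch using Django's cache framework; the limits, cache keys and function name are assumptions, not an existing wagtail-ai API:

from django.core.cache import cache

REQUESTS_PER_HOUR = 50     # hypothetical per-editor request budget
TOKENS_PER_HOUR = 20_000   # hypothetical per-editor token budget


def editor_within_budget(user_id: int, tokens_requested: int) -> bool:
    """Return True if this editor is still within their hourly budget."""
    request_key = f"wagtail_ai_requests_{user_id}"
    token_key = f"wagtail_ai_tokens_{user_id}"

    requests_used = cache.get_or_set(request_key, 0, timeout=3600)
    tokens_used = cache.get_or_set(token_key, 0, timeout=3600)

    if requests_used >= REQUESTS_PER_HOUR or tokens_used + tokens_requested > TOKENS_PER_HOUR:
        return False

    cache.incr(request_key)
    cache.incr(token_key, tokens_requested)
    return True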