Best way to check for breaches of rate limit using the Assistants API?

openai / openai-dotnet

The official .NET library for the OpenAI API

https://www.nuget.org/packages/OpenAI

MIT License

Best way to check for breaches of rate limit using the Assistants API? #66

Open Jaffacakes82 opened 1 week ago

Jaffacakes82 commented 1 week ago

Hi,

What is the recommended approach to leveraging the Assistants API via this SDK and appropriately handling breaches of the TPM rate limits?

I'm using RAG with both GPT-3.5-turbo and GPT-4o models, and given context tokens count towards the rate limit, I'm hitting these semi-frequently. How should I handle this?

Thanks!

KrzysztofCwalina commented 1 week ago

This is a good issue. Thanks for bringing it to our attention. We will think about offering a built-in solution for this, but as a workaround, I wonder if you could handle this using a custom policy:

Create a subclass of PipelinePolicy that handles the error you get when you exceed the rate limit, in which case the policy would throttle (i.e. Task.Delay).
Create an instance of OpenAIClientOptions and call AddPolicy(yourCustomPolicy, PipelinePosition.PerTry) on it.
Inject the policy into the AssistantClient by passing the instance of OpenAIClientOptions to the client's constructor.
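
The steps above could be sketched roughly as follows. This is a minimal sketch, not the library's built-in behavior: the class names RateLimitThrottlePolicy and ThrottledClientOptions are hypothetical, and treating HTTP 429 as the rate limit signal, reading a "retry-after" header as a number of seconds, and the 10-second fallback delay are all assumptions to verify against the responses you actually receive.

```csharp
#nullable enable
using System;
using System.ClientModel.Primitives;
using System.Collections.Generic;
using System.Globalization;
using System.Threading;
using System.Threading.Tasks;
using OpenAI;

// Hypothetical policy: waits after a rate-limited response so that the
// client's next retry attempt happens later.
public class RateLimitThrottlePolicy : PipelinePolicy
{
    // Assumption: used when the response carries no usable retry-after value.
    private static readonly TimeSpan FallbackDelay = TimeSpan.FromSeconds(10);

    public override void Process(PipelineMessage message, IReadOnlyList<PipelinePolicy> pipeline, int currentIndex)
    {
        // Let the rest of the pipeline (including the transport) run first.
        ProcessNext(message, pipeline, currentIndex);
        if (IsRateLimited(message.Response))
        {
            Thread.Sleep(GetDelay(message.Response));
        }
    }

    public override async ValueTask ProcessAsync(PipelineMessage message, IReadOnlyList<PipelinePolicy> pipeline, int currentIndex)
    {
        await ProcessNextAsync(message, pipeline, currentIndex).ConfigureAwait(false);
        if (IsRateLimited(message.Response))
        {
            await Task.Delay(GetDelay(message.Response)).ConfigureAwait(false);
        }
    }

    // Assumption: a breached rate limit surfaces as HTTP 429.
    private static bool IsRateLimited(PipelineResponse? response) => response is { Status: 429 };

    private static TimeSpan GetDelay(PipelineResponse? response)
    {
        string? value = null;
        response?.Headers.TryGetValue("retry-after", out value);
        return ParseRetryAfter(value) ?? FallbackDelay;
    }

    // Parses a retry-after value given in seconds; returns null if it is missing or invalid.
    public static TimeSpan? ParseRetryAfter(string? value)
        => double.TryParse(value, NumberStyles.Float, CultureInfo.InvariantCulture, out double seconds) && seconds > 0
            ? TimeSpan.FromSeconds(seconds)
            : (TimeSpan?)null;
}

public static class ThrottledClientOptions
{
    // Registers the policy at PerTry so the delay runs inside the retry loop.
    public static OpenAIClientOptions Create()
    {
        OpenAIClientOptions options = new OpenAIClientOptions();
        options.AddPolicy(new RateLimitThrottlePolicy(), PipelinePosition.PerTry);
        return options;
    }
}
```

The options returned by ThrottledClientOptions.Create() would then be passed to the client's constructor, as in the last step.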

trrwilson commented 1 week ago

To add to this: if you'd like to examine the values of the documented rate limit response headers, you can also do that without a custom policy by retrieving the raw response from the formal response wrapper and then checking its header values:

// Non-streaming: CreateRun takes the thread ID first, then the assistant ID.
ClientResult<ThreadRun> runResult = client.CreateRun("threadId", "assistantId");
if (runResult.GetRawResponse().Headers.TryGetValue("x-ratelimit-remaining-tokens", out string remainingTokenText))
{
    // remainingTokenText has a value like: "150000"
}

// Streaming: the same headers are available on the raw response.
ResultCollection<StreamingUpdate> streamingUpdates = client.CreateRunStreaming("threadId", "assistantId");
if (streamingUpdates.GetRawResponse().Headers.TryGetValue("x-ratelimit-reset-tokens", out string resetTimeText))
{
    // resetTimeText has a value like: "6m0s"
}

As @KrzysztofCwalina mentioned, we'll look into providing a more direct and typed mechanism to retrieve this; ideally, when the keys are well-known, you shouldn't need to provide them explicitly like this.
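
Until a typed mechanism exists, a reset value such as "6m0s" has to be parsed before it can be used as a delay. Here is a minimal sketch, assuming the header uses hour, minute, second, and millisecond suffixes (h, m, s, ms); the RateLimitHeaderParser class and its method name are hypothetical and not part of the SDK.

```csharp
#nullable enable
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper: converts values like "6m0s" or "250ms" into a TimeSpan.
public static class RateLimitHeaderParser
{
    // \G anchors each match at the current scan position, so unexpected
    // characters between parts cause the parse to fail.
    // "ms" is listed before "m" so that milliseconds are not read as minutes.
    private static readonly Regex Part = new Regex(@"\G(\d+(?:\.\d+)?)(ms|h|m|s)", RegexOptions.Compiled);

    public static bool TryParseResetTime(string? text, out TimeSpan result)
    {
        result = TimeSpan.Zero;
        if (string.IsNullOrEmpty(text))
        {
            return false;
        }

        TimeSpan total = TimeSpan.Zero;
        int position = 0;
        while (position < text.Length)
        {
            Match match = Part.Match(text, position);
            if (!match.Success)
            {
                return false;
            }

            double amount = double.Parse(match.Groups[1].Value, CultureInfo.InvariantCulture);
            switch (match.Groups[2].Value)
            {
                case "h": total += TimeSpan.FromHours(amount); break;
                case "m": total += TimeSpan.FromMinutes(amount); break;
                case "s": total += TimeSpan.FromSeconds(amount); break;
                default: total += TimeSpan.FromMilliseconds(amount); break;
            }
            position = match.Index + match.Length;
        }

        result = total;
        return true;
    }
}
```

With this, the resetTimeText value from the snippet above could be passed to RateLimitHeaderParser.TryParseResetTime and the resulting TimeSpan handed to Task.Delay.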

Jaffacakes82 commented 1 week ago

Thanks @KrzysztofCwalina @trrwilson. I've implemented a basic solution using exponential backoff for now.

It might make sense to make this available in the Usage property of the ThreadRun.
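
For anyone else landing here, the kind of exponential backoff mentioned above could look roughly like the sketch below. It is not the code from this thread: the Backoff class, the 1-second base delay, the 60-second cap, and the jitter range are all assumptions, and the caller decides which exceptions count as a rate limit breach.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper: retries an operation with capped exponential backoff.
public static class Backoff
{
    // Delay for a zero-based attempt: baseDelay * 2^attempt, capped at maxDelay.
    public static TimeSpan GetDelay(int attempt, TimeSpan baseDelay, TimeSpan maxDelay)
    {
        double milliseconds = baseDelay.TotalMilliseconds * Math.Pow(2, attempt);
        return TimeSpan.FromMilliseconds(Math.Min(milliseconds, maxDelay.TotalMilliseconds));
    }

    public static async Task<T> RetryAsync<T>(Func<Task<T>> operation, Func<Exception, bool> shouldRetry, int maxAttempts = 5)
    {
        for (int attempt = 0; ; attempt++)
        {
            try
            {
                return await operation().ConfigureAwait(false);
            }
            catch (Exception ex) when (attempt < maxAttempts - 1 && shouldRetry(ex))
            {
                TimeSpan delay = GetDelay(attempt, TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(60));
                // Jitter: wait between 50% and 100% of the computed delay (Random.Shared needs .NET 6+).
                double jitter = 0.5 + Random.Shared.NextDouble() * 0.5;
                await Task.Delay(TimeSpan.FromMilliseconds(delay.TotalMilliseconds * jitter)).ConfigureAwait(false);
            }
        }
    }
}
```

A caller would wrap the run creation in Backoff.RetryAsync and supply a predicate such as ex => ex is ClientResultException { Status: 429 }, assuming the breach surfaces as that exception type with a 429 status.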

KrzysztofCwalina commented 1 week ago

@Jaffacakes82, be aware that the clients already implement retry logic (with exponential backoff). The retries happen on any error, and apparently their delay is not enough for your scenarios. But because of the retry logic, it's very important that when you add your custom policy to the client, you add it at the "PerTry" position. Otherwise, the built-in retry logic will kick in first and the client will still be retrying too early.

trrwilson commented 1 week ago

@KrzysztofCwalina, I believe this is the related System.ClientModel issue: https://github.com/Azure/azure-sdk-for-net/issues/44222

Without the built-in DelayStrategy, I think retries made without the custom policy are happening immediately, irrespective of hints like retry-after headers.