stanfordnlp / dspy

DSPy: The framework for programming—not prompting—foundation models
https://dspy-docs.vercel.app/
MIT License
16.68k stars · 1.29k forks

OpenAI timeout #207

Open andreapiso opened 10 months ago

andreapiso commented 10 months ago

OpenAI APIs are quite unstable recently and time out often. Training a prompt requires quite a number of calls which means 99% you will experience a timeout. The current max_time is hard-coded as part of the Backoff decorator and set to 1000 seconds. It would be good to be able to change this value so that the compilation would not take forever.
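
For illustration, a rough sketch of what exposing that value could look like (this assumes the pre-1.0 openai SDK and the `backoff` library; the knob name and the exact exception list are placeholders, not dspy's actual code):

```python
import backoff
import openai

# Hypothetical module-level setting instead of a hard-coded literal.
MAX_BACKOFF_TIME = 1000  # seconds; users could lower this before compiling

@backoff.on_exception(
    backoff.expo,
    (openai.error.APIError, openai.error.Timeout, openai.error.RateLimitError),
    max_time=MAX_BACKOFF_TIME,  # currently fixed at 1000 inside the decorator
)
def chat_request(**kwargs):
    # A single OpenAI call; the decorator retries it with exponential
    # backoff until MAX_BACKOFF_TIME is exhausted.
    return openai.ChatCompletion.create(**kwargs)
```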

raquette commented 10 months ago

Same problem

darinkishore commented 10 months ago

@raquette @andreapiso provide stack trace?

andreapiso commented 10 months ago

@darinkishore what stack trace do you need? The program does not crash; it hangs because OpenAI does not respond, until it eventually prints that it is retrying with exponential backoff because api.openai.com did not respond.

raquette commented 10 months ago
[screenshot]

@andreapiso I tested with gpt-3.5-turbo-1106 (new version); it is much better.

okhat commented 10 months ago

Ah, I think this issue is (in the short term) less severe now?

andreapiso commented 10 months ago

I tested it today with 1106 and the OpenAI API still gets stuck fairly often, which makes dspy take almost two hours to complete a generation of 8 candidate programs with 24 examples.

andreapiso commented 10 months ago

In particular, I am starting to get:

Error for example in dev set:        HTTP code 502 from API (<html>
<head><title>502 Bad Gateway</title></head>
<body>
<center><h1>502 Bad Gateway</h1></center>
<hr><center>cloudflare</center>
</body>
</html>
)

I recognise the issue is on the OpenAI side, but not being able to change this behaviour and the timeout from dspy makes it quite a bit worse.

dasjyotishka commented 9 months ago

It is taking me more than two hours to train a chain-of-thought question-answering predictor on 400 pairs, and I often face timeout problems. It would be good to resolve this issue.

okhat commented 9 months ago

Can you help me understand the issue? I run large experiments with OpenAI (the latest turbo model at all times) and, except when they note on https://status.openai.com/ that there are issues, it's always very fast and stable.

I understand sometimes they're unreliable. In that case, I don't see how DSPy can help though.

> max_time is hard-coded as part of the Backoff decorator and set to 1000 seconds

We can definitely set this to less than 1000, but how can that help? Won't this just timeout on the next call still?

Basically if OpenAI is inherently unstable at a given hour, how can the library deal with that? The main thing that comes to mind is to make sure you set the number of threads low enough for your rate limit.

Happy to address this if I'm missing some context @dasjyotishka @andreapiso @raquette @darinkishore
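
For example, something like this (rough sketch; the metric and devset are placeholders, and parameter names should be checked against your dspy version):

```python
import dspy
from dspy.evaluate import Evaluate
from dspy.teleprompt import BootstrapFewShotWithRandomSearch

# Placeholder metric and devset, only to keep the sketch self-contained.
def exact_match(example, pred, trace=None):
    return example.answer.lower() == pred.answer.lower()

devset = [dspy.Example(question="What is 2+2?", answer="4").with_inputs("question")]

turbo = dspy.OpenAI(model="gpt-3.5-turbo")
dspy.settings.configure(lm=turbo)

# Keep parallelism at 1 so the default rate limit is not what triggers retries.
evaluate = Evaluate(devset=devset, metric=exact_match, num_threads=1)
teleprompter = BootstrapFewShotWithRandomSearch(metric=exact_match, num_threads=1)
```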

okhat commented 9 months ago

This may explain the issue from @dasjyotishka?

[screenshot]

andreapiso commented 9 months ago

> Can you help me understand the issue? I run large experiments with OpenAI (the latest turbo model at all times) and, except when they note on https://status.openai.com/ that there are issues, it's always very fast and stable.
>
> I understand sometimes they're unreliable. In that case, I don't see how DSPy can help though.
>
> > max_time is hard-coded as part of the Backoff decorator and set to 1000 seconds
>
> We can definitely set this to less than 1000, but how can that help? Won't this just timeout on the next call still?
>
> Basically if OpenAI is inherently unstable at a given hour, how can the library deal with that? The main thing that comes to mind is to make sure you set the number of threads low enough for your rate limit.
>
> Happy to address this if I'm missing some context @dasjyotishka @andreapiso @raquette @darinkishore

Not sure if the experience differs by geographic region (I am in South East Asia), but my experience with the OpenAI API has been very different: it consistently drops about 10% of requests (not an exaggeration), especially when submitting multiple requests in parallel (while still being very far from the rate cap). The dropped requests seem random, so re-submitting usually works. This has been experienced by multiple users, on different wifi networks, using different machines, and with different clients outside of DSPy (we get the same timeouts when calling OpenAI from JavaScript). It has happened consistently for several weeks. The responses to the OpenAI forum tickets created for these kinds of issues are along the lines of "when you make an API request, you should always assume it won't work" (lol).

So, when optimising a prompt in dspy with something like random search, dspy effectively gets stuck for 15 minutes every 10-15 requests, unless I am there to stop the notebook manually and re-run it, which works because the previous iterations of the optimisation are cached.

okhat commented 9 months ago

Thank you. It sounds like you’re sending parallel requests, without having increased the default rate limit.

I'm 70% sure you're facing these issues consistently just because you're sending parallel requests and running out of tokens per minute. Please use a single thread and let me know if the issues still persist; hopefully that's unlikely.

andreapiso commented 9 months ago

It happens consistently with a single thread too, which is actually the worst-case scenario: with multiple requests in parallel, the other requests in the optimisation set can at least still proceed while one gets stuck, whereas with a single thread the 15-minute break paralyses the whole process every time.

okhat commented 9 months ago

Okay, a couple more questions. Does this happen with the latest gpt-3.5-turbo (at whatever time you faced the issue)? It’s possible OpenAI downgrades older models.

Does your code include generation of many completions at once (like n=10) or something like that?


Basically, with a single thread, during the “15 minute” gap, the code is retrying the openai call with exponential backoff. If every repeated call keeps failing, that is completely outside our control. I think during 15 minutes, it basically has to retry around 10 times and fail every time. Is that true? In that case, what behavior do you expect from the library?

okhat commented 9 months ago

You may benefit from requesting a rate limit increase by the way. The default rate limit (tokens per minute) is absurdly small.

andreapiso commented 9 months ago

I am definitely nowhere near hitting the rate limit; this happens even when sending just 20-30 calls over the span of the optimisation process.

> Basically, with a single thread, during the “15 minute” gap, the code is retrying the openai call with exponential backoff. If every repeated call keeps failing, that is completely outside our control. I think during 15 minutes, it basically has to retry around 10 times and fail every time. Is that true? In that case, what behavior do you expect from the library?

The point is that if this were happening, I would see the "backoff retry" message being printed out, right? Instead, nothing gets printed for an absurd amount of time. The expectation would be to allow users to pass a timeout parameter that dspy sets accordingly, so that if I know how long my calls should take, I could set a timeout of that plus a buffer.

I think failing 10 times in 15 minutes with backoff would happen with a 60-second timeout, but are you sure the timeout is 60 seconds and not 10 minutes? (e.g. see https://github.com/openai/openai-python/issues/762). The experience right now is that every time a call is dropped, everything in DSPy hangs for 10+ minutes with no backoff message or anything else being printed.

okhat commented 9 months ago

The default rate limit can be triggered by as little as 20 calls. Isn’t it something like 90k tokens per minute?

That said, I now understand your concern. It's not directly related to the 15-minute gap: you basically need a timeout for each individual request, not a smaller timeout for the whole backoff logic.

I think that's reasonable. We can make sure each individual request times out in 40 seconds (though I would have preferred that OpenAI drop the request properly on their end rather than our using an arbitrary small timeout). We still want to keep the 1000-second logic as-is. Does this sound right to you?
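
Roughly this shape, as an untested sketch (assuming the pre-1.0 openai SDK, where request_timeout bounds a single attempt; on openai>=1.0 the equivalent is the client-level timeout argument, which defaults to 600 seconds):

```python
import backoff
import openai

@backoff.on_exception(
    backoff.expo,
    (openai.error.Timeout, openai.error.APIError, openai.error.RateLimitError),
    max_time=1000,  # overall retry budget, kept as-is
)
def timed_request(**kwargs):
    # request_timeout caps one attempt at 40 seconds; a hung connection
    # raises Timeout and the decorator retries it with exponential backoff.
    return openai.ChatCompletion.create(request_timeout=40, **kwargs)
```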

krrishdholakia commented 9 months ago

Hey @okhat @andreapiso, I'm the maintainer of LiteLLM. We let you create an OpenAI-compatible proxy to maximize throughput via load balancing + queuing (beta).

Would this help solve your issue?

Docs: https://docs.litellm.ai/docs/routing
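
For example (sketch based on the Router docs above; check them for the current parameter names, and the key/model entries here are placeholders):

```python
from litellm import Router

# One or more deployments behind a single alias; failing deployments are
# retried / cooled down by the router.
model_list = [
    {
        "model_name": "gpt-3.5-turbo",
        "litellm_params": {"model": "gpt-3.5-turbo", "api_key": "sk-..."},
    },
]

router = Router(model_list=model_list, num_retries=3, timeout=30)

response = router.completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "hello"}],
)
```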

okhat commented 9 months ago

@krrishdholakia Sounds awesome! I'd merge this if you wanna open a PR!

vikrantrathore commented 2 months ago

Facing the same issue on a self-hosted LLM running gemma-2-9B-Instruct with FastChat and an sglang worker: it works when max_tokens is set to 512 but times out when max_tokens is 2000.

When using the OpenAI SDK directly with the prompt obtained from inspect_history, it works fine. Something seems wrong with the OpenAI gpt3.py wrapper in dspy.

The error is the same 502 Bad Gateway as above.

The machine has 128GB of memory and a 24GB RTX 4090 for inference.
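
In case it helps narrow things down, a rough, untested sketch of pointing dspy at a local OpenAI-compatible endpoint with an explicit max_tokens (the URL, key, and model name are placeholders for the FastChat/sglang setup; check your dspy version for the exact parameters):

```python
import dspy

local_lm = dspy.OpenAI(
    model="gemma-2-9B-Instruct",
    api_base="http://localhost:8000/v1/",  # placeholder for the local server
    api_key="EMPTY",
    model_type="chat",
    max_tokens=2000,  # match what the worker is asked to generate
)
dspy.settings.configure(lm=local_lm)
```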