openai / openai-python

The official Python library for the OpenAI API
https://pypi.org/project/openai/
Apache License 2.0

Streaming chunk generator: incomplete JSON #649

Closed mmiguel6288 closed 10 months ago

mmiguel6288 commented 10 months ago

Describe the bug

(Screenshot: Termux traceback, 2023-10-11)

When iterating through chunks in a chat completion stream response, the generator is crashing due to an incomplete JSON expression:

File "/data/data/com.termux/files/usr/lib/python3.11/site-packages/openai/api_requestor.py", line 765, in _interpret_response_line data = json.loads(rbody) ^^^^^^^^^^^^^^^^^

(Pdb) p rbody '"{\"rate_limit_usage\": {\'

It looks like the client should buffer and concatenate incoming data until a complete JSON object is available before attempting a json.loads decode. Alternatively, the server should send the entire JSON object in a single response. This issue appears to have started today; I had no problems yesterday.
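A minimal sketch of the buffering idea suggested above, assuming an iterable of raw byte chunks from the network; the helper name iter_sse_json and its interface are illustrative, not part of the openai library:

import json

def iter_sse_json(raw_chunks):
    """Yield parsed JSON payloads from an iterable of raw byte chunks."""
    buffer = b""
    for chunk in raw_chunks:
        buffer += chunk
        # Consume only complete lines; a partial tail stays buffered
        # until the next network read completes it.
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            line = line.strip()
            if line.startswith(b"data:"):
                payload = line[len(b"data:"):].strip()
                if payload == b"[DONE]":
                    return
                yield json.loads(payload)  # payload is now a complete JSON value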

To Reproduce

Run the code in the snippet below; the generator crashes partway through iterating the streamed chunks.

Code snippets

import openai, os
openai.api_key = os.environ['OPENAI_API_KEY']

params = {
    'model': 'gpt-4',
    'messages': [
        {'role': 'system', 'content': 'You are a helpful assistant.'},
        {'role': 'user', 'content': 'tell me three arbitrary words'},
    ],
    'stream': True,
}

response = openai.ChatCompletion.create(**params)

for chunk in response:
    print(chunk)

OS

android/termux

Python version

python 3.11.5

Library version

openai 0.28.1

asavoy commented 10 months ago

This error only reliably reproduces for me with the gpt-4-0314 model. So the updated snippet to reproduce is:

import openai, os
openai.api_key = os.environ['OPENAI_API_KEY']

params = {
    'model': 'gpt-4-0314',
    'messages': [
        {'role': 'system', 'content': 'You are a helpful assistant.'},
        {'role': 'user', 'content': 'tell me three arbitrary words'},
    ],
    'stream': True,
}

response = openai.ChatCompletion.create(**params)

for chunk in response:
    print(chunk)

# APIError: HTTP code 200 from API ("{\"rate_limit_usage\": {\)

rttgnck commented 10 months ago

I am seeing this error using gpt-3.5-turbo. It had been going fine until tonight.

I tried a direct request in Postman and did not see the rate-limit error, and I've verified my inputs are the same.

I have also had the error skip one prompt's response only to appear on the next.

Same setup: ChatCompletion.create(model='gpt-3.5-turbo', stream=True, messages=messages), you get the idea.

athyuttamre commented 10 months ago

Hi all, we're aware of this issue and have reverted a change that might be causing this. Apologies for the errors here as we continue to investigate the root cause.

trex55 commented 10 months ago

We are still seeing the same errors, but somehow they get resolved if we switch accounts. Do you have an ETA on when this will be fixed?

rttgnck commented 10 months ago

@athyuttamre Could you share what you reverted so we can apply it locally while we wait for the fix? It's breaking my dev process, at least. Edit: I just read the other thread and the Telegram bot thread, and I may have misunderstood. I haven't tested again since last night.

@trex55 What do you mean switch accounts? Get a different API key from a different OpenAI account?

athyuttamre commented 10 months ago

@trex55 @rttgnck Are you still seeing this error? We reverted a change when I posted the comment above. If you are logging response headers, any chance you could share the request IDs from the X-Request-ID header?

rttgnck commented 10 months ago

@athyuttamre That was me misunderstanding whether it was an issue at OpenAI. That seemed likely, since I hadn't updated the API library. It seems to be working now that I have had a chance to test it.

trex55 commented 10 months ago

Thanks @athyuttamre @rttgnck. In general we see significant latency (up to 10 seconds) for one account, regardless of how many API keys I create. But when I use another OpenAI account the latency drops to 1 second. So it appears OpenAI has limited my account for some reason, even though I am nowhere near the requests-per-minute limit.

The rate-limit error does seem to have disappeared, but not the significant latency on that one account. This is alarming, as this was supposed to be our prod account for our chatbot; I have now had to postpone the release, as I would like some answers about why this kind of thing can happen.

trex55 commented 10 months ago

@athyuttamre, is it possible that, because we were thoroughly testing our customer service chatbot for prompt attacks and other guardrails, OpenAI mistakenly limited the account's latency?

athyuttamre commented 10 months ago

Glad to hear this issue was resolved.

@trex55 I can't answer account-specific questions here. Could you email me at atty@openai.com? Thanks!

gdagitrep commented 10 months ago

I was seeing this issue for much longer, even after it was resolved. A simple redeploy of the application fixed it 🤷.

@athyuttamre Is there a way to log the X-Request-ID header value while using the OpenAI/Langchain packages?
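One hedged workaround, not an official openai-python or Langchain feature (the prompt payload here is just an example): call the REST endpoint directly with requests, so the response headers, including X-Request-ID, are available for logging:

import os
import requests

resp = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={
        "model": "gpt-3.5-turbo",
        "messages": [{"role": "user", "content": "ping"}],
    },
    timeout=30,
)
# The request ID that OpenAI staff ask for is in the response headers.
print(resp.headers.get("x-request-id"))
print(resp.json()["choices"][0]["message"]["content"])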

dlaliberte commented 9 months ago

FYI, I was having this problem as well. It turned out the request URL was incorrect for the server I was using (Koboldcpp, which recently added OpenAI API support; the required base URL turned out to be "http://localhost:5001/v1"). The incorrect URL produced a 404 with an empty body, which the openai library did not handle gracefully, crashing the whole process.
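For reference, a minimal sketch of pointing the 0.28-era module-level client at a local OpenAI-compatible server; the model name and API key are placeholders that depend on what the local server expects:

import openai

openai.api_base = "http://localhost:5001/v1"  # note the required /v1 suffix
openai.api_key = "sk-local"  # placeholder; many local servers ignore the key

response = openai.ChatCompletion.create(
    model="koboldcpp",  # placeholder model name; use whatever the server expects
    messages=[{"role": "user", "content": "hello"}],
)
print(response.choices[0].message.content)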