lucasjinreal opened this issue 1 month ago
Which model/backend are you using? Can you also share your client request code?
The code is like this:
response = client.chat.completions.create(
    model="gpt-3.5-turbo-16k",
    messages=[{"role": "user", "content": [
        {'type': 'text', 'text': '描述一下图片内容'},
        {'type': 'image_url', 'image_url': {'url': encode_image_to_base64('data/images/gt_10036.jpg')}},
    ]}],
    temperature=0.95,
    stream=True,
)
print_response(response)
It's a typical OpenAI request.
I am using the internvl2-8b model.
The issue has always been stably reproducible.
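For context, encode_image_to_base64 is not shown in the report. A minimal sketch of such a helper, assuming it is expected to return a data: URL that the chat completions API accepts, could look like this:

```python
import base64
import mimetypes

# Hypothetical sketch of the encode_image_to_base64 helper referenced above;
# the real implementation is not shown in this issue.
def encode_image_to_base64(path):
    # Guess the MIME type from the extension, defaulting to JPEG.
    mime = mimetypes.guess_type(path)[0] or "image/jpeg"
    with open(path, "rb") as f:
        data = base64.b64encode(f.read()).decode("utf-8")
    return f"data:{mime};base64,{data}"
```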
For streamed replies you need to iterate over the response to get the streamed data; check chat_with_image.py
for a more detailed example. If you want to print it, the response should be used like this:
for chunk in response:
    print(chunk.choices[0].delta.content, end='', flush=True)
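One general caveat with the openai-python client, unrelated to any server bug: on some chunks, notably the last one, delta.content can be None, and printing it directly emits the literal string "None". A slightly more defensive sketch (print_stream is an illustrative name, not from either codebase):

```python
def print_stream(response):
    # Print streamed chunks, skipping any whose delta carries no text
    # (e.g. the final chunk, where content is typically None).
    for chunk in response:
        content = chunk.choices[0].delta.content
        if content:
            print(content, end='', flush=True)
    print()
```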
Oh, I think I misled you: the problem is not my client. My client works well with any other API, such as OpenAI's. And my print_response
is like:
def print_response(response):
    if isinstance(response, openai.Stream):
        for chunk in response:
            if (
                hasattr(chunk.choices[0].delta, "content")
                and chunk.choices[0].finish_reason != "stop"
            ):
                print(chunk.choices[0].delta.content, end="", flush=True)
        print()
    else:
        print(response.choices[0].message.content)
I don't know if it is a server issue.
I've just tested this for sanity's sake, and OpenGVLab/InternVL2-8B streams just fine with version 0.28.1 of openedai-vision.
Can you test with chat_with_image.py? It would be:
python chat_with_image.py data/images/gt_10036.jpg "描述一下图片内容"
I wonder if the problem is "if isinstance(response, openai.Stream)", it's possible the response is not structured correctly and may not be seen as a stream.
Update: no, I just tested, it is type: <class 'openai.Stream'>...
I don't know what could be wrong so far, are you using the docker image? Which client version of the python-openai code are you using?
Thank you for digging in. The client works OK with OpenAI's API and one of my vLLM streaming servers, so I'm afraid it was caused by some thread getting stuck on the server.
I have encountered a similar situation doing streaming LLM inference; I resolved it by finding that I was missing an async
in my stream prediction function.
But after testing and verifying your code, it looks flawless.
The phenomenon on my side currently is: it's not the client, and not the model's generation.
So I still suspect it might be caused by a FastAPI or SSE asynchronization issue.
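The missing-async pitfall mentioned above can be sketched without FastAPI (an illustration of the concept only, not the actual server code): a plain def collects everything before returning, so the caller only ever sees one big result, while an async generator hands each token back as soon as it is produced.

```python
import asyncio

def predict_blocking(tokens):
    # Blocks until generation finishes, then returns everything at once:
    # from the caller's point of view there is no stream at all.
    return "".join(tokens)

async def predict_streaming(tokens):
    # Yields each token as it becomes available.
    for tok in tokens:
        await asyncio.sleep(0)  # yield control so the event loop can flush
        yield tok

async def collect(agen):
    # Drain an async generator into a list.
    return [tok async for tok in agen]

tokens = ["a", "b", "c"]
whole = predict_blocking(tokens)                          # one chunk
parts = asyncio.run(collect(predict_streaming(tokens)))   # many chunks
```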
Ugh, threading in python is a nightmare... You could try the newer AsyncClient from openai, with httpx? I don't know much about this but it seems preferred.
See a sample here: https://github.com/openai/openai-python/blob/195c05a64d39c87b2dfdf1eca2d339597f1fce03/examples/streaming.py#L33
Did you confirm the code is receiving data as a stream and just printing all at once?
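One way to check this objectively is to timestamp each chunk as it arrives: if all deltas land at nearly the same instant, the buffering happens upstream of the client's print loop. A hypothetical helper (timed_chunks and fake_stream are illustrations, not part of either codebase):

```python
import time

def timed_chunks(stream):
    """Yield (seconds_since_start, chunk) so you can see whether data
    really arrives incrementally or all at once at the end."""
    start = time.monotonic()
    for chunk in stream:
        yield time.monotonic() - start, chunk

# Usage with a fake stream that emits a token every 50 ms;
# with a real response, pass the openai stream object instead.
def fake_stream():
    for tok in ["Hello", " ", "world"]:
        time.sleep(0.05)
        yield tok

arrivals = list(timed_chunks(fake_stream()))
```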
I would be interested to see the .to_json() dumps of the two different streams, so I can compare. I have built everything from reverse engineering, so I don't even test with openai API directly, just openai client. I have not ruled out a server side issue.
> Did you confirm the code is receiving data as a stream and just printing all at once?
Let me test again in detail to answer this.
Any update or should I close this report?
Hello, please wait for my debug information. Currently, I am using non-streaming mode directly. However, the stream functionality should still be tested eventually.
Hi, any update? Otherwise I think this is getting closed.
Hi, I have been busy these days. The root cause has still not been precisely investigated.
Hi, I'm coming back around to this again - any chance you've been able to debug it?
Testing with the openai client, the client side actually receives the whole output at the end, although the server side does print streaming output.
Any reason for this?