Closed · geosmart closed this issue 9 months ago
🤖
Thank you for providing detailed information about the issue you're experiencing. It seems like the problem might be related to how the streaming mode is implemented in the dify codebase.
In the `_handle_chat_generate_stream_response` function, an `LLMResultChunk` object is generated for each message in the response. If the `usage` attribute of the message is not present, it still yields an `LLMResultChunk` object, but without setting the `usage` attribute. This could potentially result in an empty or blank response if the `usage` attribute is not present in the message.
Here is the relevant code:
def _handle_chat_generate_stream_response(self, model: str, prompt_messages: list[PromptMessage],
                                          credentials: dict, response: Generator[MinimaxMessage, None, None]) \
        -> Generator[LLMResultChunk, None, None]:
    for message in response:
        if message.usage:
            usage = self._calc_response_usage(
                model=model, credentials=credentials,
                prompt_tokens=message.usage['prompt_tokens'],
                completion_tokens=message.usage['completion_tokens']
            )
            yield LLMResultChunk(
                model=model,
                prompt_messages=prompt_messages,
                delta=LLMResultChunkDelta(
                    index=0,
                    message=AssistantPromptMessage(
                        content=message.content,
                        tool_calls=[]
                    ),
                    usage=usage,
                    finish_reason=message.stop_reason if message.stop_reason else None,
                ),
            )
        else:
            yield LLMResultChunk(
                model=model,
                prompt_messages=prompt_messages,
                delta=LLMResultChunkDelta(
                    index=0,
                    message=AssistantPromptMessage(
                        content=message.content,
                        tool_calls=[]
                    ),
                    finish_reason=message.stop_reason if message.stop_reason else None,
                ),
            )
You can find this code in the `api/core/model_runtime/model_providers/minimax/llm/llm.py` file.
If the `usage` attribute is not present in the message, the `LLMResultChunk` object is still yielded, but without setting the `usage` attribute. This could potentially result in an empty or blank response.
To resolve this issue, you might need to ensure that the `usage` attribute is always present in the message. If it's not possible to ensure this, you might need to modify the `_handle_chat_generate_stream_response` function to handle cases where the `usage` attribute is not present.
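For illustration only, handling the missing attribute inside the loop could look like this (a minimal sketch, not the actual dify implementation; it assumes `_calc_response_usage` accepts zero token counts as a placeholder):

```python
# Sketch: always build a usage object, falling back to zero token counts
# when the upstream message carries none, so every yielded chunk is uniform.
for message in response:
    if message.usage:
        usage = self._calc_response_usage(
            model=model, credentials=credentials,
            prompt_tokens=message.usage['prompt_tokens'],
            completion_tokens=message.usage['completion_tokens']
        )
    else:
        # Assumption: zero counts are acceptable placeholders here.
        usage = self._calc_response_usage(
            model=model, credentials=credentials,
            prompt_tokens=0, completion_tokens=0
        )
    yield LLMResultChunk(
        model=model,
        prompt_messages=prompt_messages,
        delta=LLMResultChunkDelta(
            index=0,
            message=AssistantPromptMessage(content=message.content, tool_calls=[]),
            usage=usage,
            finish_reason=message.stop_reason if message.stop_reason else None,
        ),
    )
```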
I hope this helps! If you have any other questions or need further clarification, please let me know.
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.
If you want to continue the conversation, start your reply with @dosu-bot.
I use the openai_api_compatible model, so the relevant source code is `openai_api_compatible/llm/llm.py`.
I tested my OpenAI-API-compatible model API; the blocking-mode response has a `usage` attribute.
curl --location --request POST 'http://my-domain/v1/chat/completions' \
--header 'User-Agent: Apifox/1.0.0 (https://apifox.com)' \
--header 'Content-Type: application/json' \
--data-raw '{
    "messages": [
        {
            "role": "system",
            "content": "xxx"
        },
        {
            "role": "user",
            "content": "who are you"
        }
    ],
    "temperature": 1.0,
    "top_p": 0.5,
    "max_tokens": 500,
    "stream": false
}'
# response
{
    "object": "chat.completion",
    "model": "rwkv-5-world-1b5",
    "usage": {
        "prompt_tokens": 0,
        "completion_tokens": 500,
        "total_tokens": 500
    },
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": "xxx"
            },
            "index": 0,
            "finish_reason": "stop"
        }
    ]
}
The SSE response has no `usage` attribute. I think that's fine; according to openai.apifox.cn, the result is in the `choices` attribute.
curl --location --request POST 'http://my-domain/v1/chat/completions' \
--header 'User-Agent: Apifox/1.0.0 (https://apifox.com)' \
--header 'Content-Type: application/json' \
--data-raw '{
    "messages": [
        {
            "role": "system",
            "content": "xxx"
        },
        {
            "role": "user",
            "content": "who are you"
        }
    ],
    "temperature": 1.0,
    "top_p": 0.5,
    "max_tokens": 500,
    "stream": true
}'
# response
data: {"object": "chat.completion.chunk", "model": "rwkv-5-world-1b5", "choices": [{"delta": {"content": " "}, "index": 0, "finish_reason": null}]}
data: {"object": "chat.completion.chunk", "model": "rwkv-5-world-1b5", "choices": [{"delta": {"content": "\u6211"}, "index": 0, "finish_reason": null}]}
data: {"object": "chat.completion.chunk", "model": "rwkv-5-world-1b5", "choices": [{"delta": {"content": "\u53eb"}, "index": 0, "finish_reason": null}]}
I still don't know what's wrong.
I ran the test case:
test_llm.py::test_invoke_stream_model FAILED [100%]
test_llm.py:67 (test_invoke_stream_model)
The error occurs while parsing decoded_chunk:
for chunk in response.iter_lines(decode_unicode=True, delimiter='\n\n'):
    if chunk:
        decoded_chunk = chunk.strip().lstrip('data: ').lstrip()
        chunk_json = None
        try:
            chunk_json = json.loads(decoded_chunk)
        # stream ended
        except json.JSONDecodeError as e:
            # error while parsing decoded_chunk
            print("parse chunk_json error: " + str(e))
            yield create_final_llm_result_chunk(
                index=chunk_index + 1,
                message=AssistantPromptMessage(content=""),
                finish_reason="Non-JSON encountered."
            )
            break
The decoded_chunk at that point is:
{"object": "chat.completion.chunk", "model": "RWKV-5-World-1B5-v2-20231025-ctx4096", "choices": [{"delta": {"content": " I"}, "index": 0, "finish_reason": null}]}
data: {"object": "chat.completion.chunk", "model": "RWKV-5-World-1B5-v2-20231025-ctx4096", "choices": [{"delta": {"content": " am"}, "index": 0, "finish_reason": null}]}
data: [DONE]
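The likely cause, judging from the output above: the server separates events with `\r\n` (or a single `\n`) rather than a blank line, so `iter_lines(delimiter='\n\n')` returns several `data:` lines as one chunk, and `json.loads` fails after the first event. A self-contained sketch of that behaviour, using a made-up payload and no network:

```python
import json

# Two SSE events separated by "\r\n" only - no blank line between them.
stream = (
    'data: {"choices": [{"delta": {"content": " I"}, "index": 0}]}\r\n'
    'data: {"choices": [{"delta": {"content": " am"}, "index": 0}]}\r\n'
    'data: [DONE]\r\n'
)

# Splitting on "\n\n" (the provider's default) yields ONE chunk containing
# all three lines, so stripping a single "data: " prefix still leaves extra
# "data: ..." text behind and json.loads() raises.
for chunk in stream.split("\n\n"):
    decoded = chunk.strip().lstrip("data: ").lstrip()
    try:
        json.loads(decoded)
        print("parsed one event")
    except json.JSONDecodeError as e:
        print("parse error:", e)

# Splitting on "\r\n" instead yields one well-formed JSON object per event.
for chunk in stream.split("\r\n"):
    decoded = chunk.strip().lstrip("data: ").lstrip()
    if not decoded or decoded == "[DONE]":
        continue
    print(json.loads(decoded)["choices"][0]["delta"]["content"])
```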
@bowenliang123
The preset delimiter `\n\n` in the openai-compatible provider may not satisfy all the upstream LLMs of different types. In my case, I changed the delimiter to `\n\r` for qwen (千问) LLMs:
if "qwen" in model.lower():
    delimiter = '\n\r'
else:
    delimiter = '\n\n'
Thanks very much, @bowenliang123.
I changed the source code of `openai_api_compatible/llm/llm.py` to temporarily integrate my rwkv model.
➜ dify docker exec -it dify-api cat core/model_runtime/model_providers/openai_api_compatible/llm/llm.py | grep 'rwkv' -C3
        )
    )
    if "rwkv" in model.lower():
        delimiter = '\r\n'
    else:
        delimiter = '\n\n'
--
                chunk_json = json.loads(decoded_chunk)
            # stream ended
            except json.JSONDecodeError as e:
                if "rwkv" in model.lower() and decoded_chunk == '[DONE]':
                    break
                yield create_final_llm_result_chunk(
                    index=chunk_index + 1,
So maybe I need to implement a dedicated rwkv model provider for production.
For the rwkv OpenAI-API-compatible case, it would be helpful if openai_api_compatible let you configure the chunk delimiter and the stream-end stop chunk in the UI.
Hi @geosmart, it seems like you've found a way to accommodate it. To better support this on our end, my idea for an enhancement addressing these cases would be to add an optional `delimiter` field when configuring model credentials for OpenAI-API-compatible models. When this field is filled in by the user, we would use it to separate streaming chunks instead of `\n\n` as per OpenAI's API. I expect this to be a quick win that shouldn't cost a lot of time. Would you like to give a PR for it a go? :)
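For reference, a rough sketch of what that could look like in the stream handler; the credential key `stream_delimiter` below is only an illustrative name, not necessarily what the final PR would use:

```python
# Sketch only: read an optional delimiter from the model credentials and
# fall back to OpenAI's "\n\n" when the user leaves the field empty.
delimiter = credentials.get("stream_delimiter") or "\n\n"  # "stream_delimiter" is a placeholder key

for chunk in response.iter_lines(decode_unicode=True, delimiter=delimiter):
    if not chunk:
        continue
    decoded_chunk = chunk.strip().lstrip("data: ").lstrip()
    if decoded_chunk == "[DONE]":
        break
    # ... parse decoded_chunk as JSON and yield LLMResultChunk as before ...
```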
OK, I will submit a PR for this.
[Screenshot: openai_api_compatible model config window]
Thanks for this. It was the fix needed to get Oobabooga's text-generation-webui to work as a dify connector. I only found this issue after painfully debugging the container, so I suppose it needs more visibility.
Self Checks
Dify version
0.4.9
Cloud or Self Hosted
Self Hosted (Docker)
Steps to reproduce
I added an OpenAI-compatible model (an rwkv docker service); the API works fine, but in dify the answer is not shown.
When I test blocking mode, the answer is shown.
When I test streaming mode, the answer is blank.
✔️ Expected Behavior
Streaming mode should work correctly and show the answer.
❌ Actual Behavior
With streaming mode the answer is blank; on the preview page the answer is blank as well.