langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
https://dify.ai

The OpenAI-compatible model's answer is blank in streaming mode #2144

Closed. geosmart closed this issue 9 months ago

geosmart commented 9 months ago

Self Checks

Dify version

0.4.9

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

I added an OpenAI-compatible model (an RWKV Docker service). The API itself works fine, but in Dify the answer is not shown.

When I test blocking mode, the answer is shown:

curl --location --request POST 'http://my-domain/v1/chat-messages' \
--header 'Authorization: Bearer app-xxx' \
--header 'Content-Type: application/json' \
--data-raw '{
    "inputs": {},
    "query": "who are you",
    "response_mode": "blocking",
    "conversation_id": "",
    "user": "dify-api"
}'
{"event": "message", "task_id": "d0ecb553-fa53-46e7-a76a-83f962795fa0", "id": "79d7e38e-9877-46bd-aad8-b2aa039ceb04", "message_id": "79d7e38e-9877-46bd-aad8-b2aa039ceb04", "mode": "chat", "answer": "\u6211\u662f\u4e00\u540d\u5168\u6808\u5f00\u53d1\u4eba\u5458\uff0c\u64c5\u957fpython,kotlin,markdown,\u719f\u6089flink,spark\u8ba1\u7b97\u7cfb\u7edf\uff0c\u5bf9mysql,elasticsearch\u7b49\u5b58\u50a8\u7cfb\u7edf\u6709\u5b9e\u6218\u7ecf\u9a8c\u3002\n", "metadata": {"usage": {"prompt_tokens": 72, "prompt_unit_price": "0", "prompt_price_unit": "0", "prompt_price": "0.0000000", "completion_tokens": 48, "completion_unit_price": "0", "completion_price_unit": "0", "completion_price": "0.0000000", "total_tokens": 120, "total_price": "0.0000000", "currency": "USD", "latency": 2.3984292540117167}}, "created_at": 1706019858, "conversation_id": "997fb182-88d7-46c9-a4d7-672437a6fd5a"}

(screenshot: blocking mode returns the answer)

When I test streaming mode, the answer is blank:

curl --location --request POST 'http://my-domain/v1/chat-messages' \
--header 'Authorization: Bearer app-xxx' \
--header 'Content-Type: application/json' \
--data-raw '{
    "inputs": {},
    "query": "who are you",
    "response_mode": "streaming",
    "conversation_id": "",
    "user": "dify-api"
}'
event: ping

data: {"event": "message", "id": "f3f16af0-a327-43e8-9071-d143575f57d1", "task_id": "3e070c26-8d89-446e-bd9e-45200136eaa0", "message_id": "f3f16af0-a327-43e8-9071-d143575f57d1", "answer": "", "created_at": 1706019951, "conversation_id": "35e7467a-dbef-47c4-8336-d4330bc2b82a"}

data: {"event": "message_end", "task_id": "3e070c26-8d89-446e-bd9e-45200136eaa0", "id": "f3f16af0-a327-43e8-9071-d143575f57d1", "message_id": "f3f16af0-a327-43e8-9071-d143575f57d1", "conversation_id": "35e7467a-dbef-47c4-8336-d4330bc2b82a", "metadata": {"usage": {"prompt_tokens": 106, "prompt_unit_price": "0", "prompt_price_unit": "0", "prompt_price": "0.0000000", "completion_tokens": 0, "completion_unit_price": "0", "completion_price_unit": "0", "completion_price": "0.0000000", "total_tokens": 106, "total_price": "0.0000000", "currency": "USD", "latency": 14.101711608003825}}}

(screenshot: blank answer in streaming mode)

✔️ Expected Behavior

Streaming mode should work and show the answer.

❌ Actual Behavior

In streaming mode the answer is blank. The answer is also blank on the app preview page.

dosubot[bot] commented 9 months ago

🤖

Thank you for providing detailed information about the issue you're experiencing. It seems the problem might be related to how streaming mode is implemented in the Dify codebase.

The _handle_chat_generate_stream_response function yields an LLMResultChunk for each message in the response. If a message has no usage attribute, it still yields an LLMResultChunk, just without setting usage, which could result in an empty or blank response.

Here is the relevant code:

def _handle_chat_generate_stream_response(self, model: str, prompt_messages: list[PromptMessage],
                                          credentials: dict, response: Generator[MinimaxMessage, None, None]) \
        -> Generator[LLMResultChunk, None, None]:
    for message in response:
        if message.usage:
            usage = self._calc_response_usage(
                model=model, credentials=credentials,
                prompt_tokens=message.usage['prompt_tokens'],
                completion_tokens=message.usage['completion_tokens']
            )
            yield LLMResultChunk(
                model=model,
                prompt_messages=prompt_messages,
                delta=LLMResultChunkDelta(
                    index=0,
                    message=AssistantPromptMessage(
                        content=message.content,
                        tool_calls=[]
                    ),
                    usage=usage,
                    finish_reason=message.stop_reason if message.stop_reason else None,
                ),
            )
        else:
            yield LLMResultChunk(
                model=model,
                prompt_messages=prompt_messages,
                delta=LLMResultChunkDelta(
                    index=0,
                    message=AssistantPromptMessage(
                        content=message.content,
                        tool_calls=[]
                    ),
                    finish_reason=message.stop_reason if message.stop_reason else None,
                ),
            )

You can find this code in the api/core/model_runtime/model_providers/minimax/llm/llm.py file.

If the usage attribute is not present in the message, the LLMResultChunk object is still yielded, but without setting the usage attribute. This could potentially result in an empty or blank response.

To resolve this issue, you might need to ensure that the usage attribute is always present in the message. If it's not possible to ensure this, you might need to modify the _handle_chat_generate_stream_response function to handle cases where the usage attribute is not present.
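
As a rough illustration only (this is not the actual Dify code), such a fallback could sit in the provider class and emit one final usage-bearing chunk when the upstream never reported usage. _calc_response_usage comes from the snippet above; get_num_tokens is assumed to be available on the provider, and the completion-token estimate is a deliberate placeholder.

# Hypothetical sketch: emit a closing chunk with estimated usage when the stream
# finished without the upstream ever sending a usage payload.
def _yield_usage_fallback(self, model: str, credentials: dict,
                          prompt_messages: list, completion_text: str):
    usage = self._calc_response_usage(
        model=model,
        credentials=credentials,
        prompt_tokens=self.get_num_tokens(model, credentials, prompt_messages),
        completion_tokens=len(completion_text.split()),  # crude stand-in estimate
    )
    yield LLMResultChunk(
        model=model,
        prompt_messages=prompt_messages,
        delta=LLMResultChunkDelta(
            index=0,
            message=AssistantPromptMessage(content="", tool_calls=[]),
            usage=usage,
            finish_reason="stop",
        ),
    )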

I hope this helps! If you have any other questions or need further clarification, please let me know.

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.


geosmart commented 9 months ago

I'm using the openai_api_compatible model provider, so the relevant source file is openai_api_compatible/llm/llm.py.

openai_api_compatible model test

blocking mode

I tested my OpenAI-compatible model API; the blocking-mode response does include a usage attribute.

curl --location --request POST 'http://my-domain/v1/chat/completions' \
--header 'User-Agent: Apifox/1.0.0 (https://apifox.com)' \
--header 'Content-Type: application/json' \
--data-raw '{
    "messages": [
        {
            "role": "system",
            "content": "xxx"
        },
        {
            "role": "user",
            "content": "who are you"
        }
    ],
    "temperature": 1.0,
    "top_p": 0.5,
    "max_tokens": 500,
    "stream": false
}'
# response
{
    "object": "chat.completion",
    "model": "rwkv-5-world-1b5",
    "usage": {
        "prompt_tokens": 0,
        "completion_tokens": 500,
        "total_tokens": 500
    },
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": "xxx
            },
            "index": 0,
            "finish_reason": "stop"
        }
    ]
}

streaming mode

The SSE response has no usage attribute. I think that's OK; per the OpenAI API reference (openai.apifox.cn), the result is carried in the choices attribute.

curl --location --request POST 'http://my-domain/v1/chat/completions' \
--header 'User-Agent: Apifox/1.0.0 (https://apifox.com)' \
--header 'Content-Type: application/json' \
--data-raw '{
    "messages": [
        {
            "role": "system",
            "content": "xxx"
        },
        {
            "role": "user",
            "content": "who are you"
        }
    ],
    "temperature": 1.0,
    "top_p": 0.5,
    "max_tokens": 500,
    "stream": true
}'
# response
data: {"object": "chat.completion.chunk", "model": "rwkv-5-world-1b5", "choices": [{"delta": {"content": " "}, "index": 0, "finish_reason": null}]}

data: {"object": "chat.completion.chunk", "model": "rwkv-5-world-1b5", "choices": [{"delta": {"content": "\u6211"}, "index": 0, "finish_reason": null}]}

data: {"object": "chat.completion.chunk", "model": "rwkv-5-world-1b5", "choices": [{"delta": {"content": "\u53eb"}, "index": 0, "finish_reason": null}]}

Still, I don't know what's wrong.

geosmart commented 9 months ago

I ran the test case at

https://github.com/langgenius/dify/blob/00f4e6ec449afd939cedf508a9349581e3f6ef8d/api/tests/integration_tests/model_runtime/openai_api_compatible/test_llm.py#L68

test_llm.py::test_invoke_stream_model FAILED                             [100%]
test_llm.py:67 (test_invoke_stream_model)

The error occurs when parsing decoded_chunk:

        for chunk in response.iter_lines(decode_unicode=True, delimiter='\n\n'):
            if chunk:
                decoded_chunk = chunk.strip().lstrip('data: ')
                chunk_json = None
                try:
                    chunk_json = json.loads(decoded_chunk)
                # stream ended
                except json.JSONDecodeError as e:
                    # error while parsing decoded_chunk
                    print("parse chunk_json error: " + str(e))
                    yield create_final_llm_result_chunk(
                        index=chunk_index + 1,
                        message=AssistantPromptMessage(content=""),
                        finish_reason="Non-JSON encountered."
                    )
                    break

decoded_chunk

{"object": "chat.completion.chunk", "model": "RWKV-5-World-1B5-v2-20231025-ctx4096", "choices": [{"delta": {"content": " I"}, "index": 0, "finish_reason": null}]}

data: {"object": "chat.completion.chunk", "model": "RWKV-5-World-1B5-v2-20231025-ctx4096", "choices": [{"delta": {"content": " am"}, "index": 0, "finish_reason": null}]}

data: [DONE]
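
A standalone illustration of the failure mode (plain Python, no Dify code): if the upstream service terminates each SSE event with \r\n instead of the blank \n\n line the provider expects, splitting on \n\n never matches, the whole body is treated as a single chunk, and json.loads fails on it. The \r\n separator here is an assumption about the RWKV service, consistent with the fix further down this thread.

import json

raw_body = (
    'data: {"choices": [{"delta": {"content": " I"}, "index": 0, "finish_reason": null}]}\r\n'
    'data: {"choices": [{"delta": {"content": " am"}, "index": 0, "finish_reason": null}]}\r\n'
    'data: [DONE]\r\n'
)

chunk = raw_body.split('\n\n')[0]               # '\n\n' never occurs -> one big "chunk"
decoded_chunk = chunk.strip().lstrip('data: ')  # same cleanup as the provider code
try:
    json.loads(decoded_chunk)
except json.JSONDecodeError as e:               # "Extra data": several events in one chunk
    print("parse chunk_json error: " + str(e))

for chunk in raw_body.split('\r\n'):            # splitting on the real separator works
    if chunk and chunk != 'data: [DONE]':
        print(json.loads(chunk.lstrip('data: ')))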

@bowenliang123

bowenliang123 commented 9 months ago

The preset delimiter in the openai-compatible provider is \n\n, which may not satisfy all upstream LLMs of different types.

In my case, I would change the delimiter for Qwen (千问) LLMs to \n\r:

if "qwen" in model.lower():
     delimiter = '\n\r'
else:
    delimiter = '\n\n'
geosmart commented 9 months ago

Thanks very much, @bowenliang123.

I changed the source code of openai_api_compatible/llm/llm.py to temporarily integrate my RWKV model:

➜  dify docker exec -it dify-api cat core/model_runtime/model_providers/openai_api_compatible/llm/llm.py | grep 'rwkv' -C3
                )
            )

        if "rwkv" in model.lower():
            delimiter = '\r\n'
        else:
            delimiter = '\n\n'
--
                      chunk_json = json.loads(decoded_chunk)
                # stream ended
                except json.JSONDecodeError as e:
                    if "rwkv" in model.lower() and decoded_chunk == '[DONE]':
                        break
                    yield create_final_llm_result_chunk(
                        index=chunk_index + 1,

So maybe I need to implement a dedicated rwkv provider under model_providers for production.

geosmart commented 9 months ago

For the RWKV OpenAI-API-compatible case, it would be helpful if openai_api_compatible supported configuring the chunk delimiter and the stream-end stop chunk in the UI.

guchenhe commented 9 months ago

Hi @geosmart, it seems like you've found a way to accommodate this. To better support it on our end, my idea for an enhancement addressing these cases would be to add an optional delimiter field when configuring model credentials for OpenAI API-compatible models. When the field is filled in by the user, we would use it to separate streaming chunks instead of \n\n as per OpenAI's API. I expect this to be a quick win that shouldn't cost a lot of time. Would you like to have a go at a PR for it? :)
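
For illustration, here is roughly what that could look like (a sketch, not the eventual PR; the credential key name stream_mode_delimiter and the helper below are placeholders):

import codecs

def resolve_stream_delimiter(credentials: dict, default: str = "\n\n") -> str:
    """Pick the SSE chunk delimiter from the model credentials, falling back to
    OpenAI's blank-line separator when the optional field is left empty."""
    raw = credentials.get("stream_mode_delimiter") or default
    # The user types the delimiter as literal text (e.g. "\r\n") in the config form,
    # so unescape it before handing it to response.iter_lines(delimiter=...).
    return codecs.decode(raw, "unicode_escape")

The streaming loop would then call response.iter_lines(decode_unicode=True, delimiter=resolve_stream_delimiter(credentials)), and the provider's credential form schema would need the matching optional field.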

geosmart commented 9 months ago

OK, I will submit a PR for this.

geosmart commented 9 months ago

(screenshot: openai_api_compatible model config window)

jsboige commented 1 month ago

(quoted from above: openai_api_compatible model config window)

Thanks for this. It was the fix I needed to get Oobabooga's text-generation-webui working as a Dify connector. I only found this issue after painfully debugging the container, so I suppose it needs more visibility.