li-plus / chatglm.cpp

C++ implementation of ChatGLM-6B & ChatGLM2-6B & ChatGLM3 & GLM4(V)
MIT License

OpenAI server returns an empty response #319

Closed FuturePrayer closed 3 months ago

FuturePrayer commented 3 months ago

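I'm using this with Open WebUI. Open WebUI sends a specific prompt asking the model to name the current conversation, but glm4-9b (INT8 quantized) returns an empty string. Request body: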

{
  "messages": [
    {
      "content": "Here is the query:\n你是谁\n\nCreate a concise, 3-5 word phrase with an emoji as a title for the previous query. Suitable Emojis for the summary can be used to enhance understanding but avoid quotation marks or special formatting. RESPOND ONLY WITH THE TITLE TEXT.\n\nExamples of titles:\n📉 Stock Market Trends\n🍪 Perfect Chocolate Chip Recipe\nEvolution of Music Streaming\nRemote Work Productivity Tips\nArtificial Intelligence in Healthcare\n🎮 Video Game Development Insights",
      "role": "user"
    }
  ],
  "model": "glm-4-flash",
  "max_tokens": 50,
  "stream": false
}

Response body:

{
    "id": "chatcmpl",
    "model": "default-model",
    "object": "chat.completion",
    "created": 1718763800,
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "",
                "tool_calls": null
            },
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 106,
        "completion_tokens": 2,
        "total_tokens": 108
    }
}

The response is still empty when stream is set to true. Sending the same request to Zhipu's official glm-4-air, glm-4-flash, and other models returns a normal result.

FuturePrayer commented 3 months ago

For now, increasing max_tokens seems to fix it, but surely an answer like "🤖 Who Am I?" can't exceed 50 tokens?

li-plus commented 3 months ago
    "usage": {
        "prompt_tokens": 106,
        "completion_tokens": 2,
        "total_tokens": 108
    }

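The returned usage shows the token counts. max_tokens is the upper limit on total_tokens (prompt plus completion), so here the 106-token prompt alone already exceeds a max_tokens of 50.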
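The arithmetic behind this can be sketched in a few lines of Python. This is only an illustration that assumes, as the comment above says, that this server treats max_tokens as a cap on prompt plus completion tokens. The helper names `completion_budget` and `required_max_tokens` are hypothetical and are not part of chatglm.cpp.

```python
def completion_budget(max_tokens: int, prompt_tokens: int) -> int:
    """Tokens left for the reply when max_tokens caps prompt + completion.

    Assumption: the server counts the prompt against max_tokens, as
    described in this thread. Never returns a negative number.
    """
    return max(0, max_tokens - prompt_tokens)


def required_max_tokens(prompt_tokens: int, desired_completion_tokens: int) -> int:
    """max_tokens to send so the reply can reach the desired length."""
    return prompt_tokens + desired_completion_tokens


if __name__ == "__main__":
    # Numbers from the request and usage block in this issue.
    print(completion_budget(max_tokens=50, prompt_tokens=106))
    print(required_max_tokens(prompt_tokens=106, desired_completion_tokens=50))
```

With the values from this issue, the budget left for the reply is 0, and allowing a 50-token title would need max_tokens of 156. A client such as Open WebUI, which sends max_tokens as a completion-only limit, would have to add the prompt length before calling this server.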