xorbitsai / inference

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
https://inference.readthedocs.io
Apache License 2.0

BUG: Does the glm4-chat API not support streaming responses? #1766

Open 13428116504 opened 1 month ago

13428116504 commented 1 month ago

When calling glm4-chat through the API, does it not support streaming responses? It replies all at once. Could streaming be supported? Many thanks.

qinxuye commented 1 month ago

Which engine are you using? Streaming should be supported.

13428116504 commented 1 month ago

> Which engine are you using? Streaming should be supported.

Transformers

liaotingyao commented 1 month ago

I'm using dify, and it errors out immediately, saying streaming is not supported. I don't know how to disable streaming either...

13428116504 commented 1 month ago

> I'm using dify, and it errors out immediately, saying streaming is not supported. I don't know how to disable streaming either...

I'm also using dify, on version 0.12.3, and I get no error; it's just that the answer comes back all at once instead of streaming.

liaotingyao commented 1 month ago

I'm also on 0.12.3, and my dify version is 0.6.12-fix1. The dify frontend reports: An error occurred during streaming. But I've seen that others can get it working: https://github.com/xorbitsai/inference/pull/1425

13428116504 commented 1 month ago

> I'm also on 0.12.3, and my dify version is 0.6.12-fix1. The dify frontend reports: An error occurred during streaming. But I've seen that others can get it working: #1425

My dify is 0.6.111.

13428116504 commented 1 month ago

It looks like glm4-chat does not stream in xinference: inspecting the HTTPS request, the reply comes back in a single response rather than as a stream.
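One way to check whether the server itself streams (independently of dify) is to send a chat-completion request with `"stream": true` to Xinference's OpenAI-compatible endpoint and watch for SSE `data:` chunks. The sketch below, under the assumption of a default local deployment at port 9997 (adjust the URL and model name to your setup), builds such a request payload and includes a small helper for parsing the SSE lines a streaming server would return:

```python
import json

# Assumed default local endpoint; Xinference exposes an OpenAI-compatible
# API under /v1 -- adjust host/port to your deployment.
XINFERENCE_URL = "http://127.0.0.1:9997/v1/chat/completions"

def build_stream_request(model, prompt):
    """Build an OpenAI-style chat-completion payload with streaming enabled."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,  # ask the server to reply with SSE chunks
    }

def parse_sse_data(line):
    """Extract the JSON payload from one SSE 'data:' line.

    Returns None for non-data lines and for the terminal '[DONE]' marker.
    """
    line = line.strip()
    if not line.startswith("data:"):
        return None
    body = line[len("data:"):].strip()
    if body == "[DONE]":
        return None
    return json.loads(body)
```

If the server truly streams, each incoming `data:` line carries a `choices[0].delta.content` fragment; a single response with the full text instead would confirm the non-streaming behavior described above.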

github-actions[bot] commented 3 weeks ago

This issue is stale because it has been open for 7 days with no activity.