xorbitsai / inference

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
https://inference.readthedocs.io
Apache License 2.0

BUG: Does the glm4-chat API not support streaming responses? #1766

Open 13428116504 opened 1 month ago

13428116504 commented 1 month ago

When calling glm4-chat through the API, does it not support streaming responses? It replies all at once. Could streaming be supported? Many thanks.

qinxuye commented 1 month ago

Which engine are you using? Streaming should be supported.

13428116504 commented 1 month ago

> Which engine are you using? Streaming should be supported.

Transformers

liaotingyao commented 1 month ago

I'm using dify, and it errors out immediately, saying streaming is not supported. I don't know how to disable streaming either...

13428116504 commented 1 month ago

> I'm using dify, and it errors out immediately, saying streaming is not supported. I don't know how to disable streaming either...

I'm also using dify, on version 0.12.3, and I get no error; it's just that the answer comes back all at once instead of streaming.

liaotingyao commented 1 month ago

I'm also on 0.12.3, and my dify version is 0.6.12-fix1. The dify frontend reports: An error occurred during streaming. But I've seen that others can get it working: https://github.com/xorbitsai/inference/pull/1425

13428116504 commented 1 month ago

> I'm also on 0.12.3, and my dify version is 0.6.12-fix1. The dify frontend reports: An error occurred during streaming. But I've seen that others can get it working: #1425

My dify is 0.6.111.

13428116504 commented 1 month ago

It looks like glm4-chat does not stream in xinference: inspecting the HTTPS request, the reply comes back in a single response rather than as a stream.
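One way to check whether the server itself streams (independently of dify) is to send a chat-completion request with `"stream": true` to Xinference's OpenAI-compatible endpoint and watch for SSE `data:` chunks. The sketch below, under the assumption of a default local deployment at port 9997 (adjust the URL and model name to your setup), builds such a request payload and includes a small helper for parsing the SSE lines a streaming server would return:

```python
import json

# Assumed default local endpoint; Xinference exposes an OpenAI-compatible
# API under /v1 -- adjust host/port to your deployment.
XINFERENCE_URL = "http://127.0.0.1:9997/v1/chat/completions"

def build_stream_request(model, prompt):
    """Build an OpenAI-style chat-completion payload with streaming enabled."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,  # ask the server to reply with SSE chunks
    }

def parse_sse_data(line):
    """Extract the JSON payload from one SSE 'data:' line.

    Returns None for non-data lines and for the terminal '[DONE]' marker.
    """
    line = line.strip()
    if not line.startswith("data:"):
        return None
    body = line[len("data:"):].strip()
    if body == "[DONE]":
        return None
    return json.loads(body)
```

If the server truly streams, each incoming `data:` line carries a `choices[0].delta.content` fragment; a single response with the full text instead would confirm the non-streaming behavior described above.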

github-actions[bot] commented 3 weeks ago

This issue is stale because it has been open for 7 days with no activity.