OneAPI error log
[SYS] 2024/06/22 - 17:21:29 | model ratio not found: glm-4-9b
[INFO] 2024/06/22 - 17:21:29 | 2024062217212958147780023082485 | user 1 has enough quota 999222410797, trusted and no need to pre-consume
[ERR] 2024/06/22 - 17:21:29 | 2024062217212958147780023082485 | relay error happen, status code is 400, won't retry in this case
[ERR] 2024/06/22 - 17:21:29 | 2024062217212958147780023082485 | relay error (channel #13): bad response status code 400
[GIN] 2024/06/22 - 17:21:29 | 2024062217212958147780023082485 | 400 | 10.0611ms | 10.4.134.11 | POST /v1/chat/completions
FastGPT error log
{
message: '400 bad response status code 400 (request id: 2024062217212958147780023082485)',
stack: 'Error: 400 bad response status code 400 (request id: 2024062217212958147780023082485)\n' +
' at eL.generate (/app/projects/app/.next/server/chunks/76750.js:15:67594)\n' +
' at av.makeStatusError (/app/projects/app/.next/server/chunks/76750.js:15:79337)\n' +
' at av.makeRequest (/app/projects/app/.next/server/chunks/76750.js:15:80260)\n' +
' at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n' +
' at async w (/app/projects/app/.next/server/chunks/75612.js:309:2105)\n' +
' at async Object.w [as tools] (/app/projects/app/.next/server/chunks/75612.js:305:4790)\n' +
' at async k (/app/projects/app/.next/server/chunks/75612.js:313:2241)\n' +
' at async Promise.all (index 0)\n' +
' at async E (/app/projects/app/.next/server/chunks/75612.js:313:2782)\n' +
' at async h (/app/projects/app/.next/server/pages/api/core/chat/chatTest.js:1:3266)'
}
Xinference error log
2024-06-22 17:25:16,414 xinference.core.supervisor 43237 DEBUG Enter get_model, args: (<xinference.core.supervisor.SupervisorActor object at 0x7f151234e2a0>, 'glm-4-9b'), kwargs: {}
2024-06-22 17:25:16,415 xinference.core.worker 43237 DEBUG Enter get_model, args: (<xinference.core.worker.WorkerActor object at 0x7f15123c3e20>,), kwargs: {'model_uid': 'glm-4-9b-1-0'}
2024-06-22 17:25:16,415 xinference.core.worker 43237 DEBUG Leave get_model, elapsed time: 0 s
2024-06-22 17:25:16,415 xinference.core.supervisor 43237 DEBUG Leave get_model, elapsed time: 0 s
2024-06-22 17:25:16,416 xinference.core.supervisor 43237 DEBUG Enter describe_model, args: (<xinference.core.supervisor.SupervisorActor object at 0x7f151234e2a0>, 'glm-4-9b'), kwargs: {}
2024-06-22 17:25:16,416 xinference.core.worker 43237 DEBUG Enter describe_model, args: (<xinference.core.worker.WorkerActor object at 0x7f15123c3e20>,), kwargs: {'model_uid': 'glm-4-9b-1-0'}
2024-06-22 17:25:16,416 xinference.core.worker 43237 DEBUG Leave describe_model, elapsed time: 0 s
2024-06-22 17:25:16,416 xinference.core.supervisor 43237 DEBUG Leave describe_model, elapsed time: 0 s
Problem description and logs
glm-4-9b is deployed with Xinference and connected to FastGPT through OneAPI. GLM-4's chat feature works normally, but using GLM-4 with tool calls fails with a 400 error. Related issue: https://github.com/labring/FastGPT/issues/1823
Versions:
xinference 0.12.2, fastgpt 4.8.4-fix, oneapi 0.6.6, glm4 glm-4-9b-chat
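Since plain chat works and only tool calls fail, the 400 appears to be triggered by the `tools` array in the request body. For reference, here is a minimal sketch of the OpenAI-style tool-call payload that FastGPT would POST to `/v1/chat/completions`; the function name and schema below are illustrative, not taken from the actual failing request:

```python
import json

# Illustrative OpenAI-compatible chat completion request with a tools array.
# The "get_weather" function is a made-up example for demonstration only.
payload = {
    "model": "glm-4-9b",  # must match the model uid registered in Xinference
    "messages": [
        {"role": "user", "content": "What is the weather in Beijing?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Look up the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "city": {"type": "string", "description": "City name"}
                    },
                    "required": ["city"],
                },
            },
        }
    ],
    "tool_choice": "auto",
}

print(json.dumps(payload, indent=2))
```

Sending the same request without the `tools` and `tool_choice` fields succeeds, which narrows the failure to how this payload is handled along the FastGPT → OneAPI → Xinference chain.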