OneAPI error log
[SYS] 2024/06/22 - 17:21:29 | model ratio not found: glm-4-9b
[INFO] 2024/06/22 - 17:21:29 | 2024062217212958147780023082485 | user 1 has enough quota 999222410797, trusted and no need to pre-consume
[ERR] 2024/06/22 - 17:21:29 | 2024062217212958147780023082485 | relay error happen, status code is 400, won't retry in this case
[ERR] 2024/06/22 - 17:21:29 | 2024062217212958147780023082485 | relay error (channel #13): bad response status code 400
[GIN] 2024/06/22 - 17:21:29 | 2024062217212958147780023082485 | 400 | 10.0611ms | 10.4.134.11 | POST /v1/chat/completions
FastGPT error log
{
message: '400 bad response status code 400 (request id: 2024062217212958147780023082485)',
stack: 'Error: 400 bad response status code 400 (request id: 2024062217212958147780023082485)\n' +
' at eL.generate (/app/projects/app/.next/server/chunks/76750.js:15:67594)\n' +
' at av.makeStatusError (/app/projects/app/.next/server/chunks/76750.js:15:79337)\n' +
' at av.makeRequest (/app/projects/app/.next/server/chunks/76750.js:15:80260)\n' +
' at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n' +
' at async w (/app/projects/app/.next/server/chunks/75612.js:309:2105)\n' +
' at async Object.w [as tools] (/app/projects/app/.next/server/chunks/75612.js:305:4790)\n' +
' at async k (/app/projects/app/.next/server/chunks/75612.js:313:2241)\n' +
' at async Promise.all (index 0)\n' +
' at async E (/app/projects/app/.next/server/chunks/75612.js:313:2782)\n' +
' at async h (/app/projects/app/.next/server/pages/api/core/chat/chatTest.js:1:3266)'
}
Xinference error log
2024-06-22 17:25:16,414 xinference.core.supervisor 43237 DEBUG Enter get_model, args: (<xinference.core.supervisor.SupervisorActor object at 0x7f151234e2a0>, 'glm-4-9b'), kwargs: {}
2024-06-22 17:25:16,415 xinference.core.worker 43237 DEBUG Enter get_model, args: (<xinference.core.worker.WorkerActor object at 0x7f15123c3e20>,), kwargs: {'model_uid': 'glm-4-9b-1-0'}
2024-06-22 17:25:16,415 xinference.core.worker 43237 DEBUG Leave get_model, elapsed time: 0 s
2024-06-22 17:25:16,415 xinference.core.supervisor 43237 DEBUG Leave get_model, elapsed time: 0 s
2024-06-22 17:25:16,416 xinference.core.supervisor 43237 DEBUG Enter describe_model, args: (<xinference.core.supervisor.SupervisorActor object at 0x7f151234e2a0>, 'glm-4-9b'), kwargs: {}
2024-06-22 17:25:16,416 xinference.core.worker 43237 DEBUG Enter describe_model, args: (<xinference.core.worker.WorkerActor object at 0x7f15123c3e20>,), kwargs: {'model_uid': 'glm-4-9b-1-0'}
2024-06-22 17:25:16,416 xinference.core.worker 43237 DEBUG Leave describe_model, elapsed time: 0 s
2024-06-22 17:25:16,416 xinference.core.supervisor 43237 DEBUG Leave describe_model, elapsed time: 0 s
Problem description and logs
glm-4-9b is deployed with Xinference and connected to FastGPT through OneAPI. GLM-4's chat feature works normally, but using GLM-4 with tool calls fails with a 400 error. Related issue: https://github.com/labring/FastGPT/issues/1823
Versions:
xinference 0.12.2, fastgpt 4.8.4-fix, oneapi 0.6.6, glm4 glm-4-9b-chat
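Since plain chat works and only tool calls fail, the 400 appears to be triggered by the `tools` array in the request body. For reference, here is a minimal sketch of the OpenAI-style tool-call payload that FastGPT would POST to `/v1/chat/completions`; the function name and schema below are illustrative, not taken from the actual failing request:

```python
import json

# Illustrative OpenAI-compatible chat completion request with a tools array.
# The "get_weather" function is a made-up example for demonstration only.
payload = {
    "model": "glm-4-9b",  # must match the model uid registered in Xinference
    "messages": [
        {"role": "user", "content": "What is the weather in Beijing?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Look up the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "city": {"type": "string", "description": "City name"}
                    },
                    "required": ["city"],
                },
            },
        }
    ],
    "tool_choice": "auto",
}

print(json.dumps(payload, indent=2))
```

Sending the same request without the `tools` and `tool_choice` fields succeeds, which narrows the failure to how this payload is handled along the FastGPT → OneAPI → Xinference chain.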