xorbitsai / inference

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
https://inference.readthedocs.io
Apache License 2.0

When I run my custom ChatGLM3-6B, registering the model fails with an error. Can you help me figure out what is causing it? #1324

Closed cvbfdgtn closed 1 month ago

cvbfdgtn commented 5 months ago

The model.json file is as follows:

```json
{
  "version": 1,
  "context_length": 8192,
  "model_name": "custom-chatglm3-6B",
  "model_lang": ["en", "zh"],
  "model_ability": ["chat", "tools"],
  "model_family": "chatglm3",
  "model_specs": [
    {
      "model_format": "pytorch",
      "model_size_in_billions": 6,
      "quantizations": ["none"],
      "model_id": "THUDM/chatglm3-6b",
      "model_uri": "/cfs/data/private/zhangsl/Model/GLM/chatglm3-6b/"
    }
  ],
  "prompt_style": {
    "style_name": "CHATGLM3",
    "system_prompt": "",
    "roles": ["user", "assistant"],
    "stop_token_ids": [64795, 64797, 2],
    "stop": ["<|user|>", "<|observation|>"]
  }
}
```

The command I ran:

```
xinference register --model-type LLM --file /nfsdir/zhangsl/xinference/glm/ChatGLM3-6B-Chat/model.json --persist
```

The error message:

```
RuntimeError: Failed to register model, detail: [address=0.0.0.0:51937, pid=3366852] Model version info inconsistency between supervisor and worker
```
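One way to see what the supervisor actually has registered before retrying is to query it with Xinference's Python client. A minimal sketch, assuming the server runs at the default endpoint; `Client` and `list_model_registrations` follow the documented xinference client API, but verify against your installed version:

```python
from xinference.client import Client

# Connect to the running Xinference supervisor (adjust the endpoint if needed).
client = Client("http://127.0.0.1:9997")

# List every LLM registration the supervisor knows about. A successfully
# registered custom model shows up here alongside the built-in ones.
for registration in client.list_model_registrations(model_type="LLM"):
    print(registration["model_name"], "is_builtin:", registration["is_builtin"])
```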

hainaweiben commented 5 months ago

You might refer to Xinference's own definition for chatglm: https://github.com/xorbitsai/inference/blob/c534028e52f714e499e035116d39638a5e8936e3/xinference/model/llm/llm_family.json#L598

cvbfdgtn commented 5 months ago

@hainaweiben After changing the name I get `RuntimeError: Failed to register model, detail: [address=0.0.0.0:65392, pid=3432797] Model name conflicts with existing model chatglm3`. This is because the built-in chatglm3 model is already registered, so registering it again causes a conflict.
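If the chosen `model_name` collides with a built-in model, the fix is to pick a name that is not already taken; if it collides with an earlier custom registration, that stale entry can be removed first. A hedged sketch using the same client; `unregister_model` is part of the xinference client API, but built-in models such as chatglm3 cannot be removed this way:

```python
from xinference.client import Client

client = Client("http://127.0.0.1:9997")

# Remove a stale *custom* registration so its name can be reused. This does
# not work for built-in models, so a custom model must avoid their names
# (e.g. "custom-chatglm3-6B" instead of "chatglm3").
client.unregister_model(model_type="LLM", model_name="custom-chatglm3-6B")
```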

ChengjieLi28 commented 5 months ago

@cvbfdgtn Please post the full server-side log.

cvbfdgtn commented 5 months ago

The earlier problem happened because the model was already registered. But now I have run into a new issue: tool calls do not work. I call chatglm3's tools with the following script:

```python
import json

import openai

client = openai.OpenAI(
    base_url="http://127.0.0.1:9997/v1",
    api_key="12121212",
)

messages = [
    # "You are a helpful assistant. Do not make assumptions about the values
    # to plug into function calls."
    {"role": "system", "content": "你是一个有用的助手。不要对要函数调用的值做出假设。"},
    # "What is the weather like in Beijing right now?"
    {"role": "user", "content": "北京 现在的天气怎么样?"},
]

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "获取当前天气",  # "Get the current weather"
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "城市,例如北京",  # "City, e.g. Beijing"
                    },
                    "format": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        # "The temperature unit to use. Infer it from the city."
                        "description": "使用的温度单位。从所在的城市进行推断。",
                    },
                },
                "required": ["location", "format"],
            },
        },
    }
]

chat_completion = client.chat.completions.create(
    model="custom-chatglm3-6B",
    messages=messages,
    tools=tools,
    temperature=0.7,
)
print(chat_completion)

func_name = chat_completion.choices[0].message.tool_calls[0].function.name
print("func_name", func_name)
func_args = chat_completion.choices[0].message.tool_calls[0].function.arguments
func_args_dict = json.loads(func_args)
print("func_args", func_args_dict["location"])
```

The tool never gets called, and the request errors out directly. The error message:

```
INFO 04-18 17:43:25 async_llm_engine.py:508] Received request 1a6fcd2a-fd68-11ee-863a-00163e411a25: prompt: '<|system|>\n 你是一个有用的助手。不要对要函数调用的值做出假设。\n<|user|>\n 北京 现在的天气怎么样?\n<|assistant|>', sampling_params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.7, top_p=1.0, top_k=-1, min_p=0.0, seed=None, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=['<|user|>', '<|observation|>'], stop_token_ids=[64795, 64797, 2], include_stop_str_in_output=False, ignore_eos=False, max_tokens=1024, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True), prompt_token_ids: None, lora_request: None.
INFO 04-18 17:43:26 metrics.py:218] Avg prompt throughput: 17.6 tokens/s, Avg generation throughput: 47.1 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%
INFO 04-18 17:43:31 metrics.py:218] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 97.2 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%
INFO 04-18 17:43:36 async_llm_engine.py:120] Finished request 1a6fcd2a-fd68-11ee-863a-00163e411a25.
2024-04-18 17:43:36,266 xinference.api.restful_api 3431975 ERROR [address=0.0.0.0:41207, pid=3480762] 0
Traceback (most recent call last):
  File "/nfsdir/zhangsl/env/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/api/restful_api.py", line 1413, in create_chat_completion
    data = await model.chat(prompt, system_prompt, chat_history, kwargs)
  File "/nfsdir/zhangsl/env/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/context.py", line 227, in send
    return self._process_result_message(result)
  File "/nfsdir/zhangsl/env/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
    raise message.as_instanceof_cause()
  File "/nfsdir/zhangsl/env/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/pool.py", line 659, in send
    result = await self._run_coro(message.message_id, coro)
  File "/nfsdir/zhangsl/env/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/pool.py", line 370, in _run_coro
    return await coro
  File "/nfsdir/zhangsl/env/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/api.py", line 384, in on_receive
    return await super().on_receive(message)  # type: ignore
  File "xoscar/core.pyx", line 558, in __on_receive__
    raise ex
  File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
    async with self._lock:
  File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
    with debug_async_timeout('actor_lock_timeout',
  File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
    result = await result
  File "/nfsdir/zhangsl/env/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/utils.py", line 45, in wrapped
    ret = await func(*args, **kwargs)
  File "/nfsdir/zhangsl/env/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/model.py", line 79, in wrapped_func
    ret = await fn(self, *args, **kwargs)
  File "/nfsdir/zhangsl/env/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/api.py", line 462, in _wrapper
    r = await func(self, *args, **kwargs)
  File "/nfsdir/zhangsl/env/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/model.py", line 375, in chat
    response = await self._call_wrapper(
  File "/nfsdir/zhangsl/env/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/model.py", line 103, in _async_wrapper
    return await fn(*args, **kwargs)
  File "/nfsdir/zhangsl/env/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/model.py", line 325, in _call_wrapper
    ret = await fn(*args, **kwargs)
  File "/nfsdir/zhangsl/env/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/model/llm/vllm/core.py", line 484, in async_chat
    return self._tool_calls_completion(
  File "/nfsdir/zhangsl/env/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/model/llm/utils.py", line 647, in _tool_calls_completion
    content, func, args = cls._eval_tool_arguments(model_family, c, tools)
  File "/nfsdir/zhangsl/env/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/model/llm/utils.py", line 605, in _eval_tool_arguments
    content, func, args = cls._eval_chatglm3_arguments(c, tools)
  File "/nfsdir/zhangsl/env/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/model/llm/utils.py", line 553, in _eval_chatglm3_arguments
    if isinstance(c[0], str):
KeyError: [address=0.0.0.0:41207, pid=3480762]
```

github-actions[bot] commented 2 months ago

This issue is stale because it has been open for 7 days with no activity.

github-actions[bot] commented 1 month ago

This issue was closed because it has been inactive for 5 days since being marked as stale.