Open Dravenlll opened 1 month ago
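Currently, 0.13.3 has a problem with non-streaming support for GLM4-chat, but streaming should work normally. Please confirm your server version.
Same question here; looking for a solution.
The problem is that the modeling_chatglm.py of the glm4-9b model you downloaded (~/.cache/modelscope/hub/ZhipuAI/glm-4-9b-chat/modeling_chatglm.py) has no stream_chat function. I found a version of the file on modelscope that does include stream_chat, copied it in, and it worked: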
    # Method of ChatGLMForConditionalGeneration in modeling_chatglm.py.
    # It relies on names that must exist at module level in that file:
    # torch, List, Dict, LogitsProcessorList and InvalidScoreLogitsProcessor.
    @torch.inference_mode()
    def stream_chat(self, tokenizer, query: str, history: List[Dict] = None, role: str = "user",
                    past_key_values=None, max_length: int = 8192, do_sample=True, top_p=0.8, temperature=0.8,
                    logits_processor=None, return_past_key_values=False, **kwargs):
        if history is None:
            history = []
        if logits_processor is None:
            logits_processor = LogitsProcessorList()
        logits_processor.append(InvalidScoreLogitsProcessor())
        # Stop on the regular EOS token or on the next-turn role markers.
        eos_token_id = [tokenizer.eos_token_id, tokenizer.convert_tokens_to_ids("<|user|>"),
                        tokenizer.convert_tokens_to_ids("<|observation|>")]
        gen_kwargs = {"max_length": max_length, "do_sample": do_sample, "top_p": top_p,
                      "temperature": temperature, "logits_processor": logits_processor, **kwargs}
        if past_key_values is None:
            inputs = tokenizer.apply_chat_template(history + [{"role": role, "content": query}],
                                                   add_generation_prompt=True, tokenize=True, return_tensors="pt",
                                                   return_dict=True)
        else:
            inputs = tokenizer.apply_chat_template([{"role": role, "content": query}], add_special_tokens=False,
                                                   add_generation_prompt=True, tokenize=True, return_tensors="pt",
                                                   return_dict=True)
        inputs = inputs.to(self.device)
        if past_key_values is not None:
            # Shift positions and extend the mask to cover the cached prefix.
            past_length = past_key_values[0][0].shape[2]
            inputs.position_ids += past_length
            attention_mask = inputs.attention_mask
            attention_mask = torch.cat((attention_mask.new_ones(1, past_length), attention_mask), dim=1)
            inputs['attention_mask'] = attention_mask
        history.append({"role": role, "content": query})
        for outputs in self.stream_generate(**inputs, past_key_values=past_key_values,
                                            eos_token_id=eos_token_id, return_past_key_values=return_past_key_values,
                                            **gen_kwargs):
            if return_past_key_values:
                outputs, past_key_values = outputs
            # Keep only the newly generated tokens (drop the prompt and the last token).
            outputs = outputs.tolist()[0][len(inputs["input_ids"][0]):-1]
            response = tokenizer.decode(outputs)
            # Skip partial decodes that end in an incomplete UTF-8 character.
            if response and response[-1] != "�":
                response, new_history = self.process_response(response, history)
                if return_past_key_values:
                    yield response, new_history, past_key_values
                else:
                    yield response, new_history
After copying the stream_chat function in, I get an error: name 'LogitsProcessorList' is not defined
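Both symptoms reported so far (no stream_chat attribute, and LogitsProcessorList not defined after pasting the function) can be checked by reading the file's source without loading the model. The following is only a minimal sketch using the standard-library ast module; the helper name inspect_modeling_source and the SAMPLE string are hypothetical stand-ins, not part of xinference or GLM-4.

```python
import ast


def inspect_modeling_source(source: str) -> dict:
    """Report whether the source defines stream_chat and imports LogitsProcessorList."""
    tree = ast.parse(source)
    # Names of every function or method defined anywhere in the file.
    defined = {
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }
    # Names bound by "import x" and "from x import y" statements.
    imported = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                imported.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                imported.add(alias.asname or alias.name)
    return {
        "has_stream_chat": "stream_chat" in defined,
        "imports_logits_processor_list": "LogitsProcessorList" in imported,
    }


# Hypothetical stand-in for a patched modeling_chatglm.py: the method was
# pasted in, but the import it depends on is missing.
SAMPLE = '''
import torch

class ChatGLMForConditionalGeneration:
    def stream_chat(self, tokenizer, query):
        logits_processor = LogitsProcessorList()
'''

report = inspect_modeling_source(SAMPLE)
print(report)
```

To check a real file, pass pathlib.Path(...).read_text() for the modeling_chatglm.py in your model directory. If the second flag is False, adding `from transformers import LogitsProcessorList` near the top of that file is one likely fix, assuming your transformers version exports it at the top level.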
The latest version, 0.14.0, still has this problem. Any inference attempt reports "error during streaming".
Please post the error message from 0.14.0 again.
2024-08-03 18:02:53,419 xinference.api.restful_api 1 ERROR Chat completion stream got an error: [address=0.0.0.0:39701, pid=1105] 'ChatGLMForConditionalGeneration' object has no attribute 'stream_chat'
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/xinference/api/restful_api.py", line 1671, in stream_results
    iterator = await model.chat(
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 231, in send
    return self._process_result_message(result)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 102, in _process_result_message
    raise message.as_instanceof_cause()
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 656, in send
    result = await self._run_coro(message.message_id, coro)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 367, in _run_coro
    return await coro
  File "/usr/local/lib/python3.10/dist-packages/xoscar/api.py", line 384, in __on_receive__
    return await super().__on_receive__(message)  # type: ignore
  File "xoscar/core.pyx", line 558, in __on_receive__
    raise ex
  File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
    async with self._lock:
  File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
    with debug_async_timeout('actor_lock_timeout',
  File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
    result = await result
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/utils.py", line 45, in wrapped
    ret = await func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/model.py", line 90, in wrapped_func
    ret = await fn(self, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/api.py", line 462, in _wrapper
    r = await func(self, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/model.py", line 523, in chat
    response = await self._call_wrapper_json(
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/model.py", line 393, in _call_wrapper_json
    return await self._call_wrapper("json", fn, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/model.py", line 114, in _async_wrapper
    return await fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/model.py", line 404, in _call_wrapper
    ret = await asyncio.to_thread(fn, *args, **kwargs)
  File "/usr/lib/python3.10/asyncio/threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.10/dist-packages/xinference/model/llm/pytorch/chatglm.py", line 481, in chat
    stream_chat = self._model.stream_chat
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1709, in __getattr__
    raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: [address=0.0.0.0:39701, pid=1105] 'ChatGLMForConditionalGeneration' object has no attribute 'stream_chat'
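Try updating to the latest model files.
How do I try the latest model files in xinference? Or do you mean launching with the demo.py that ships with the model files?
The latest model files were committed more than ten days ago, and the problem is still there. The model's modeling_chatglm.py has no stream_chat function.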
With transformers==4.41.2, copy the modeling_chatglm.py from the glm-4-9b-chat-1m model over the modeling_chatglm.py of glm-4-9b-chat. If you then get ValueError: too many values to unpack (expected 2), see https://huggingface.co/THUDM/glm-4-9b-chat/discussions/58
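The file swap described above can be done by hand, but keeping a backup makes it easy to revert if the replacement triggers a different error. This is a rough sketch only: replace_modeling_file is a hypothetical helper, and the demo runs against temporary directories rather than real model paths, which you would need to substitute yourself.

```python
import shutil
import tempfile
from pathlib import Path


def replace_modeling_file(src_dir: Path, dst_dir: Path,
                          name: str = "modeling_chatglm.py") -> Path:
    """Copy `name` from src_dir over dst_dir, keeping a one-time .bak of the original."""
    src = src_dir / name
    dst = dst_dir / name
    if not src.is_file():
        raise FileNotFoundError(src)
    backup = dst.with_name(dst.name + ".bak")
    # Only back up once, so repeated runs never overwrite the true original.
    if dst.exists() and not backup.exists():
        shutil.copy2(dst, backup)
    shutil.copy2(src, dst)
    return backup


# Demo on throwaway directories standing in for the two model folders.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    src_dir = root / "glm-4-9b-chat-1m"
    dst_dir = root / "glm-4-9b-chat"
    src_dir.mkdir()
    dst_dir.mkdir()
    (src_dir / "modeling_chatglm.py").write_text("# 1m version with stream_chat\n")
    (dst_dir / "modeling_chatglm.py").write_text("# original version\n")

    backup_path = replace_modeling_file(src_dir, dst_dir)
    print((dst_dir / "modeling_chatglm.py").read_text().strip())
    print(backup_path.read_text().strip())
```

Restoring is then a matter of copying the .bak file back over modeling_chatglm.py.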
transformers can now be upgraded to the latest version. We have adapted to the latest model version and the latest transformers.
After upgrading xinference to 0.14.0.post1, it still fails:
  File "/home/aiuser/.conda/envs/xinference/lib/python3.11/site-packages/xinference/model/llm/pytorch/chatglm.py", line 481, in chat
    stream_chat = self._model.stream_chat
    ^^^^^^^^^^^^^^^^^
  File "/home/aiuser/.conda/envs/xinference/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1709, in __getattr__
    raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
    ^^^^^^^^^^^^^^^^^
AttributeError: [address=0.0.0.0:33882, pid=34764] 'ChatGLMForConditionalGeneration' object has no attribute 'stream_chat'
xinference version: 0.14.0.post1; transformers version: 4.42.4
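My guess is that, like me, you cannot use a fine-tuned model. After struggling with it for several days, I found a workaround: just add it to the path:
Here is what I tried, so that nobody after me has to repeat it:
- 'ChatGLMForConditionalGeneration' object has no attribute 'stream_chat' THUDM/GLM-4#365 (comment)
- The glm4 downloaded by xinference works fine, so I checked whether the downloaded chatglm_model differed in any way. It was identical.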
Is your modeling_chatglm.py the original one, or one you patched step by step yourself? I added the path on my side, and it still reports 'GenerationConfig' object has no attribute '_eos_token_tensor'
@lergliu The original one, untouched.
I just tried the original one, and mine still reports 'ChatGLMForConditionalGeneration' object has no attribute 'stream_chat'
Oh, sorry, I just checked: I am using the modeling_chatglm.py from the 1m model.
Mine still does not work; it still reports 'GenerationConfig' object has no attribute '_eos_token_tensor'
Could you send me your code so I can take a look? My email is lergiu@126.com; if convenient, please send it there.
Same problem here: inference fails with 'GenerationConfig' object has no attribute '_eos_token_tensor', on glm3-6b. Did you manage to solve it?
Not yet.
Downgrading transformers to 4.42 fixes it but introduces other problems, so I have now downgraded to 4.41. See here
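Since the outcome in this thread depends heavily on which transformers release is installed, it helps to print the installed version before trying a workaround. The sketch below uses only the standard library; release_tuple and same_minor are hypothetical helper names, and the 4.41.2 pin is simply the version reported as working in this thread, not an officially supported one.

```python
from importlib import metadata


def release_tuple(version: str) -> tuple:
    """Leading numeric components of a version string, e.g. '4.41.2' -> (4, 41, 2)."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break  # stop at suffixes such as 'post1' or 'dev0'
        parts.append(int(piece))
    return tuple(parts)


def same_minor(installed: str, wanted: str) -> bool:
    """True when both versions share the same major.minor series."""
    return release_tuple(installed)[:2] == release_tuple(wanted)[:2]


def installed_version(package: str):
    """Installed version of `package`, or None when it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


current = installed_version("transformers")
print("transformers:", current)
if current is not None:
    print("in the 4.41.x series:", same_minor(current, "4.41.2"))
```

Running the same check for "xinference" shows which server version is actually in the environment, which is worth confirming because installing xinference pulls in a transformers version automatically.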
Which transformers version did you adapt to? Has a glm4-chat deployment actually been tested successfully? When we install xinference, the transformers version is installed automatically.
Both 4.42 and 4.41 have problems on my side.
After replacing the file, a chat test raises an error: GenerationMixin._get_logits_warper() missing 1 required positional argument: 'device'
@chinacqzgp Keep downgrading, to 4.36.
This solution (transformers==4.41.2 plus the modeling_chatglm.py from glm-4-9b-chat-1m) works. After struggling with it for a long time I finally got it running. Thanks!
Glad it works. I just redeployed from scratch and it still runs fine.
The steps were:
pip install "xinference[all]"
pip install "xinference[transformers]"
pip install "xinference[vllm]"
pip install tiktoken
pip install sentence-transformers
pip install flashinfer -i https://flashinfer.ai/whl/cu121/torch2.4/
The GPU is an L20.
I used the base glm4-9b-chat model, git-cloned from Hugging Face.
git clone https://github.com/hiyouga/LLaMA-Factory.git
followed by an lfs pull.
System Info
python 3.11.8
Running Xinference with Docker?
Version info
Xinference-v 0.13.3
The command used to start Xinference
xinference-local --host 0.0.0.0 --port 9997
Reproduction
1. Launch the model 2. Start a chat, which fails with an error
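To reproduce the failure outside the web UI, the chat can be requested directly over HTTP and the stream flag toggled, since streaming and non-streaming take different code paths. This sketch assumes the server from the start command above is reachable on port 9997 and exposes an OpenAI-compatible /v1/chat/completions route; the model UID "glm4-chat" is an assumption, so use whatever UID your launch returned. Nothing is sent until send() is called.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model_uid: str, prompt: str,
                       stream: bool) -> urllib.request.Request:
    """Build (but do not send) a chat completion request."""
    payload = {
        "model": model_uid,  # assumed UID; replace with the one from your launch
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


request = build_chat_request("http://127.0.0.1:9997", "glm4-chat", "Hello", stream=True)
print(request.full_url)
print(request.data.decode("utf-8"))
```

Calling send(request) once with stream=True and once with stream=False shows which of the two paths raises the stream_chat error on a given server version.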
Expected behavior
Inference should work normally.