xorbitsai / inference

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
https://inference.readthedocs.io
Apache License 2.0

xinference backend serving chatglm4-9b-chat: errors when connected to dify/lobe-chat, no normal answers #1746

Open knightcn1983 opened 1 month ago

knightcn1983 commented 1 month ago

Describe the bug

A clear and concise description of what the bug is.

To Reproduce

To help us reproduce this bug, please provide the information below:

  1. Your Python version: python3.11
  2. The version of xinference you use: xinference==0.12.3
  3. Full stack of the error:

```
2024-06-29 22:47:51,612 xinference.api.restful_api 18792 ERROR Chat completion stream got an error: [address=127.0.0.1:55110, pid=18927] GenerationMixin._get_logits_warper() missing 1 required positional argument: 'device'
Traceback (most recent call last):
  File "/opt/anaconda3/envs/xinference/lib/python3.11/site-packages/xinference/api/restful_api.py", line 1537, in stream_results
    async for item in iterator:
  File "/opt/anaconda3/envs/xinference/lib/python3.11/site-packages/xoscar/api.py", line 340, in __anext__
    return await self._actor_ref.__xoscar_next__(self._uid)
  File "/opt/anaconda3/envs/xinference/lib/python3.11/site-packages/xoscar/backends/context.py", line 227, in send
    return self._process_result_message(result)
  File "/opt/anaconda3/envs/xinference/lib/python3.11/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
    raise message.as_instanceof_cause()
  File "/opt/anaconda3/envs/xinference/lib/python3.11/site-packages/xoscar/backends/pool.py", line 659, in send
    result = await self._run_coro(message.message_id, coro)
  File "/opt/anaconda3/envs/xinference/lib/python3.11/site-packages/xoscar/backends/pool.py", line 370, in _run_coro
    return await coro
  File "/opt/anaconda3/envs/xinference/lib/python3.11/site-packages/xoscar/api.py", line 384, in __on_receive__
    return await super().__on_receive__(message)  # type: ignore
  File "xoscar/core.pyx", line 558, in __on_receive__
    raise ex
  File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
    async with self._lock:
  File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
    with debug_async_timeout('actor_lock_timeout',
  File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
    result = await result
  File "/opt/anaconda3/envs/xinference/lib/python3.11/site-packages/xoscar/api.py", line 431, in __xoscar_next__
    raise e
  File "/opt/anaconda3/envs/xinference/lib/python3.11/site-packages/xoscar/api.py", line 417, in __xoscar_next__
    r = await asyncio.to_thread(_wrapper, gen)
  File "/opt/anaconda3/envs/xinference/lib/python3.11/asyncio/threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
  File "/opt/anaconda3/envs/xinference/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/opt/anaconda3/envs/xinference/lib/python3.11/site-packages/xoscar/api.py", line 402, in _wrapper
    return next(_gen)
  File "/opt/anaconda3/envs/xinference/lib/python3.11/site-packages/xinference/core/model.py", line 301, in _to_json_generator
    for v in gen:
  File "/opt/anaconda3/envs/xinference/lib/python3.11/site-packages/xinference/model/llm/utils.py", line 553, in _to_chat_completion_chunks
    for i, chunk in enumerate(chunks):
  File "/opt/anaconda3/envs/xinference/lib/python3.11/site-packages/xinference/model/llm/pytorch/chatglm.py", line 258, in _stream_generator
    for chunk_text, _ in self._model.stream_chat(
  File "/opt/anaconda3/envs/xinference/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 35, in generator_context
    response = gen.send(None)
  File "/Users/qytian/.cache/huggingface/modules/transformers_modules/glm-4-9b-chat-hf-pytorch-9b/modeling_chatglm.py", line 1013, in stream_chat
    for outputs in self.stream_generate(**inputs, past_key_values=past_key_values,
  File "/opt/anaconda3/envs/xinference/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 35, in generator_context
    response = gen.send(None)
  File "/Users/qytian/.cache/huggingface/modules/transformers_modules/glm-4-9b-chat-hf-pytorch-9b/modeling_chatglm.py", line 1093, in stream_generate
    logits_warper = self._get_logits_warper(generation_config)
TypeError: [address=127.0.0.1:55110, pid=18927] GenerationMixin._get_logits_warper() missing 1 required positional argument: 'device'
```

Expected behavior

The backend uses xinference to serve chatglm4-9b-chat and the frontend is dify or lobe-chat; answers cannot be obtained normally, and an error is reported while processing the stream.

Additional context

Add any other context about the problem here.

huangl22 commented 1 month ago

I ran into this problem too. Has it been solved?

Jinju-Sun commented 1 month ago

I ran into the same problem as well, waiting for a fix.

qinxuye commented 1 month ago

This is due to an upgrade of transformers. As a workaround, please downgrade transformers:

```
pip install 'transformers==4.41.2'
```
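For anyone wanting to understand the failure: newer transformers releases added a required `device` parameter to `GenerationMixin._get_logits_warper()`, while the remote `modeling_chatglm.py` still calls it with only `generation_config`, hence the `TypeError`. Below is a minimal sketch of a startup guard; the version bound is an assumption taken from this thread (4.41.2 is reported to work), not an official compatibility matrix:

```python
from packaging import version  # shipped as a transformers dependency
import transformers

# Assumption from this thread: 4.41.2 is the last version known to work with
# the bundled modeling_chatglm.py; newer releases require a `device` argument
# in GenerationMixin._get_logits_warper() that the remote code does not pass.
LAST_KNOWN_GOOD = version.parse("4.41.2")

if version.parse(transformers.__version__) > LAST_KNOWN_GOOD:
    raise RuntimeError(
        f"transformers {transformers.__version__} is likely incompatible with "
        "glm-4-9b-chat's remote code; try: pip install 'transformers==4.41.2'"
    )
```
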
xxch commented 1 month ago

I also switched to `pip install 'transformers==4.41.2'` (xinference==0.13.0, python=3.11). The error above no longer appears and chat on the page works fine, but I now get this error:

```
--- Logging error ---
Traceback (most recent call last):
  File "/app/miniconda/envs/inference/lib/python3.11/logging/handlers.py", line 73, in emit
    if self.shouldRollover(record):
  File "/app/miniconda/envs/inference/lib/python3.11/logging/handlers.py", line 196, in shouldRollover
    msg = "%s\n" % self.format(record)
  File "/app/miniconda/envs/inference/lib/python3.11/logging/__init__.py", line 953, in format
    return fmt.format(record)
  File "/app/miniconda/envs/inference/lib/python3.11/logging/__init__.py", line 687, in format
    record.message = record.getMessage()
  File "/app/miniconda/envs/inference/lib/python3.11/logging/__init__.py", line 377, in getMessage
    msg = msg % self.args
TypeError: not all arguments converted during string formatting
Call stack:
  File "/app/miniconda/envs/inference/lib/python3.11/threading.py", line 1002, in _bootstrap
    self._bootstrap_inner()
  File "/app/miniconda/envs/inference/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
    self.run()
  File "/app/miniconda/envs/inference/lib/python3.11/threading.py", line 982, in run
    self._target(*self._args, **self._kwargs)
  File "/app/miniconda/envs/inference/lib/python3.11/concurrent/futures/thread.py", line 83, in _worker
    work_item.run()
  File "/app/miniconda/envs/inference/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/app/miniconda/envs/inference/lib/python3.11/site-packages/xoscar/api.py", line 402, in _wrapper
    return next(_gen)
  File "/app/miniconda/envs/inference/lib/python3.11/site-packages/xinference/core/model.py", line 318, in _to_json_generator
    for v in gen:
  File "/app/miniconda/envs/inference/lib/python3.11/site-packages/xinference/model/llm/utils.py", line 558, in _to_chat_completion_chunks
    for i, chunk in enumerate(chunks):
  File "/app/miniconda/envs/inference/lib/python3.11/site-packages/xinference/model/llm/pytorch/chatglm.py", line 259, in _stream_generator
    for chunk_text, _ in self._model.stream_chat(
  File "/app/miniconda/envs/inference/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 35, in generator_context
    response = gen.send(None)
  File "/home/resoft/.cache/huggingface/modules/transformers_modules/glm-4-9b-chat/modeling_chatglm.py", line 1012, in stream_chat
    for outputs in self.stream_generate(**inputs, past_key_values=past_key_values,
  File "/app/miniconda/envs/inference/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 35, in generator_context
    response = gen.send(None)
  File "/home/resoft/.cache/huggingface/modules/transformers_modules/glm-4-9b-chat/modeling_chatglm.py", line 1061, in stream_generate
    logger.warn(
Message: 'Both `max_new_tokens` (=512) and `max_length`(=518) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)'
Arguments: (<class 'UserWarning'>,)
```
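
For what it's worth, this secondary error looks like a logging problem rather than a model failure (the chat output is fine, as noted above): the remote `modeling_chatglm.py` calls `logger.warn(message, UserWarning)`, passing the warning class as an extra positional argument, and Python's logging then tries to %-format the message with it. A standalone sketch that reproduces the same symptom (hypothetical logger name, not xinference code):

```python
import logging

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger("repro")  # hypothetical logger, just for the demo

# warnings.warn(msg, UserWarning) accepts a category, but logging treats any
# extra positional argument as a %-format value. The message below has no
# placeholder for it, so the handler prints "--- Logging error ---" followed by
# "TypeError: not all arguments converted during string formatting".
logger.warning("Both `max_new_tokens` and `max_length` seem to have been set.", UserWarning)
```
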
xxch commented 1 month ago

glm8 and glm4 use a fairly old transformers version, and other models all differ from one another as well. What can we do about this?
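
If different models really do need different transformers pins, one low-tech option is to record the known-good version per model and warn at startup. This is only a sketch under that assumption; the pin table is illustrative, and only the glm-4 entry is backed by this thread:

```python
from packaging import version
import transformers

# Illustrative pins; only the glm-4 entry comes from this thread.
KNOWN_GOOD_PINS = {
    "glm-4-9b-chat": "4.41.2",
}

def warn_if_unverified(model_name: str) -> None:
    # Compare the installed transformers against the recorded pin, if any.
    pin = KNOWN_GOOD_PINS.get(model_name)
    if pin and version.parse(transformers.__version__) != version.parse(pin):
        print(f"{model_name}: last verified with transformers=={pin}, "
              f"found {transformers.__version__}")

warn_if_unverified("glm-4-9b-chat")
```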

firrice commented 4 weeks ago

> I also switched to `pip install 'transformers==4.41.2'` (xinference==0.13.0, python=3.11). The error above no longer appears and chat on the page works fine, but I now get this error: `--- Logging error --- ... TypeError: not all arguments converted during string formatting ...`

I'm getting the same error: TypeError: not all arguments converted during string formatting.

koko426 commented 2 weeks ago

Same error here with 'transformers==4.41.2': TypeError: not all arguments converted during string formatting.