xorbitsai / inference

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
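For context, the "single line of code" refers to pointing an OpenAI-compatible client at an Xinference endpoint instead of api.openai.com. A minimal sketch, assuming a local server on Xinference's default port 9997 and a chat model already launched under the (hypothetical) id glm-4-9b-chat:

```python
# Sketch: swap OpenAI for a local Xinference server by changing base_url.
from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:9997/v1",  # the one changed line vs. api.openai.com
    api_key="not-used",  # a local Xinference server does not check the key
)

resp = client.chat.completions.create(
    model="glm-4-9b-chat",  # whatever model id you launched in Xinference
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)
```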
https://inference.readthedocs.io
Apache License 2.0

Chat completion stream got an error: _get_logits_warper() missing 1 required positional argument: 'device' #1752

Open huangl22 opened 3 months ago

huangl22 commented 3 months ago

2024-07-01 13:45:50,319 xinference.api.restful_api 24041 ERROR    Chat completion stream got an error: [address=127.0.0.1:46761, pid=24313] _get_logits_warper() missing 1 required positional argument: 'device'
Traceback (most recent call last):
  File "/home/hl/miniconda3/envs/xinference/lib/python3.8/site-packages/xinference/api/restful_api.py", line 1537, in stream_results
    async for item in iterator:
  File "/home/hl/miniconda3/envs/xinference/lib/python3.8/site-packages/xoscar/api.py", line 340, in __anext__
    return await self._actor_ref.__xoscar_next__(self._uid)
  File "/home/hl/miniconda3/envs/xinference/lib/python3.8/site-packages/xoscar/backends/context.py", line 227, in send
    return self._process_result_message(result)
  File "/home/hl/miniconda3/envs/xinference/lib/python3.8/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
    raise message.as_instanceof_cause()
  File "/home/hl/miniconda3/envs/xinference/lib/python3.8/site-packages/xoscar/backends/pool.py", line 659, in send
    result = await self._run_coro(message.message_id, coro)
  File "/home/hl/miniconda3/envs/xinference/lib/python3.8/site-packages/xoscar/backends/pool.py", line 370, in _run_coro
    return await coro
  File "/home/hl/miniconda3/envs/xinference/lib/python3.8/site-packages/xoscar/api.py", line 384, in __on_receive__
    return await super().__on_receive__(message)  # type: ignore
  File "xoscar/core.pyx", line 558, in __on_receive__
    raise ex
  File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
    async with self._lock:
  File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
    with debug_async_timeout('actor_lock_timeout',
  File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
    result = await result
  File "/home/hl/miniconda3/envs/xinference/lib/python3.8/site-packages/xoscar/api.py", line 431, in __xoscar_next__
    raise e
  File "/home/hl/miniconda3/envs/xinference/lib/python3.8/site-packages/xoscar/api.py", line 417, in __xoscar_next__
    r = await asyncio.to_thread(_wrapper, gen)
  File "/home/hl/miniconda3/envs/xinference/lib/python3.8/site-packages/xoscar/aio/_threads.py", line 35, in to_thread
    return await loop.run_in_executor(None, func_call)
  File "/home/hl/miniconda3/envs/xinference/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/hl/miniconda3/envs/xinference/lib/python3.8/site-packages/xoscar/api.py", line 402, in _wrapper
    return next(_gen)
  File "/home/hl/miniconda3/envs/xinference/lib/python3.8/site-packages/xinference/core/model.py", line 301, in _to_json_generator
    for v in gen:
  File "/home/hl/miniconda3/envs/xinference/lib/python3.8/site-packages/xinference/model/llm/utils.py", line 553, in _to_chat_completion_chunks
    for i, chunk in enumerate(chunks):
  File "/home/hl/miniconda3/envs/xinference/lib/python3.8/site-packages/xinference/model/llm/pytorch/chatglm.py", line 258, in _stream_generator
    for chunk_text, _ in self._model.stream_chat(
  File "/home/hl/miniconda3/envs/xinference/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 35, in generator_context
    response = gen.send(None)
  File "/home/hl/.cache/huggingface/modules/transformers_modules/glm-4-9b-chat/modeling_chatglm.py", line 1012, in stream_chat
    for outputs in self.stream_generate(**inputs, past_key_values=past_key_values,
  File "/home/hl/miniconda3/envs/xinference/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 35, in generator_context
    response = gen.send(None)
  File "/home/hl/.cache/huggingface/modules/transformers_modules/glm-4-9b-chat/modeling_chatglm.py", line 1092, in stream_generate
    logits_warper = self._get_logits_warper(generation_config)
TypeError: [address=127.0.0.1:46761, pid=24313] _get_logits_warper() missing 1 required positional argument: 'device'

qinxuye commented 3 months ago

This is caused by a transformers upgrade; as a workaround, please downgrade transformers:

pip install 'transformers==4.41.2'
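
The breaking change landed in transformers 4.42, which is why 4.41.2 is the suggested pin. A quick sanity check of the installed version (a sketch; packaging is already a transformers dependency):

```python
# Sketch: warn if the installed transformers is new enough to trigger
# the missing-'device' TypeError reported above.
import transformers
from packaging import version

if version.parse(transformers.__version__) >= version.parse("4.42.0"):
    print(
        f"transformers {transformers.__version__} detected; glm-4-9b-chat "
        "streaming may fail; consider: pip install 'transformers==4.41.2'"
    )
```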
LULU-LULULA commented 3 months ago

> This is caused by a transformers upgrade; as a workaround, please downgrade transformers:
>
> pip install 'transformers==4.41.2'

Thank you very much for your advice!

xxch commented 2 months ago

transformers==4.42.3 also produces this error. I'd recommend not upgrading to that version.
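
Given the reports in this thread (4.41.2 works, 4.42.3 fails), a version constraint that stays below the breaking release would be:

pip install 'transformers<4.42'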

github-actions[bot] commented 2 months ago

This issue is stale because it has been open for 7 days with no activity.