xorbitsai / inference

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
https://inference.readthedocs.io
Apache License 2.0

Can LongWriter be supported? I currently get an error when using it as a custom model #2322

Closed: xs818818 closed this issue 1 month ago

xs818818 commented 1 month ago

Feature request

```
2024-09-18 02:22:06,994 xinference.core.worker 68 INFO [request 690d9782-759f-11ef-af77-0242ac110002] Leave launch_builtin_model, elapsed time: 27 s
2024-09-18 02:22:46,824 xinference.core.model 87 ERROR [request 91347b22-759f-11ef-9dbb-0242ac110002] Leave chat, error: 151331, elapsed time: 0 s
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/utils.py", line 69, in wrapped
    ret = await func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/model.py", line 560, in chat
    response = await self._call_wrapper_json(
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/model.py", line 407, in _call_wrapper_json
    return await self._call_wrapper("json", fn, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/model.py", line 120, in _async_wrapper
    return await fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/model.py", line 418, in _call_wrapper
    ret = await asyncio.to_thread(fn, *args, **kwargs)
  File "/usr/lib/python3.10/asyncio/threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.10/dist-packages/xinference/model/llm/transformers/chatglm.py", line 358, in chat
    inputs = self._tokenizer.apply_chat_template(
  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py", line 1857, in apply_chat_template
    out = self(
  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py", line 3055, in __call__
    encodings = self._call_one(text=text, text_pair=text_pair, **all_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py", line 3163, in _call_one
    return self.encode_plus(
  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py", line 3237, in encode_plus
    return self._encode_plus(
  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils.py", line 797, in _encode_plus
    first_ids = get_input_ids(text)
  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils.py", line 764, in get_input_ids
    tokens = self.tokenize(text, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils.py", line 695, in tokenize
    tokenized_text.extend(self._tokenize(token))
  File "/root/.cache/huggingface/modules/transformers_modules/long-llm-pytorch-8b/tokenization_chatglm.py", line 109, in _tokenize
    tokens.append(self.decoder[t])
KeyError: 151331

2024-09-18 02:22:46,833 xinference.api.restful_api 1 ERROR Chat completion stream got an error: [address=0.0.0.0:34167, pid=87] 151331
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/xinference/api/restful_api.py", line 1891, in stream_results
    iterator = await model.chat(
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 231, in send
    return self._process_result_message(result)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 102, in _process_result_message
    raise message.as_instanceof_cause()
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 656, in send
    result = await self._run_coro(message.message_id, coro)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 367, in _run_coro
    return await coro
  File "/usr/local/lib/python3.10/dist-packages/xoscar/api.py", line 384, in __on_receive__
    return await super().__on_receive__(message)  # type: ignore
  File "xoscar/core.pyx", line 558, in __on_receive__
    raise ex
  File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
    async with self._lock:
  File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
    with debug_async_timeout('actor_lock_timeout',
  File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
    result = await result
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/model.py", line 96, in wrapped_func
    ret = await fn(self, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/api.py", line 462, in _wrapper
    r = await func(self, *args, **kwargs)
  [... same frames as the traceback above, down to tokenization_chatglm.py line 109 ...]
KeyError: [address=0.0.0.0:34167, pid=87] 151331

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/gradio/queueing.py", line 527, in process_events
    response = await route_utils.call_process_api(
  File "/usr/local/lib/python3.10/dist-packages/gradio/route_utils.py", line 261, in call_process_api
    output = await app.get_blocks().process_api(
  File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1786, in process_api
    result = await self.call_function(
  File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1350, in call_function
    prediction = await utils.async_iteration(iterator)
  File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 583, in async_iteration
    return await iterator.__anext__()
  File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 709, in asyncgen_wrapper
    response = await iterator.__anext__()
  File "/usr/local/lib/python3.10/dist-packages/gradio/chat_interface.py", line 545, in _stream_fn
    first_response = await async_iteration(generator)
  File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 583, in async_iteration
    return await iterator.__anext__()
  File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 576, in __anext__
    return await anyio.to_thread.run_sync(
  File "/usr/local/lib/python3.10/dist-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 2177, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 859, in run
    result = context.run(func, *args)
  File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 559, in run_sync_iterator_async
    return next(iterator)
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/chat_interface.py", line 122, in generate_wrapper
    for chunk in model.chat(
  File "/usr/local/lib/python3.10/dist-packages/xinference/client/common.py", line 51, in streaming_response_iterator
    raise Exception(str(error))
Exception: [address=0.0.0.0:34167, pid=87] 151331

[A second request (c6ff327e-759f-11ef-9dbb-0242ac110002) at 02:24:17 fails with the identical tracebacks.]
```

Motivation

(The motivation section of the issue template repeats the same log and tracebacks shown above.)

Your contribution

(The contribution section likewise repeats the same log and tracebacks.)

YiLin198 commented 1 month ago

LongCite-glm4-9b-chat has the same problem:

```
2024-09-19 10:23:28,155 transformers.generation.configuration_utils 2375 INFO loading configuration file /root/autodl-tmp/LongCite-glm4-9b/generation_config.json
2024-09-19 10:23:28,155 transformers.generation.configuration_utils 2375 INFO Generate config GenerationConfig {
  "do_sample": true,
  "eos_token_id": [
    151329,
    151336,
    151338
  ],
  "max_length": 128000,
  "pad_token_id": 151329,
  "temperature": 0.8,
  "top_p": 0.8
}
```

```
2024-09-19 10:23:28,164 xinference.core.worker 2351 INFO [request 20dcb49a-762e-11ef-9541-0242ac110009] Leave launch_builtin_model, elapsed time: 11 s
2024-09-19 10:23:37,409 xinference.core.model 2375 ERROR [request 2d5ec7b2-762e-11ef-80fe-0242ac110009] Leave chat, error: 151331, elapsed time: 0 s
Traceback (most recent call last):
  File "/root/miniconda3/lib/python3.8/site-packages/xinference/core/utils.py", line 69, in wrapped
    ret = await func(*args, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/xinference/core/model.py", line 560, in chat
    response = await self._call_wrapper_json(
  File "/root/miniconda3/lib/python3.8/site-packages/xinference/core/model.py", line 407, in _call_wrapper_json
    return await self._call_wrapper("json", fn, *args, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/xinference/core/model.py", line 120, in _async_wrapper
    return await fn(*args, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/xinference/core/model.py", line 418, in _call_wrapper
    ret = await asyncio.to_thread(fn, *args, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/xoscar/aio/_threads.py", line 35, in to_thread
    return await loop.run_in_executor(None, func_call)
  File "/root/miniconda3/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/xinference/model/llm/transformers/chatglm.py", line 358, in chat
    inputs = self._tokenizer.apply_chat_template(
  File "/root/miniconda3/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 1857, in apply_chat_template
    out = self(
  File "/root/miniconda3/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 3055, in __call__
    encodings = self._call_one(text=text, text_pair=text_pair, **all_kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 3163, in _call_one
    return self.encode_plus(
  File "/root/miniconda3/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 3237, in encode_plus
    return self._encode_plus(
  File "/root/miniconda3/lib/python3.8/site-packages/transformers/tokenization_utils.py", line 797, in _encode_plus
    first_ids = get_input_ids(text)
  File "/root/miniconda3/lib/python3.8/site-packages/transformers/tokenization_utils.py", line 764, in get_input_ids
    tokens = self.tokenize(text, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/transformers/tokenization_utils.py", line 695, in tokenize
    tokenized_text.extend(self._tokenize(token))
  File "/root/.cache/huggingface/modules/transformers_modules/LongCite-glm4-9b/tokenization_chatglm.py", line 109, in _tokenize
    tokens.append(self.decoder[t])
KeyError: 151331

2024-09-19 10:23:37,418 xinference.api.restful_api 2315 ERROR Chat completion stream got an error: [address=0.0.0.0:43101, pid=2375] 151331
[... the same xoscar/restful_api and gradio tracebacks as in the report above, ending in ...]
Exception: [address=0.0.0.0:43101, pid=2375] 151331
```

github-actions[bot] commented 1 month ago

This issue is stale because it has been open for 7 days with no activity.

github-actions[bot] commented 1 month ago

This issue was closed because it has been inactive for 5 days since being marked as stale.