Describe the bug

I asked "最优美的 Python 程序段落长啥样?" ("What does the most elegant paragraph of Python code look like?"), and this error is reproducible every time.
To Reproduce
To help us reproduce this bug, please provide the information below:
Your Python version.
The version of xinference you use.
Versions of crucial packages.
Full stack of the error.
Minimized code to reproduce the error.
2023-12-14 18:38:02,185 xinference.api.restful_api 23851 ERROR Chat completion stream got an error: [address=127.0.0.1:62077, pid=23906] 'utf-8' codec can't decode byte 0xe7 in position 32: unexpected end of data
Traceback (most recent call last):
File "/Users/xuyeqin/Workspace/inference/xinference/api/restful_api.py", line 810, in stream_results
async for item in iterator:
File "/Users/xuyeqin/Workspace/inference/xinference/core/model.py", line 115, in __anext__
return await self._model_actor_ref.next(self._uid)
File "/Users/xuyeqin/miniconda3/lib/python3.9/site-packages/xoscar/backends/context.py", line 227, in send
return self._process_result_message(result)
File "/Users/xuyeqin/miniconda3/lib/python3.9/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
raise message.as_instanceof_cause()
File "/Users/xuyeqin/miniconda3/lib/python3.9/site-packages/xoscar/backends/pool.py", line 657, in send
result = await self._run_coro(message.message_id, coro)
File "/Users/xuyeqin/miniconda3/lib/python3.9/site-packages/xoscar/backends/pool.py", line 368, in _run_coro
return await coro
File "/Users/xuyeqin/miniconda3/lib/python3.9/site-packages/xoscar/api.py", line 306, in __on_receive__
return await super().__on_receive__(message) # type: ignore
File "xoscar/core.pyx", line 558, in __on_receive__
raise ex
File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
async with self._lock:
File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
with debug_async_timeout('actor_lock_timeout',
File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
result = await result
File "/Users/xuyeqin/Workspace/inference/xinference/core/utils.py", line 33, in wrapped
ret = await func(*args, **kwargs)
File "/Users/xuyeqin/Workspace/inference/xinference/core/model.py", line 396, in next
r = await self._call_wrapper(_wrapper)
File "/Users/xuyeqin/Workspace/inference/xinference/core/model.py", line 214, in _call_wrapper
return await asyncio.to_thread(_wrapper)
File "/Users/xuyeqin/miniconda3/lib/python3.9/asyncio/threads.py", line 25, in to_thread
return await loop.run_in_executor(None, func_call)
File "/Users/xuyeqin/miniconda3/lib/python3.9/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/Users/xuyeqin/Workspace/inference/xinference/core/model.py", line 384, in _wrapper
return next(gen)
File "/Users/xuyeqin/Workspace/inference/xinference/model/llm/ggml/qwen.py", line 130, in _convert_raw_text_chunks_to_chat
for token in enumerate(tokens):
File "/Users/xuyeqin/miniconda3/lib/python3.9/site-packages/qwen_cpp/__init__.py", line 97, in _stream_generate
output = self.tokenizer.decode(token_cache)
UnicodeDecodeError: [address=127.0.0.1:62077, pid=23906] 'utf-8' codec can't decode byte 0xe7 in position 32: unexpected end of data
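The error itself is easy to reproduce outside Xinference: `0xe7` is a common lead byte for a three-byte UTF-8 sequence (most CJK characters start with `0xe4`–`0xe9`), so if `token_cache` happens to end mid-character when `qwen_cpp` decodes it, `bytes.decode("utf-8")` raises exactly this "unexpected end of data" error. A minimal illustration (not the qwen_cpp code path, just the decoding behavior):

```python
# "程序" encodes to six UTF-8 bytes; cutting the buffer mid-character
# leaves a dangling lead byte, which strict UTF-8 decoding rejects.
data = "程序".encode("utf-8")  # b'\xe7\xa8\x8b\xe5\xba\x8f'

try:
    data[:4].decode("utf-8")  # ends with the lone lead byte 0xe5
except UnicodeDecodeError as e:
    print(e.reason)  # unexpected end of data
```

This matches the traceback: streaming tokens one at a time makes it likely that a Chinese character's bytes are split across two decode calls.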
Traceback (most recent call last):
File "/Users/xuyeqin/miniconda3/lib/python3.9/site-packages/gradio/routes.py", line 442, in run_predict
output = await app.get_blocks().process_api(
File "/Users/xuyeqin/miniconda3/lib/python3.9/site-packages/gradio/blocks.py", line 1392, in process_api
result = await self.call_function(
File "/Users/xuyeqin/miniconda3/lib/python3.9/site-packages/gradio/blocks.py", line 1111, in call_function
prediction = await utils.async_iteration(iterator)
File "/Users/xuyeqin/miniconda3/lib/python3.9/site-packages/gradio/utils.py", line 346, in async_iteration
return await iterator.__anext__()
File "/Users/xuyeqin/miniconda3/lib/python3.9/site-packages/gradio/utils.py", line 339, in __anext__
return await anyio.to_thread.run_sync(
File "/Users/xuyeqin/miniconda3/lib/python3.9/site-packages/anyio/to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/Users/xuyeqin/miniconda3/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "/Users/xuyeqin/miniconda3/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 867, in run
result = context.run(func, *args)
File "/Users/xuyeqin/miniconda3/lib/python3.9/site-packages/gradio/utils.py", line 322, in run_sync_iterator_async
return next(iterator)
File "/Users/xuyeqin/miniconda3/lib/python3.9/site-packages/gradio/utils.py", line 691, in gen_wrapper
yield from f(*args, **kwargs)
File "/Users/xuyeqin/miniconda3/lib/python3.9/site-packages/gradio/chat_interface.py", line 428, in _stream_fn
for response in generator:
File "/Users/xuyeqin/Workspace/inference/xinference/core/chat_interface.py", line 111, in generate_wrapper
for chunk in model.chat(
File "/Users/xuyeqin/Workspace/inference/xinference/client/common.py", line 49, in streaming_response_iterator
raise Exception(str(error))
Exception: [address=127.0.0.1:62077, pid=23906] 'utf-8' codec can't decode byte 0xe7 in position 32: unexpected end of data
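A possible direction for a fix (a sketch only, not the actual `qwen_cpp` or Xinference implementation): use an incremental decoder, which buffers an incomplete multi-byte sequence across chunks instead of raising when a chunk ends mid-character.

```python
import codecs

# An incremental UTF-8 decoder holds trailing partial bytes internally
# and emits each character only once its sequence is complete.
decoder = codecs.getincrementaldecoder("utf-8")()

# "程序" split mid-character, as a streaming tokenizer might produce it.
chunks = [b"\xe7\xa8", b"\x8b\xe5\xba", b"\x8f"]
out = "".join(decoder.decode(chunk) for chunk in chunks)
print(out)  # 程序
```

The same buffering idea could be applied wherever the raw token bytes are decoded before being yielded to the stream.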