xorbitsai / inference

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
https://inference.readthedocs.io
Apache License 2.0
4.86k stars 386 forks

BUG: Qwen 7/13b inference failed with utf-8 decode error. #766

Closed qinxuye closed 1 month ago

qinxuye commented 9 months ago

Describe the bug

I asked "最优美的 Python 程序段落长啥样?" ("What does the most elegant Python code snippet look like?"), and this error can be reproduced every time.

To Reproduce

To help us reproduce this bug, please provide the information below:

  1. Your Python version.
  2. The version of xinference you use.
  3. Versions of crucial packages.
  4. Full stack of the error.
  5. Minimized code to reproduce the error.
2023-12-14 18:38:02,185 xinference.api.restful_api 23851 ERROR    Chat completion stream got an error: [address=127.0.0.1:62077, pid=23906] 'utf-8' codec can't decode byte 0xe7 in position 32: unexpected end of data
Traceback (most recent call last):
  File "/Users/xuyeqin/Workspace/inference/xinference/api/restful_api.py", line 810, in stream_results
    async for item in iterator:
  File "/Users/xuyeqin/Workspace/inference/xinference/core/model.py", line 115, in __anext__
    return await self._model_actor_ref.next(self._uid)
  File "/Users/xuyeqin/miniconda3/lib/python3.9/site-packages/xoscar/backends/context.py", line 227, in send
    return self._process_result_message(result)
  File "/Users/xuyeqin/miniconda3/lib/python3.9/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
    raise message.as_instanceof_cause()
  File "/Users/xuyeqin/miniconda3/lib/python3.9/site-packages/xoscar/backends/pool.py", line 657, in send
    result = await self._run_coro(message.message_id, coro)
  File "/Users/xuyeqin/miniconda3/lib/python3.9/site-packages/xoscar/backends/pool.py", line 368, in _run_coro
    return await coro
  File "/Users/xuyeqin/miniconda3/lib/python3.9/site-packages/xoscar/api.py", line 306, in __on_receive__
    return await super().__on_receive__(message)  # type: ignore
  File "xoscar/core.pyx", line 558, in __on_receive__
    raise ex
  File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
    async with self._lock:
  File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
    with debug_async_timeout('actor_lock_timeout',
  File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
    result = await result
  File "/Users/xuyeqin/Workspace/inference/xinference/core/utils.py", line 33, in wrapped
    ret = await func(*args, **kwargs)
  File "/Users/xuyeqin/Workspace/inference/xinference/core/model.py", line 396, in next
    r = await self._call_wrapper(_wrapper)
  File "/Users/xuyeqin/Workspace/inference/xinference/core/model.py", line 214, in _call_wrapper
    return await asyncio.to_thread(_wrapper)
  File "/Users/xuyeqin/miniconda3/lib/python3.9/asyncio/threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
  File "/Users/xuyeqin/miniconda3/lib/python3.9/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/Users/xuyeqin/Workspace/inference/xinference/core/model.py", line 384, in _wrapper
    return next(gen)
  File "/Users/xuyeqin/Workspace/inference/xinference/model/llm/ggml/qwen.py", line 130, in _convert_raw_text_chunks_to_chat
    for token in enumerate(tokens):
  File "/Users/xuyeqin/miniconda3/lib/python3.9/site-packages/qwen_cpp/__init__.py", line 97, in _stream_generate
    output = self.tokenizer.decode(token_cache)
UnicodeDecodeError: [address=127.0.0.1:62077, pid=23906] 'utf-8' codec can't decode byte 0xe7 in position 32: unexpected end of data
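The traceback ends in `self.tokenizer.decode(token_cache)`, which suggests the streamed token byte buffer was cut in the middle of a multi-byte UTF-8 sequence. A minimal illustration of that failure mode (hypothetical, not taken from the issue): Chinese characters occupy 3 bytes each in UTF-8, so truncating a buffer mid-character raises exactly this error.

```python
# Truncating a UTF-8 byte buffer mid-character reproduces the error class.
data = "程序".encode("utf-8")  # 6 bytes: two 3-byte characters
truncated = data[:4]           # ends one byte into the second character

try:
    truncated.decode("utf-8")
except UnicodeDecodeError as exc:
    print(exc.reason)  # -> "unexpected end of data", as in the traceback
```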
Traceback (most recent call last):
  File "/Users/xuyeqin/miniconda3/lib/python3.9/site-packages/gradio/routes.py", line 442, in run_predict
    output = await app.get_blocks().process_api(
  File "/Users/xuyeqin/miniconda3/lib/python3.9/site-packages/gradio/blocks.py", line 1392, in process_api
    result = await self.call_function(
  File "/Users/xuyeqin/miniconda3/lib/python3.9/site-packages/gradio/blocks.py", line 1111, in call_function
    prediction = await utils.async_iteration(iterator)
  File "/Users/xuyeqin/miniconda3/lib/python3.9/site-packages/gradio/utils.py", line 346, in async_iteration
    return await iterator.__anext__()
  File "/Users/xuyeqin/miniconda3/lib/python3.9/site-packages/gradio/utils.py", line 339, in __anext__
    return await anyio.to_thread.run_sync(
  File "/Users/xuyeqin/miniconda3/lib/python3.9/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/Users/xuyeqin/miniconda3/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/Users/xuyeqin/miniconda3/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/Users/xuyeqin/miniconda3/lib/python3.9/site-packages/gradio/utils.py", line 322, in run_sync_iterator_async
    return next(iterator)
  File "/Users/xuyeqin/miniconda3/lib/python3.9/site-packages/gradio/utils.py", line 691, in gen_wrapper
    yield from f(*args, **kwargs)
  File "/Users/xuyeqin/miniconda3/lib/python3.9/site-packages/gradio/chat_interface.py", line 428, in _stream_fn
    for response in generator:
  File "/Users/xuyeqin/Workspace/inference/xinference/core/chat_interface.py", line 111, in generate_wrapper
    for chunk in model.chat(
  File "/Users/xuyeqin/Workspace/inference/xinference/client/common.py", line 49, in streaming_response_iterator
    raise Exception(str(error))
Exception: [address=127.0.0.1:62077, pid=23906] 'utf-8' codec can't decode byte 0xe7 in position 32: unexpected end of data
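One possible workaround for this class of bug — a sketch, not necessarily the fix that was applied in xinference or qwen_cpp — is to route streamed bytes through an incremental UTF-8 decoder, which buffers an incomplete multi-byte sequence until its remaining bytes arrive instead of raising:

```python
import codecs

# An incremental decoder holds back partial multi-byte sequences rather
# than failing on them, so chunks can be split at arbitrary byte offsets.
decoder = codecs.getincrementaldecoder("utf-8")()

chunks = [b"\xe7\xa8", b"\x8b\xe5\xba\x8f"]  # "程序" split mid-character
text = ""
for chunk in chunks:
    text += decoder.decode(chunk)  # returns "" while a character is incomplete
text += decoder.decode(b"", final=True)
print(text)  # -> 程序
```

Applying the same idea to the streaming path in `_stream_generate` would mean decoding `token_cache` incrementally (or retrying once more bytes arrive) rather than calling a one-shot `decode` on a possibly truncated buffer.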
github-actions[bot] commented 1 month ago

This issue is stale because it has been open for 7 days with no activity.

github-actions[bot] commented 1 month ago

This issue was closed because it has been inactive for 5 days since being marked as stale.