turboderp / exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License
3.28k stars 243 forks

DeepSeek: ValueError: bytes must be in range(0, 256) #219

Closed SinanAkkoyun closed 6 months ago

SinanAkkoyun commented 7 months ago
Traceback (most recent call last):
  File "/home/ai/.mconda3/envs/exl2/lib/python3.11/site-packages/uvicorn/protocols/http/h11_impl.py", line 408, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ai/.mconda3/envs/exl2/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
    return await self.app(scope, receive, send)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ai/.mconda3/envs/exl2/lib/python3.11/site-packages/fastapi/applications.py", line 1106, in __call__
    await super().__call__(scope, receive, send)
  File "/home/ai/.mconda3/envs/exl2/lib/python3.11/site-packages/starlette/applications.py", line 122, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/ai/.mconda3/envs/exl2/lib/python3.11/site-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/home/ai/.mconda3/envs/exl2/lib/python3.11/site-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/home/ai/.mconda3/envs/exl2/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
    raise exc
  File "/home/ai/.mconda3/envs/exl2/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/home/ai/.mconda3/envs/exl2/lib/python3.11/site-packages/fastapi/middleware/asyncexitstack.py", line 20, in __call__
    raise e
  File "/home/ai/.mconda3/envs/exl2/lib/python3.11/site-packages/fastapi/middleware/asyncexitstack.py", line 17, in __call__
    await self.app(scope, receive, send)
  File "/home/ai/.mconda3/envs/exl2/lib/python3.11/site-packages/starlette/routing.py", line 718, in __call__
    await route.handle(scope, receive, send)
  File "/home/ai/.mconda3/envs/exl2/lib/python3.11/site-packages/starlette/routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "/home/ai/.mconda3/envs/exl2/lib/python3.11/site-packages/starlette/routing.py", line 69, in app
    await response(scope, receive, send)
  File "/home/ai/.mconda3/envs/exl2/lib/python3.11/site-packages/starlette/responses.py", line 270, in __call__
    async with anyio.create_task_group() as task_group:
  File "/home/ai/.mconda3/envs/exl2/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 597, in __aexit__
    raise exceptions[0]
  File "/home/ai/.mconda3/envs/exl2/lib/python3.11/site-packages/starlette/responses.py", line 273, in wrap
    await func()
  File "/home/ai/.mconda3/envs/exl2/lib/python3.11/site-packages/starlette/responses.py", line 262, in stream_response
    async for chunk in self.body_iterator:
  File "/home/ai/.mconda3/envs/exl2/lib/python3.11/site-packages/starlette/concurrency.py", line 63, in iterate_in_threadpool
    yield await anyio.to_thread.run_sync(_next, iterator)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ai/.mconda3/envs/exl2/lib/python3.11/site-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ai/.mconda3/envs/exl2/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "/home/ai/.mconda3/envs/exl2/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ai/.mconda3/envs/exl2/lib/python3.11/site-packages/starlette/concurrency.py", line 53, in _next
    return next(iterator)
           ^^^^^^^^^^^^^^
  File "/home/ai/ml/llm/inference/exl2/exllama-v2/examples/copilot.py", line 223, in stream
    chunk, eos, _ = generator.stream()
                    ^^^^^^^^^^^^^^^^^^
  File "/home/ai/.mconda3/envs/exl2/lib/python3.11/site-packages/exllamav2/generator/streaming.py", line 155, in stream
    next_token, new_text = self._catch_utf8(next_token, new_text)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ai/.mconda3/envs/exl2/lib/python3.11/site-packages/exllamav2/generator/streaming.py", line 242, in _catch_utf8
    if self.expect_utf8 == 0: return self._decode_utf8()
                                     ^^^^^^^^^^^^^^^^^^^
  File "/home/ai/.mconda3/envs/exl2/lib/python3.11/site-packages/exllamav2/generator/streaming.py", line 203, in _decode_utf8
    c = bytes(b).decode('utf-8')
        ^^^^^^^^
ValueError: bytes must be in range(0, 256)
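For context on the final error: Python's `bytes()` constructor rejects any integer outside 0..255, and the message in the traceback above is exactly what CPython raises in that case. A minimal reproduction, unrelated to exllamav2 itself:

```python
# Minimal reproduction of the underlying Python error (illustrative only,
# not exllamav2 code): bytes() accepts a list of ints, but every value
# must be a valid byte, i.e. in range(0, 256).
ok = bytes([72, 105])      # b'Hi' -- both values are valid bytes
try:
    bytes([300])           # 300 is out of range -> ValueError
except ValueError as e:
    print(e)               # prints: bytes must be in range(0, 256)
```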
turboderp commented 7 months ago

I find this a little strange. Does this still happen on the latest version? The relevant code is:

        try:
            id_to_ord = self.tokenizer.get_id_to_ord_list()
            b = [id_to_ord[x] for x in self.held_utf8_tokens[0].tolist()]
            c = bytes(b).decode('utf-8')
        except ValueError:
            id_to_piece = self.tokenizer.get_id_to_piece_list()
            c = "".join(id_to_piece[x] for x in self.held_utf8_tokens[0].tolist())
        except UnicodeDecodeError:
            c = "�"

So it shouldn't be able to throw a ValueError here.
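The intended fallback logic can be sketched in isolation. The tokenizer mappings and the `-1` sentinel below are assumptions for illustration, not the actual exllamav2 internals; note also that `UnicodeDecodeError` is a subclass of `ValueError`, so in a standalone version it must be caught first or its handler is unreachable:

```python
# Hedged sketch of the decode-with-fallback pattern in the quoted snippet.
# id_to_ord / id_to_piece stand in for the tokenizer lookup tables; a -1
# ordinal (hypothetical sentinel for "no byte value") is what would make
# bytes() raise ValueError.
def decode_held_tokens(token_ids, id_to_ord, id_to_piece):
    try:
        b = [id_to_ord[t] for t in token_ids]  # may contain e.g. -1
        return bytes(b).decode("utf-8")        # ValueError if any value
                                               # is outside 0..255
    except UnicodeDecodeError:
        # Must come before ValueError: UnicodeDecodeError subclasses it.
        return "\ufffd"                        # replacement character
    except ValueError:
        # Tokens aren't raw bytes; fall back to their string pieces.
        return "".join(id_to_piece[t] for t in token_ids)

# A -1 ordinal takes the ValueError path and joins the pieces instead:
print(decode_held_tokens([0, 1], [226, -1], ["<s>", "</s>"]))  # <s></s>
```

With that ordering, an out-of-range ordinal falls back to string pieces, while a byte sequence that is valid per byte but malformed as UTF-8 (e.g. a truncated multi-byte sequence) yields the replacement character.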

turboderp commented 6 months ago

I'll assume this was a version mismatch. Feel free to reopen if not.