Support for Huggingface Fast Tokenizers

bibekyess commented 10 months ago

Hello ! I found some Llama2 models using the FastTokenizer provided by the Hugging Face tokenizers library, not the SentencePiece package used by regular Llama models. For instance, beomi/llama-2-ko-7b. It seems the current ExLlamaV2Tokenizer only support SentencePiece tokenizer, which requires tokenizer.model, can you please add support for Hugging Face tokenizers as well? I tried changing the file tokenizer.py to accomplish it and it worked well in exllama but in exllamav2, it works well sometimes but sometimes it gives following errors.

ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/home/bibekyess/anaconda3/envs/exllamav2-env/lib/python3.11/site-packages/uvicorn/protocols/http/h11_impl.py", line 408, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bibekyess/anaconda3/envs/exllamav2-env/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
    return await self.app(scope, receive, send)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bibekyess/anaconda3/envs/exllamav2-env/lib/python3.11/site-packages/fastapi/applications.py", line 1106, in __call__
    await super().__call__(scope, receive, send)
  File "/home/bibekyess/anaconda3/envs/exllamav2-env/lib/python3.11/site-packages/starlette/applications.py", line 122, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/bibekyess/anaconda3/envs/exllamav2-env/lib/python3.11/site-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/home/bibekyess/anaconda3/envs/exllamav2-env/lib/python3.11/site-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/home/bibekyess/anaconda3/envs/exllamav2-env/lib/python3.11/site-packages/starlette/middleware/cors.py", line 83, in __call__
    await self.app(scope, receive, send)
  File "/home/bibekyess/anaconda3/envs/exllamav2-env/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
    raise exc
  File "/home/bibekyess/anaconda3/envs/exllamav2-env/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/home/bibekyess/anaconda3/envs/exllamav2-env/lib/python3.11/site-packages/fastapi/middleware/asyncexitstack.py", line 20, in __call__
    raise e
  File "/home/bibekyess/anaconda3/envs/exllamav2-env/lib/python3.11/site-packages/fastapi/middleware/asyncexitstack.py", line 17, in __call__
    await self.app(scope, receive, send)
  File "/home/bibekyess/anaconda3/envs/exllamav2-env/lib/python3.11/site-packages/starlette/routing.py", line 718, in __call__
    await route.handle(scope, receive, send)
  File "/home/bibekyess/anaconda3/envs/exllamav2-env/lib/python3.11/site-packages/starlette/routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "/home/bibekyess/anaconda3/envs/exllamav2-env/lib/python3.11/site-packages/starlette/routing.py", line 66, in app
    response = await func(request)
               ^^^^^^^^^^^^^^^^^^^
  File "/home/bibekyess/anaconda3/envs/exllamav2-env/lib/python3.11/site-packages/fastapi/routing.py", line 274, in app
    raw_response = await run_endpoint_function(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bibekyess/anaconda3/envs/exllamav2-env/lib/python3.11/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bibekyess/exllama_v2_sandbox/tabbyAPI/main.py", line 188, in generate_completion
    response_text = model_container.generate(data.prompt, **data.to_gen_params())
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bibekyess/exllama_v2_sandbox/tabbyAPI/model.py", line 230, in generate
    reponse = "".join(gen)
              ^^^^^^^^^^^^
  File "/home/bibekyess/exllama_v2_sandbox/tabbyAPI/model.py", line 363, in generate_gen
    chunk, eos, tokens = self.generator.stream()
                         ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bibekyess/exllama_v2_sandbox/exllamav2/exllamav2/generator/streaming.py", line 155, in stream
    next_token, new_text = self._catch_utf8(next_token, new_text)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bibekyess/exllama_v2_sandbox/exllamav2/exllamav2/generator/streaming.py", line 218, in _catch_utf8
    id_to_ord = self.tokenizer.get_id_to_ord_list()
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bibekyess/exllama_v2_sandbox/exllamav2/exllamav2/tokenizer.py", line 275, in get_id_to_ord_list
    match = self.ord_exp.match(p)
            ^^^^^^^^^^^^^^^^^^^^^
TypeError: expected string or bytes-like object, got 'int'
Response: 247 tokens generated in 7.81 seconds (31.63 T/s, context 9 tokens)
INFO:     127.0.0.1:47692 - "POST /v1/completions HTTP/1.1" 200 OK
INFO:     127.0.0.1:34096 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/home/bibekyess/anaconda3/envs/exllamav2-env/lib/python3.11/site-packages/uvicorn/protocols/http/h11_impl.py", line 408, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bibekyess/anaconda3/envs/exllamav2-env/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
    return await self.app(scope, receive, send)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bibekyess/anaconda3/envs/exllamav2-env/lib/python3.11/site-packages/fastapi/applications.py", line 1106, in __call__
    await super().__call__(scope, receive, send)
  File "/home/bibekyess/anaconda3/envs/exllamav2-env/lib/python3.11/site-packages/starlette/applications.py", line 122, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/bibekyess/anaconda3/envs/exllamav2-env/lib/python3.11/site-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/home/bibekyess/anaconda3/envs/exllamav2-env/lib/python3.11/site-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/home/bibekyess/anaconda3/envs/exllamav2-env/lib/python3.11/site-packages/starlette/middleware/cors.py", line 83, in __call__
    await self.app(scope, receive, send)
  File "/home/bibekyess/anaconda3/envs/exllamav2-env/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
    raise exc
  File "/home/bibekyess/anaconda3/envs/exllamav2-env/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/home/bibekyess/anaconda3/envs/exllamav2-env/lib/python3.11/site-packages/fastapi/middleware/asyncexitstack.py", line 20, in __call__
    raise e
  File "/home/bibekyess/anaconda3/envs/exllamav2-env/lib/python3.11/site-packages/fastapi/middleware/asyncexitstack.py", line 17, in __call__
    await self.app(scope, receive, send)
  File "/home/bibekyess/anaconda3/envs/exllamav2-env/lib/python3.11/site-packages/starlette/routing.py", line 718, in __call__
    await route.handle(scope, receive, send)
  File "/home/bibekyess/anaconda3/envs/exllamav2-env/lib/python3.11/site-packages/starlette/routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "/home/bibekyess/anaconda3/envs/exllamav2-env/lib/python3.11/site-packages/starlette/routing.py", line 66, in app
    response = await func(request)
               ^^^^^^^^^^^^^^^^^^^
  File "/home/bibekyess/anaconda3/envs/exllamav2-env/lib/python3.11/site-packages/fastapi/routing.py", line 274, in app
    raw_response = await run_endpoint_function(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bibekyess/anaconda3/envs/exllamav2-env/lib/python3.11/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bibekyess/exllama_v2_sandbox/tabbyAPI/main.py", line 188, in generate_completion
    response_text = model_container.generate(data.prompt, **data.to_gen_params())
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bibekyess/exllama_v2_sandbox/tabbyAPI/model.py", line 230, in generate
    reponse = "".join(gen)
              ^^^^^^^^^^^^
  File "/home/bibekyess/exllama_v2_sandbox/tabbyAPI/model.py", line 363, in generate_gen
    chunk, eos, tokens = self.generator.stream()
                         ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bibekyess/exllama_v2_sandbox/exllamav2/exllamav2/generator/streaming.py", line 155, in stream
    next_token, new_text = self._catch_utf8(next_token, new_text)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bibekyess/exllama_v2_sandbox/exllamav2/exllamav2/generator/streaming.py", line 220, in _catch_utf8
    b = id_to_ord[t]
        ~~~~~~~~~^^^
IndexError: list index out of range

It is interesting that sometimes, the inference is successful, while sometimes it is not. So, I want to request for official support of the Huggingface Fast Tokenizers.

Thank you! :)

bibekyess commented 10 months ago

Maybe the issue arises because of TabbyAPI. I created my own FastAPI server and I am not facing such issues.