theroyallab / tabbyAPI

An OAI compatible exllamav2 API that's both lightweight and fast

[REQUEST] Nested model_name key #231

Open SinanAkkoyun opened 2 days ago

SinanAkkoyun commented 2 days ago

[ Linux | CUDA 12.2 | py 3.11 ]

Describe the bug

Hi, thanks for your awesome work!

When running inference with Llama3.1/70B-Instruct/6.0bpw over the OAI API, it throws:

INFO:     127.0.0.1:50540 - "POST /v1/chat/completions HTTP/1.1" 500 
ERROR:    Exception in ASGI application
ERROR:    Traceback (most recent call last):
ERROR:      File "/home/ai/.mconda3/envs/exllamav2/lib/python3.11/site-packages/uvicorn/protocols/http/httptools_impl.py", line 401, in 
run_asgi
ERROR:        result = await app(  # type: ignore[func-returns-value]
ERROR:                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR:      File "/home/ai/.mconda3/envs/exllamav2/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 60, in __call__
ERROR:        return await self.app(scope, receive, send)
ERROR:               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR:      File "/home/ai/.mconda3/envs/exllamav2/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in __call__
ERROR:        await super().__call__(scope, receive, send)
ERROR:      File "/home/ai/.mconda3/envs/exllamav2/lib/python3.11/site-packages/starlette/applications.py", line 113, in __call__
ERROR:        await self.middleware_stack(scope, receive, send)
ERROR:      File "/home/ai/.mconda3/envs/exllamav2/lib/python3.11/site-packages/starlette/middleware/errors.py", line 187, in __call__
ERROR:        raise exc
ERROR:      File "/home/ai/.mconda3/envs/exllamav2/lib/python3.11/site-packages/starlette/middleware/errors.py", line 165, in __call__
ERROR:        await self.app(scope, receive, _send)
ERROR:      File "/home/ai/.mconda3/envs/exllamav2/lib/python3.11/site-packages/starlette/middleware/cors.py", line 85, in __call__
ERROR:        await self.app(scope, receive, send)
ERROR:      File "/home/ai/.mconda3/envs/exllamav2/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
ERROR:        await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)                                                     
ERROR:      File "/home/ai/.mconda3/envs/exllamav2/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
ERROR:        raise exc
ERROR:      File "/home/ai/.mconda3/envs/exllamav2/lib/python3.11/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
ERROR:        await app(scope, receive, sender)
ERROR:      File "/home/ai/.mconda3/envs/exllamav2/lib/python3.11/site-packages/starlette/routing.py", line 715, in __call__
ERROR:        await self.middleware_stack(scope, receive, send)
ERROR:      File "/home/ai/.mconda3/envs/exllamav2/lib/python3.11/site-packages/starlette/routing.py", line 735, in app
ERROR:        await route.handle(scope, receive, send)
ERROR:      File "/home/ai/.mconda3/envs/exllamav2/lib/python3.11/site-packages/starlette/routing.py", line 288, in handle
ERROR:        await self.app(scope, receive, send)
ERROR:      File "/home/ai/.mconda3/envs/exllamav2/lib/python3.11/site-packages/starlette/routing.py", line 76, in app
ERROR:        await wrap_app_handling_exceptions(app, request)(scope, receive, send)                                                       
ERROR:      File "/home/ai/.mconda3/envs/exllamav2/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
ERROR:        raise exc
ERROR:      File "/home/ai/.mconda3/envs/exllamav2/lib/python3.11/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
ERROR:        await app(scope, receive, sender)
ERROR:      File "/home/ai/.mconda3/envs/exllamav2/lib/python3.11/site-packages/starlette/routing.py", line 73, in app
ERROR:        response = await f(request)
ERROR:                   ^^^^^^^^^^^^^^^^
ERROR:      File "/home/ai/.mconda3/envs/exllamav2/lib/python3.11/site-packages/fastapi/routing.py", line 301, in app
ERROR:        raw_response = await run_endpoint_function(
ERROR:                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR:      File "/home/ai/.mconda3/envs/exllamav2/lib/python3.11/site-packages/fastapi/routing.py", line 212, in run_endpoint_function
ERROR:        return await dependant.call(**values)
ERROR:               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR:      File "/home/ai/ml/lm/llm/inference/exl2/tabbyAPI/endpoints/OAI/router.py", line 112, in chat_completion_request
ERROR:        await load_inline_model(data.model, request)
ERROR:      File "/home/ai/ml/lm/llm/inference/exl2/tabbyAPI/endpoints/OAI/utils/completion.py", line 152, in load_inline_model
ERROR:        await model.load_model(
ERROR:      File "/home/ai/ml/lm/llm/inference/exl2/tabbyAPI/common/model.py", line 101, in load_model
ERROR:        async for _ in load_model_gen(model_path, **kwargs):
ERROR:      File "/home/ai/ml/lm/llm/inference/exl2/tabbyAPI/common/model.py", line 60, in load_model_gen
ERROR:        raise ValueError(
ERROR:    ValueError: Model "6.0bpw" is already loaded! Aborting.

However, when passing just the model string 6.0bpw, inference works.
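
My guess at what happens, as a minimal sketch of the failure mode (the names and logic here are assumptions, not tabbyAPI's actual code): the loaded model seems to be tracked by its leaf directory name only, so the full request string never matches it, the inline loader tries to load again, and the duplicate-load guard then trips on the matching basename.

from pathlib import Path

# Assumption: the currently loaded model is identified by its leaf
# directory name, which for a nested layout is just "6.0bpw".
loaded_model_name = "6.0bpw"

def load_model_gen(model_path: Path):
    # A request for "Llama3.1/70B-Instruct/6.0bpw" fails the upstream
    # "is this already the loaded model?" string comparison, so a load
    # is attempted, and this basename check then aborts it.
    if model_path.name == loaded_model_name:
        raise ValueError(f'Model "{model_path.name}" is already loaded! Aborting.')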

Reproduction steps

Restructure your models directory into a nested layout:

Llama3.1
  70B-Instruct
    6.0bpw

then request the model as Llama3.1/70B-Instruct/6.0bpw over the OAI API.
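
For reference, a minimal request that triggers the 500 (the host, port, and API key placeholder are assumptions for a default local setup):

import requests

resp = requests.post(
    "http://127.0.0.1:5000/v1/chat/completions",  # assumed default port
    headers={"Authorization": "Bearer <your-api-key>"},
    json={
        "model": "Llama3.1/70B-Instruct/6.0bpw",
        "messages": [{"role": "user", "content": "Hello"}],
    },
)
print(resp.status_code)  # 500, with the ValueError from the traceback above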

Expected behavior

The API should resolve models by the full model string (Llama3.1/70B-Instruct/6.0bpw), not by the leaf directory name alone.
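
One possible shape for this, as a hedged sketch (not tabbyAPI's actual code): key models by their path relative to the models root, so both the "already loaded?" comparison and the duplicate-load guard use the full string.

from pathlib import Path

MODEL_DIR = Path("models")  # assumed models root

def model_key(model_path: Path) -> str:
    # Yields "Llama3.1/70B-Instruct/6.0bpw" instead of just "6.0bpw".
    return model_path.resolve().relative_to(MODEL_DIR.resolve()).as_posix()

def needs_reload(requested: str, loaded_key: str | None) -> bool:
    # Skip reloading only when the full relative path matches, so leaf
    # names shared across model families cannot collide.
    return loaded_key != model_key(MODEL_DIR / requested)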


bdashore3 commented 2 days ago

This is a request, so I'm removing the bug tag. Nested directories are currently not supported due to the added complexity, and I'm on the fence about supporting this in the first place. A model name is a name, not a path.

Currently, if you want different-bpw variants of a model, append the quantization to the folder name (e.g. -6bpw) rather than nesting subdirectories.
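
If you already have a nested tree, a throwaway script along these lines (the paths and naming scheme are assumptions, adjust to taste) flattens it into the suggested single-level layout:

from pathlib import Path

models = Path("models")  # assumed models root

# Moves e.g. models/Llama3.1/70B-Instruct/6.0bpw to
# models/Llama3.1-70B-Instruct-6.0bpw; empty parent dirs are left behind.
for leaf in models.glob("*/*/*bpw"):
    flat_name = "-".join(leaf.relative_to(models).parts)
    leaf.rename(models / flat_name)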