[BUG] embedding models: TypeError: SentenceTransformer.__init__() got an unexpected keyword argument 'model_kwargs'

OS

Linux

GPU Library

CUDA 12.x

Python version

3.11

Describe the bug

Out of the models I tested, most exit with similar errors. Here a some embedding models tested, I only got mixedbread-ai_mxbai-rerank-xsmall-v1 to load, the others exit with the log error show below.

BAAI_bge-small-en-v1.5               mixedbread-ai_mxbai-embed-large-v1    place_your_models_here.txt
jinaai_jina-embeddings-v2-base-code  mixedbread-ai_mxbai-rerank-xsmall-v1  sentence-transformers_all-MiniLM-L6-v2
nomic-ai_nomic-embed-text-v1.5        Snowflake_snowflake-arctic-embed-m-v1.5

Reproduction steps

install tabbyapi with miniconda python3.11, use latest version. install the extras module with pip, within conda

run start.sh with default config.yml these settings:

embedding_model_name: mixedbread-ai_mxbai-embed-large-v1 embeddings_device: cpu # also tried gpu

Expected behavior

trying the call the embeddings endpoint with another application:

llamaindex-cli rag -q "What is the main topic of this document?" -f 'doc.md'

Logs

INFO: Model successfully loaded. INFO 2024-08-18 22:16:52,968 infinity_emb INFO: model=models/jinaai_jina-embeddings-v2-base-code selected, select_model.py:62 using engine=torch and device=cpu
Traceback (most recent call last): File "/home/user/projects/tabbyAPI/start.py", line 254, in entrypoint(converted_args) File "/home/user/projects/tabbyAPI/main.py", line 178, in entrypoint asyncio.run(entrypoint_async()) File "/home/user/miniconda3/envs/exllamav2_3.11/lib/python3.11/asyncio/runners.py", line 190, in run return runner.run(main) ^^^^^^^^^^^^^^^^ File "/home/user/miniconda3/envs/exllamav2_3.11/lib/python3.11/asyncio/runners.py", line 118, in run return self._loop.run_until_complete(task) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/user/miniconda3/envs/exllamav2_3.11/lib/python3.11/asyncio/base_events.py", line 654, in run_until_complete return future.result() ^^^^^^^^^^^^^^^ File "/home/user/projects/tabbyAPI/main.py", line 99, in entrypoint_async await model.load_embedding_model(embedding_model_path, embedding_config) File "/home/user/projects/tabbyAPI/common/model.py", line 142, in load_embedding_model await embeddings_container.load(kwargs) File "/home/user/projects/tabbyAPI/backends/infinity/model.py", line 48, in load self.engine = AsyncEmbeddingEngine.from_args(engine_args) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/user/miniconda3/envs/exllamav2_3.11/lib/python3.11/site-packages/infinity_emb/engine.py", line 67, in from_args engine = cls(**engine_args.to_dict(), _show_deprecation_warning=False) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/user/miniconda3/envs/exllamav2_3.11/lib/python3.11/site-packages/infinity_emb/engine.py", line 53, in init self._model, self._min_inference_t, self._max_inference_t = select_model( ^^^^^^^^^^^^^ File "/home/user/miniconda3/envs/exllamav2_3.11/lib/python3.11/site-packages/infinity_emb/inference/select_model.py", line 70, in select_model loaded_engine = unloaded_engine.value(engine_args=engine_args) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/user/miniconda3/envs/exllamav2_3.11/lib/python3.11/site-packages/infinity_emb/transformer/embedder/sentence_transformer.py", line 58, in init super().init( TypeError: SentenceTransformer.init() got an unexpected keyword argument 'model_kwargs'

Additional context

here is a successful loading of mixedbread-ai_mxbai-rerank-xsmall-v1. I may be using the wrong embeddings model.

./start.sh --log-prompt True --port 5000 --host 0.0.0.0
It looks like you're in a conda environment. Skipping venv check.
pip 24.0 from /home/user/miniconda3/envs/exllamav2_3.11/lib/python3.11/site-packages/pip (python 3.11)
Loaded your saved preferences from `start_options.json`
Successfully wrote your start script options to `start_options.json`. 
If something goes wrong, editing or deleting the file will reinstall TabbyAPI as a first-time user.
Starting TabbyAPI...
INFO:     ExllamaV2 version: 0.1.8
INFO:     Your API key is: 0e7e212a378e541accf697a8f5f4770b
INFO:     Your admin key is: 61588373d57f63a11dbf24b72766a26d
INFO:     
INFO:     If these keys get compromised, make sure to delete api_tokens.yml and restart the server. Have fun!
INFO:     Generation logging is enabled for: prompts
WARNING:  The given cache_size (32768) is less than 2 * max_seq_len and may be too small for requests using CFG. 
WARNING:  Ignore this warning if you do not plan on using CFG.
INFO:     Attempting to load a prompt template if present.
INFO:     Using template "chatml" for chat completions.
INFO:     Loading model: /home/user/Storage/bigstorm_Codestral-22B-v0.1-8.0bpw-8hb-exl2
INFO:     Loading with autosplit
Loading model modules ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 115/115 0:00:00
INFO:     Model successfully loaded.
INFO     2024-08-18 22:38:17,346 infinity_emb INFO: model=`models/mixedbread-ai_mxbai-rerank-xsmall-v1` selected,   select_model.py:62
         using engine=`torch` and device=`cpu`                                                                                        
INFO     2024-08-18 22:38:17,874 infinity_emb INFO: creating batching engine                                      batch_handler.py:333
INFO     2024-08-18 22:38:17,875 infinity_emb INFO: ready to batch requests.                                      batch_handler.py:399
INFO:     Embedding model successfully loaded.
INFO:     Developer documentation: http://0.0.0.0:5000/redoc
INFO:     Starting OAI API
INFO:     Completions: http://0.0.0.0:5000/v1/completions
INFO:     Chat completions: http://0.0.0.0:5000/v1/chat/completions
INFO:     Started server process [1062929]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:5000 (Press CTRL+C to quit)
INFO:     Recieved embeddings request 2a228483d59f4f85890f42b897424aa2
INFO:     192.168.2.19:46686 - "POST /v1/embeddings HTTP/1.1" 500
ERROR:    Exception in ASGI application
ERROR:    Traceback (most recent call last):
ERROR:      File "/home/user/miniconda3/envs/exllamav2_3.11/lib/python3.11/site-packages/uvicorn/protocols/http/httptools_impl.py", 
line 411, in run_asgi
ERROR:        result = await app(  # type: ignore[func-returns-value]
ERROR:                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR:      File "/home/user/miniconda3/envs/exllamav2_3.11/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 
69, in __call__
ERROR:        return await self.app(scope, receive, send)
ERROR:               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR:      File "/home/user/miniconda3/envs/exllamav2_3.11/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in 
__call__
ERROR:        await super().__call__(scope, receive, send)
ERROR:      File "/home/user/miniconda3/envs/exllamav2_3.11/lib/python3.11/site-packages/starlette/applications.py", line 123, in 
__call__
ERROR:        await self.middleware_stack(scope, receive, send)
ERROR:      File "/home/user/miniconda3/envs/exllamav2_3.11/lib/python3.11/site-packages/starlette/middleware/errors.py", line 186, in
__call__
ERROR:        raise exc
ERROR:      File "/home/user/miniconda3/envs/exllamav2_3.11/lib/python3.11/site-packages/starlette/middleware/errors.py", line 164, in
__call__
ERROR:        await self.app(scope, receive, _send)
ERROR:      File "/home/user/miniconda3/envs/exllamav2_3.11/lib/python3.11/site-packages/starlette/middleware/cors.py", line 85, in 
__call__
ERROR:        await self.app(scope, receive, send)
ERROR:      File "/home/user/miniconda3/envs/exllamav2_3.11/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 65,
in __call__
ERROR:        await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
ERROR:      File "/home/user/miniconda3/envs/exllamav2_3.11/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in
wrapped_app
ERROR:        raise exc
ERROR:      File "/home/user/miniconda3/envs/exllamav2_3.11/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in
wrapped_app
ERROR:        await app(scope, receive, sender)
ERROR:      File "/home/user/miniconda3/envs/exllamav2_3.11/lib/python3.11/site-packages/starlette/routing.py", line 756, in __call__
ERROR:        await self.middleware_stack(scope, receive, send)
ERROR:      File "/home/user/miniconda3/envs/exllamav2_3.11/lib/python3.11/site-packages/starlette/routing.py", line 776, in app
ERROR:        await route.handle(scope, receive, send)
ERROR:      File "/home/user/miniconda3/envs/exllamav2_3.11/lib/python3.11/site-packages/starlette/routing.py", line 297, in handle
ERROR:        await self.app(scope, receive, send)
ERROR:      File "/home/user/miniconda3/envs/exllamav2_3.11/lib/python3.11/site-packages/starlette/routing.py", line 77, in app
ERROR:        await wrap_app_handling_exceptions(app, request)(scope, receive, send)
ERROR:      File "/home/user/miniconda3/envs/exllamav2_3.11/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in
wrapped_app
ERROR:        raise exc
ERROR:      File "/home/user/miniconda3/envs/exllamav2_3.11/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in
wrapped_app
ERROR:        await app(scope, receive, sender)
ERROR:      File "/home/user/miniconda3/envs/exllamav2_3.11/lib/python3.11/site-packages/starlette/routing.py", line 72, in app
ERROR:        response = await func(request)
ERROR:                   ^^^^^^^^^^^^^^^^^^^
ERROR:      File "/home/user/miniconda3/envs/exllamav2_3.11/lib/python3.11/site-packages/fastapi/routing.py", line 278, in app
ERROR:        raw_response = await run_endpoint_function(
ERROR:                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR:      File "/home/user/miniconda3/envs/exllamav2_3.11/lib/python3.11/site-packages/fastapi/routing.py", line 191, in 
run_endpoint_function
ERROR:        return await dependant.call(**values)
ERROR:               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR:      File "/home/user/projects/tabbyAPI/endpoints/OAI/router.py", line 148, in embeddings
ERROR:        response = await run_with_request_disconnect(
ERROR:                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR:      File "/home/user/projects/tabbyAPI/common/networking.py", line 88, in run_with_request_disconnect
ERROR:        return call_task.result()
ERROR:               ^^^^^^^^^^^^^^^^^^
ERROR:      File "/home/user/projects/tabbyAPI/endpoints/OAI/utils/embeddings.py", line 42, in get_embeddings
ERROR:        embedding_data = await model.embeddings_container.generate(data.input)
ERROR:                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR:      File "/home/user/projects/tabbyAPI/backends/infinity/model.py", line 64, in generate
ERROR:        result_embeddings, usage = await self.engine.embed(sentence_input)
ERROR:                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR:      File "/home/user/miniconda3/envs/exllamav2_3.11/lib/python3.11/site-packages/infinity_emb/engine.py", line 150, in embed
ERROR:        embeddings, usage = await self._batch_handler.embed(sentences=sentences)
ERROR:                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR:      File "/home/user/miniconda3/envs/exllamav2_3.11/lib/python3.11/site-packages/infinity_emb/inference/batch_handler.py", 
line 138, in embed
ERROR:        raise ModelNotDeployedError(
ERROR:    infinity_emb.primitives.ModelNotDeployedError: the loaded moded cannot fullyfill `embed`.options are {'rerank'}.
INFO:     Recieved embeddings request 10afed55743049b690c52eb1198acd38

Acknowledgements

[X] I have looked for similar issues before submitting this one.
[X] I have read the disclaimer, and this issue is related to a code bug. If I have a question, I will use the Discord server.
[X] I understand that the developers have lives and my issue will be answered when possible.
[X] I understand the developers of this program are human, and I will ask my questions politely.

theroyallab / tabbyAPI