huggingface / text-embeddings-inference

A blazing fast inference solution for text embeddings models
https://huggingface.co/docs/text-embeddings-inference/quick_tour
Apache License 2.0

TEI fails for Finetuned JinaAI Embeddings models #384

Closed StefanRaab closed 1 week ago

StefanRaab commented 2 months ago

System Info

TEI Docker image 1.4, CUDA 12.2, NVIDIA T4

sudo docker run --gpus all -p 8080:80 -v ./volume2:/data --restart always -d ghcr.io/huggingface/text-embeddings-inference:turing-1.4 --model-id aari1995/German_Semantic_V3 --pooling mean --dtype float16 --max-client-batch-size 256 --max-batch-tokens 16384
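Once the container is up, the server can be queried over HTTP as described in the quick tour linked above; with the -p 8080:80 mapping from the command, a request looks like:

curl 127.0.0.1:8080/embed \
    -X POST \
    -d '{"inputs":"What is Deep Learning?"}' \
    -H 'Content-Type: application/json'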

Reproduction

Start a Docker container with the following model: https://huggingface.co/aari1995/German_Semantic_V3. I also experimented with the architectures field and with trust_remote_code on the SentenceBert side, but it keeps routing to the Bert model.

{"timestamp":"2024-08-14T08:04:17.082662Z","level":"INFO","message":"Args { model_id: "/rep****ory", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: false, default_prompt_name: None, default_prompt: None, hf_api_token: None, hostname: "r-stefanraab-german-semantic-v3-znq-ffyjb6zd-101c8-dneb1", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/repository/cache"), payload_limit: 2000000, api_key: None, json_output: true, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", cors_allow_origin: None }","target":"text_embeddings_router","filename":"router/src/main.rs","line_number":175} {"timestamp":"2024-08-14T08:04:17.095519Z","level":"INFO","message":"Maximum number of tokens per request: 8192","target":"text_embeddings_router","filename":"router/src/lib.rs","line_number":199} {"timestamp":"2024-08-14T08:04:17.095687Z","level":"INFO","message":"Starting 2 tokenization workers","target":"text_embeddings_core::tokenization","filename":"core/src/tokenization.rs","line_number":26} {"timestamp":"2024-08-14T08:04:17.109235Z","level":"INFO","message":"Starting model backend","target":"text_embeddings_router","filename":"router/src/lib.rs","line_number":250} {"timestamp":"2024-08-14T08:04:17.296077Z","level":"INFO","message":"Starting Bert model on Cuda(CudaDevice(DeviceId(1)))","target":"text_embeddings_backend_candle","filename":"backends/candle/src/lib.rs","line_number":268} Error: Could not create backend Caused by: Could not start backend: Bert only supports absolute position embeddings

Expected behavior

I would expect that, like the base Jina model, it would be routed to the JinaBert model, which supports ALiBi position embeddings. Instead, it gets routed to a classical Bert model.

kozistr commented 2 months ago

@StefanRaab TEI identifies the backend type using _name_or_path in config.json to differentiate between Bert and JinaBert (see the comment here).

According to the source code, changing your model's _name_or_path to jinaai/jina-bert-implementation should work for now, I guess.
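For illustration, the entry in question in the fine-tuned model's config.json would look something like this (all other fields omitted; they stay as the model defines them):

{
  "_name_or_path": "jinaai/jina-bert-implementation"
}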

OlivierDehaene commented 1 week ago

Yes, this logic is brittle; however, there is nothing else that can be done, since model creators decide how they want to name their models, and Jina decided to name theirs BERT even though it has major architecture changes. For now, the only fix is to manually add this entry to your config.json.
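To make the brittleness concrete, here is a schematic Rust sketch of this kind of name-based dispatch. It illustrates the pattern described above, not the actual code in backends/candle/src/lib.rs:

// Illustrative sketch only, not the actual TEI source: the backend is
// picked by string-matching _name_or_path from config.json, so a
// fine-tune that rewrites this field silently falls back to plain Bert.
fn select_backend(name_or_path: &str) -> &'static str {
    if name_or_path.contains("jina-bert-implementation") {
        "JinaBert" // supports ALiBi position embeddings
    } else {
        "Bert" // absolute position embeddings only
    }
}

fn main() {
    // A fine-tuned model whose config no longer carries the Jina path:
    assert_eq!(select_backend("aari1995/German_Semantic_V3"), "Bert");
    // After the manual edit to config.json suggested above:
    assert_eq!(select_backend("jinaai/jina-bert-implementation"), "JinaBert");
}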