@StefanRaab TEI identifies the backend type using the `_name_or_path` field in `config.json` to differentiate between Bert and JinaBert (see the comment in the source code). According to the source code, changing your model's `_name_or_path` to `jinaai/jina-bert-implementation` should work for now.

Yes, this logic is brittle, but there is not much else that can be done: model creators decide how they want to name their models, and Jina decided to name theirs BERT even though it has major architecture changes. For now, the only fix is to manually add this entry to your `config.json`.
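A minimal sketch of that workaround, run on the host before starting the container; the repo id comes from this issue, but the local path is just an example:

```python
import json
from huggingface_hub import snapshot_download

# Download the model into a local directory we can edit
# (the local path is an example; adjust it to your volume layout).
local_dir = snapshot_download(
    repo_id="aari1995/German_Semantic_V3",
    local_dir="./volume2/German_Semantic_V3",
)

config_path = f"{local_dir}/config.json"
with open(config_path) as f:
    config = json.load(f)

# Point TEI's backend detection at the JinaBert implementation,
# as suggested above.
config["_name_or_path"] = "jinaai/jina-bert-implementation"

with open(config_path, "w") as f:
    json.dump(config, f, indent=2)
```

Since the container below already mounts `./volume2` at `/data`, the patched copy could then be served with `--model-id /data/German_Semantic_V3` instead of the Hub id.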
System Info
TEI Docker image `turing-1.4`, CUDA 12.2, NVIDIA T4
```sh
sudo docker run --gpus all -p 8080:80 -v ./volume2:/data --restart always -d \
  ghcr.io/huggingface/text-embeddings-inference:turing-1.4 \
  --model-id aari1995/German_Semantic_V3 --pooling mean --dtype float16 \
  --max-client-batch-size 256 --max-batch-tokens 16384
```
Reproduction
Start a Docker container with the model https://huggingface.co/aari1995/German_Semantic_V3. I also tried experimenting with the `architectures` field and with trusting remote code on the sentence BERT model, but it keeps getting routed to the Bert backend.
{"timestamp":"2024-08-14T08:04:17.082662Z","level":"INFO","message":"Args { model_id: "/rep****ory", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: false, default_prompt_name: None, default_prompt: None, hf_api_token: None, hostname: "r-stefanraab-german-semantic-v3-znq-ffyjb6zd-101c8-dneb1", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/repository/cache"), payload_limit: 2000000, api_key: None, json_output: true, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", cors_allow_origin: None }","target":"text_embeddings_router","filename":"router/src/main.rs","line_number":175} {"timestamp":"2024-08-14T08:04:17.095519Z","level":"INFO","message":"Maximum number of tokens per request: 8192","target":"text_embeddings_router","filename":"router/src/lib.rs","line_number":199} {"timestamp":"2024-08-14T08:04:17.095687Z","level":"INFO","message":"Starting 2 tokenization workers","target":"text_embeddings_core::tokenization","filename":"core/src/tokenization.rs","line_number":26} {"timestamp":"2024-08-14T08:04:17.109235Z","level":"INFO","message":"Starting model backend","target":"text_embeddings_router","filename":"router/src/lib.rs","line_number":250} {"timestamp":"2024-08-14T08:04:17.296077Z","level":"INFO","message":"Starting Bert model on Cuda(CudaDevice(DeviceId(1)))","target":"text_embeddings_backend_candle","filename":"backends/candle/src/lib.rs","line_number":268} Error: Could not create backend Caused by: Could not start backend: Bert only supports absolute position embeddings
Expected behavior
I would expect that, like the base Jina model, it would be routed to the JinaBert backend, which supports alibi position embeddings. Instead, it gets routed to the classical Bert backend.
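For context, the routing described in the first comment amounts to a string match on `_name_or_path`. The snippet below is a hypothetical Python paraphrase of that heuristic, not TEI's actual Rust source:

```python
def select_backend(config: dict) -> str:
    # Hypothetical paraphrase of the backend selection described
    # above; TEI's real implementation is in Rust and more involved.
    name = config.get("_name_or_path", "").lower()
    if "jina" in name:
        return "JinaBert"  # supports alibi position embeddings
    return "Bert"  # absolute position embeddings only
```

A fine-tuned model like German_Semantic_V3 keeps the Jina architecture but loses the Jina `_name_or_path`, so it falls through to the plain Bert backend and fails on the alibi position embeddings, as shown in the log above.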