huggingface / text-embeddings-inference

A blazing fast inference solution for text embeddings models
https://huggingface.co/docs/text-embeddings-inference/quick_tour
Apache License 2.0

Unsupported model IR version #355

Open netw0rkf10w opened 1 month ago

netw0rkf10w commented 1 month ago

Feature request

I tried running some recent models and obtained the following error:

cpu-1.5: Pulling from huggingface/text-embeddings-inference
Digest: sha256:0502794a4d86974839e701dadd6d06e693ec78a0f6e87f68c391e88c52154f3f
Status: Image is up to date for ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
2024-07-25T12:40:25.295596Z  INFO text_embeddings_router: router/src/main.rs:175: Args { model_id: "/dat*/******_**_***M_v5", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: false, default_prompt_name: None, default_prompt: None, hf_api_token: None, hostname: "c165dfa0057d", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/data"), payload_limit: 2000000, api_key: None, json_output: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", cors_allow_origin: None }
2024-07-25T12:40:25.306206Z  INFO text_embeddings_router: router/src/lib.rs:199: Maximum number of tokens per request: 512
2024-07-25T12:40:25.306330Z  INFO text_embeddings_core::tokenization: core/src/tokenization.rs:28: Starting 2 tokenization workers
2024-07-25T12:40:25.316131Z  INFO text_embeddings_router: router/src/lib.rs:241: Starting model backend
Error: Could not create backend

Caused by:
    Could not start backend: Failed to create ONNX Runtime session: Load model from /data/stella_en_400M_v5/onnx/model.onnx failed:/home/runner/work/onnxruntime-build/onnxruntime-build/onnxruntime/onnxruntime/core/graph/model.cc:179 onnxruntime::Model::Model(onnx::ModelProto&&, const PathString&, const IOnnxRuntimeOpSchemaRegistryList*, const onnxruntime::logging::Logger&, const onnxruntime::ModelOptions&) Unsupported model IR version: 10, max supported IR version: 9

It seems that the bundled ONNX Runtime is too old: the exported model uses IR version 10, but the runtime in the image supports at most IR version 9.
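For anyone hitting the same error, you can check a model's IR version before starting the server. A minimal stdlib-only sketch is below; it relies on the observation that ONNX's `ModelProto` stores `ir_version` as protobuf field 1 (a varint), which in practice is serialized at the start of the file. The robust way is `onnx.load(path).ir_version` if the `onnx` package is available.

```python
def read_ir_version(path):
    """Read the ONNX IR version from the start of a .onnx file.

    Assumes ir_version (protobuf field 1, varint) is the first field
    serialized, which holds for files produced by the onnx exporters.
    """
    with open(path, "rb") as f:
        data = f.read(16)
    # Field number 1 with wire type 0 (varint) encodes as tag byte 0x08.
    if not data or data[0] != 0x08:
        raise ValueError("ir_version not found at start of file")
    # Decode the varint following the tag byte.
    value, shift = 0, 0
    for b in data[1:]:
        value |= (b & 0x7F) << shift
        if not (b & 0x80):
            return value
        shift += 7
    raise ValueError("truncated varint")
```

For the model above, `read_ir_version("/data/stella_en_400M_v5/onnx/model.onnx")` would return 10, confirming the mismatch reported in the log.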

Motivation

It would be great to update the ONNX runtime to the latest version so that recent models can be used.

Your contribution

Sorry, I'm not familiar with the tech stack, but I could help with testing.

netw0rkf10w commented 1 month ago

I believe it would suffice to replace "2.0.0-rc.2" with "2.0.0-rc.4" for the ort package.
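If that is right, the change would be a one-line dependency bump in the backend's Cargo.toml. A hedged sketch (the exact manifest path and feature flags in this repo may differ):

```toml
[dependencies]
# rc.4 bundles a newer ONNX Runtime that accepts IR version 10 models
ort = "2.0.0-rc.4"
```

A maintainer would still need to confirm that the newer `ort` release candidate is API-compatible with the current backend code.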