opea-project / GenAIExamples

Generative AI Examples is a collection of GenAI examples such as ChatQnA, Copilot, which illustrate the pipeline capabilities of the Open Platform for Enterprise AI (OPEA) project.
https://opea.dev
Apache License 2.0

[Xeon][ChatQnA] ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 is not working #660

Closed NeoZhangJianyu closed 1 month ago

NeoZhangJianyu commented 2 months ago

[code]

commit f78aa9ee2f9d03e64c1fc48f94842c14a65b8256 (HEAD -> main, origin/main, origin/HEAD)
Author: chen, suyue <suyue.chen@intel.com>
Date:   Fri Aug 23 22:10:10 2024 +0800

[info] I set up the ChatQnA example following the guide. When verifying the service "ghcr.io/huggingface/text-embeddings-inference:cpu-1.5", it fails and returns an error:

[ tei-embedding ] HTTP status is not 200. Received status was 403

I found these errors in the Docker container log:

2024-08-26T08:38:58.947567Z  INFO text_embeddings_router: router/src/main.rs:175: Args { model_id: "BAA*/***-****-**-v1.5", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: true, default_prompt_name: None, default_prompt: None, hf_api_token: None, hostname: "309396be2e45", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/data"), payload_limit: 2000000, api_key: None, json_output: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", cors_allow_origin: None }
2024-08-26T08:38:58.947853Z  INFO hf_hub: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/hf-hub-0.3.2/src/lib.rs:55: Token file not found "/root/.cache/huggingface/token"    
2024-08-26T08:38:59.072660Z  INFO download_pool_config: text_embeddings_core::download: core/src/download.rs:38: Downloading `1_Pooling/config.json`
2024-08-26T08:38:59.072768Z  INFO download_new_st_config: text_embeddings_core::download: core/src/download.rs:62: Downloading `config_sentence_transformers.json`
2024-08-26T08:38:59.072793Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:21: Starting download
2024-08-26T08:38:59.072797Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:23: Downloading `config.json`
2024-08-26T08:38:59.072817Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:26: Downloading `tokenizer.json`
2024-08-26T08:38:59.072854Z  INFO download_artifacts: text_embeddings_backend: backends/src/lib.rs:313: Downloading `model.onnx`
2024-08-26T08:39:00.648994Z  WARN download_artifacts: text_embeddings_backend: backends/src/lib.rs:317: Could not download `model.onnx`: request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/BAAI/bge-base-en-v1.5/resolve/main/model.onnx)
2024-08-26T08:39:00.649031Z  INFO download_artifacts: text_embeddings_backend: backends/src/lib.rs:318: Downloading `onnx/model.onnx`
2024-08-26T08:39:00.649150Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:32: Model artifacts downloaded in 1.576355797s
2024-08-26T08:39:00.679666Z  INFO text_embeddings_router: router/src/lib.rs:199: Maximum number of tokens per request: 512
2024-08-26T08:39:00.717879Z  INFO text_embeddings_core::tokenization: core/src/tokenization.rs:28: Starting 240 tokenization workers
2024-08-26T08:39:01.812968Z  INFO text_embeddings_router: router/src/lib.rs:241: Starting model backend
2024-08-26T08:39:03.033339Z  WARN text_embeddings_router: router/src/lib.rs:267: Backend does not support a batch size > 8
2024-08-26T08:39:03.033354Z  WARN text_embeddings_router: router/src/lib.rs:268: forcing `max_batch_requests=8`
2024-08-26T08:39:03.033474Z  WARN text_embeddings_router: router/src/lib.rs:319: Invalid hostname, defaulting to 0.0.0.0
2024-08-26T08:39:03.034753Z  INFO text_embeddings_router::http::server: router/src/http/server.rs:1778: Starting HTTP server: 0.0.0.0:80
2024-08-26T08:39:03.034757Z  INFO text_embeddings_router::http::server: router/src/http/server.rs:1779: Ready

After checking, I found the model path is wrong: https://huggingface.co/BAAI/bge-base-en-v1.5/resolve/main/model.onnx

The current correct path is: https://huggingface.co/BAAI/bge-base-en-v1.5/resolve/main/onnx/model.onnx

My question is: how can we avoid such issues coming from third-party components? Is it possible to fork them so they are maintained by OPEA separately?
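Short of forking, one common mitigation is to pin third-party images by immutable digest rather than a mutable tag, so a deployment only changes when the pin is deliberately bumped after testing. A hedged compose sketch (the service name, port mapping, and digest below are placeholders, not taken from the ChatQnA compose files):

```yaml
# compose.yaml (sketch): pin the TEI image to an immutable digest instead of
# the mutable cpu-1.5 tag. The sha256 value is a placeholder -- look up the
# real digest of a tested build before using this.
services:
  tei-embedding-service:
    image: ghcr.io/huggingface/text-embeddings-inference@sha256:<digest-of-tested-build>
    ports:
      - "8090:80"
    volumes:
      - ./data:/data
    command: --model-id BAAI/bge-base-en-v1.5
```

The trade-off is that digest pins never pick up upstream fixes (such as the `onnx/` fallback) until the pin is updated.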

lkk12014402 commented 2 months ago

Hi @NeoZhangJianyu, I can't reproduce your issue with the third-party (Hugging Face) TEI Docker image. We should always pull the latest image:

docker run -p 8090:80 --pull always ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 --model-id BAAI/bge-base-en-v1.5


NeoZhangJianyu commented 1 month ago

I will check again! Thank you!

yinghu5 commented 1 month ago

verified, thanks