opea-project / GenAIExamples

Generative AI Examples is a collection of GenAI examples such as ChatQnA, Copilot, which illustrate the pipeline capabilities of the Open Platform for Enterprise AI (OPEA) project.
Apache License 2.0

[ChatQnA] TEI 1.5 causes an error starting the tei-reranking-server #497

Open wsfowler opened 1 month ago

wsfowler commented 1 month ago

I was updating my Terraform/Ansible recipe this morning and I ran into this error while trying to start the example using the provided compose.yaml file.

Error: Could not create backend

Caused by:
    Could not start backend: Failed to create ONNX Runtime session: Deserialize tensor onnx::MatMul_3121 failed.GetFileLength for /data/models--BAAI--bge-reranker-large/snapshots/55611d7bca2a7133960a6d3b71e083071bbfc312/onnx/model.onnx_data failed:Invalid fd was supplied: -1

The reranking container then exits.

CONTAINER ID   IMAGE                                                   COMMAND                  CREATED          STATUS                      PORTS                                                                                  NAMES
ff5e425f0098   opea/chatqna-ui:latest                                  "docker-entrypoint.s…"   11 minutes ago   Up 11 minutes               0.0.0.0:5173->5173/tcp, :::5173->5173/tcp                                              chatqna-xeon-ui-server
db9e4993a00c   opea/chatqna:latest                                     "python chatqna.py"      11 minutes ago   Up 11 minutes               0.0.0.0:8888->8888/tcp, :::8888->8888/tcp                                              chatqna-xeon-backend-server
57845af811cd   opea/reranking-tei:latest                               "python reranking_te…"   11 minutes ago   Up 11 minutes               0.0.0.0:8000->8000/tcp, :::8000->8000/tcp                                              reranking-tei-xeon-server
82d83238b4fe   opea/llm-tgi:latest                                     "bash entrypoint.sh"     11 minutes ago   Up 11 minutes               0.0.0.0:9000->9000/tcp, :::9000->9000/tcp                                              llm-tgi-server
7347b2fb4a3b   opea/embedding-tei:latest                               "python embedding_te…"   11 minutes ago   Up 11 minutes               0.0.0.0:6000->6000/tcp, :::6000->6000/tcp                                              embedding-tei-server
97ef7cfeb315   opea/dataprep-redis:latest                              "python prepare_doc_…"   11 minutes ago   Up 11 minutes               0.0.0.0:6007-6009->6007-6009/tcp, :::6007-6009->6007-6009/tcp                          dataprep-redis-server
12c84f7e9667   opea/retriever-redis:latest                             "/home/user/comps/re…"   11 minutes ago   Up 11 minutes               0.0.0.0:7000->7000/tcp, :::7000->7000/tcp                                              retriever-redis-server
c573346d18c1   redis/redis-stack:7.2.0-v9                              "/entrypoint.sh"         11 minutes ago   Up 11 minutes               0.0.0.0:6379->6379/tcp, :::6379->6379/tcp, 0.0.0.0:8001->8001/tcp, :::8001->8001/tcp   redis-vector-db
c7d1c7f0763d   ghcr.io/huggingface/text-generation-inference:2.1.0     "/tgi-entrypoint.sh …"   11 minutes ago   Up 11 minutes               0.0.0.0:9009->80/tcp, :::9009->80/tcp                                                  tgi-service
3ac952569089   ghcr.io/huggingface/text-embeddings-inference:cpu-1.5   "text-embeddings-rou…"   11 minutes ago   Exited (1) 10 minutes ago                                                                                          tei-reranking-server
47f803c58222   ghcr.io/huggingface/text-embeddings-inference:cpu-1.5   "text-embeddings-rou…"   11 minutes ago   Up 11 minutes               0.0.0.0:6006->80/tcp, :::6006->80/tcp                                                  tei-embedding-server

I found the following issue on the TEI repo: https://github.com/huggingface/text-embeddings-inference/issues/341

If I switch the reranking service to use 1.4, then it starts correctly and the example works.
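
A minimal sketch of that workaround, assuming the Compose service is named tei-reranking-service (the SERVICE column in the docker compose ps output later in this thread) and that an override file is preferred over editing compose.yaml directly. The file name compose.override.yaml is only an illustrative choice.

```shell
#!/usr/bin/env bash
# Sketch: pin only the reranking TEI server to cpu-1.4 while the embedding
# server stays on cpu-1.5. compose.override.yaml is an illustrative name.
set -euo pipefail

# Compose merges files in the order given with -f; the image set here
# replaces the one from compose.yaml for tei-reranking-service only.
cat > compose.override.yaml <<'EOF'
services:
  tei-reranking-service:
    image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.4
EOF

echo "Start with: docker compose -f compose.yaml -f compose.override.yaml up -d"
```

Because the stack is started with an explicit -f compose.yaml, the override has to be passed with a second -f; Compose only picks up compose.override.yaml automatically when no -f is given.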

root@ip-172-31-29-103:/opt/GenAIExamples/ChatQnA/docker/xeon# docker ps -a
CONTAINER ID   IMAGE                                                   COMMAND                  CREATED         STATUS         PORTS                                                                                  NAMES
03154665e268   opea/chatqna-ui:latest                                  "docker-entrypoint.s…"   2 minutes ago   Up 2 minutes   0.0.0.0:5173->5173/tcp, :::5173->5173/tcp                                              chatqna-xeon-ui-server
319d28eeb771   opea/chatqna:latest                                     "python chatqna.py"      2 minutes ago   Up 2 minutes   0.0.0.0:8888->8888/tcp, :::8888->8888/tcp                                              chatqna-xeon-backend-server
3a0ca8da9838   opea/llm-tgi:latest                                     "bash entrypoint.sh"     2 minutes ago   Up 2 minutes   0.0.0.0:9000->9000/tcp, :::9000->9000/tcp                                              llm-tgi-server
3a78c4595faa   opea/reranking-tei:latest                               "python reranking_te…"   2 minutes ago   Up 2 minutes   0.0.0.0:8000->8000/tcp, :::8000->8000/tcp                                              reranking-tei-xeon-server
54dd50950c0d   opea/dataprep-redis:latest                              "python prepare_doc_…"   2 minutes ago   Up 2 minutes   0.0.0.0:6007-6009->6007-6009/tcp, :::6007-6009->6007-6009/tcp                          dataprep-redis-server
ec9fc6be5ddd   opea/retriever-redis:latest                             "/home/user/comps/re…"   2 minutes ago   Up 2 minutes   0.0.0.0:7000->7000/tcp, :::7000->7000/tcp                                              retriever-redis-server
3432e3e8b6a4   opea/embedding-tei:latest                               "python embedding_te…"   2 minutes ago   Up 2 minutes   0.0.0.0:6000->6000/tcp, :::6000->6000/tcp                                              embedding-tei-server
d154ee44dce7   ghcr.io/huggingface/text-generation-inference:2.1.0     "/tgi-entrypoint.sh …"   2 minutes ago   Up 2 minutes   0.0.0.0:9009->80/tcp, :::9009->80/tcp                                                  tgi-service
632f404a5be9   redis/redis-stack:7.2.0-v9                              "/entrypoint.sh"         2 minutes ago   Up 2 minutes   0.0.0.0:6379->6379/tcp, :::6379->6379/tcp, 0.0.0.0:8001->8001/tcp, :::8001->8001/tcp   redis-vector-db
d11a7d3770e7   ghcr.io/huggingface/text-embeddings-inference:cpu-1.4   "text-embeddings-rou…"   2 minutes ago   Up 2 minutes   0.0.0.0:8808->80/tcp, :::8808->80/tcp                                                  tei-reranking-server
ab513f0a9810   ghcr.io/huggingface/text-embeddings-inference:cpu-1.5   "text-embeddings-rou…"   2 minutes ago   Up 2 minutes   0.0.0.0:6006->80/tcp, :::6006->80/tcp                                                  tei-embedding-server

mkbhanda commented 1 month ago

@kevinintel and @lvliang-intel, any chance we can revert to TEI 1.4 until the Hugging Face bug is resolved? This example otherwise fails.

ashahba commented 1 month ago

Thanks @mkbhanda. I'm testing with 1.5 today and I'm not seeing this issue:

$ docker compose -f compose.yaml logs reranking -f
reranking-tei-xeon-server  | [2024-07-31 23:49:43,238] [    INFO] - CORS is enabled.
reranking-tei-xeon-server  | [2024-07-31 23:49:43,239] [    INFO] - Setting up HTTP server
reranking-tei-xeon-server  | [2024-07-31 23:49:43,239] [    INFO] - Uvicorn server setup on port 8000
reranking-tei-xeon-server  | INFO:     Waiting for application startup.
reranking-tei-xeon-server  | INFO:     Application startup complete.
reranking-tei-xeon-server  | INFO:     Uvicorn running on (Press CTRL+C to quit)
reranking-tei-xeon-server  | [2024-07-31 23:49:43,255] [    INFO] - HTTP server setup successful


$ docker compose -f compose.yaml ps | grep reranking
reranking-tei-xeon-server     opea/reranking-tei:latest                               "python reranking_te…"   reranking                    10 minutes ago   Up 9 minutes         >8000/tcp
tei-reranking-server          ghcr.io/huggingface/text-embeddings-inference:cpu-1.5   "text-embeddings-rou…"   tei-reranking-service        10 minutes ago   Up 9 minutes         >80/tcp

Would you please try again?


ashahba commented 1 month ago

Is it possible that v0.7 containers are being mixed with latest code?

wsfowler commented 1 month ago

I'll test again, but the only mixing happening is that I'm using the latest compose.yaml file with the v0.7 containers. Looking at the history, I don't see any major changes to the reranking service since v0.7 beyond the TEI version.

I retested this morning and I'm getting the same error:

2024-08-01T13:00:54.638796Z  INFO text_embeddings_router: router/src/main.rs:175: Args { model_id: "BAA*/***-********-**rge", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: true, default_prompt_name: None, default_prompt: None, hf_api_token: None, hostname: "e6c5ab9b2197", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/data"), payload_limit: 2000000, api_key: None, json_output: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", cors_allow_origin: None }
2024-08-01T13:00:54.638872Z  INFO hf_hub: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/hf-hub-0.3.2/src/lib.rs:55: Token file not found "/root/.cache/huggingface/token"
2024-08-01T13:00:54.681575Z  INFO download_pool_config: text_embeddings_core::download: core/src/download.rs:38: Downloading `1_Pooling/config.json`
2024-08-01T13:00:54.863118Z  INFO download_new_st_config: text_embeddings_core::download: core/src/download.rs:62: Downloading `config_sentence_transformers.json`
2024-08-01T13:00:54.878546Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:21: Starting download
2024-08-01T13:00:54.878558Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:23: Downloading `config.json`
2024-08-01T13:00:54.960966Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:26: Downloading `tokenizer.json`
2024-08-01T13:00:55.108601Z  INFO download_artifacts: text_embeddings_backend: backends/src/lib.rs:313: Downloading `model.onnx`
2024-08-01T13:00:55.126338Z  WARN download_artifacts: text_embeddings_backend: backends/src/lib.rs:317: Could not download `model.onnx`: request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/BAAI/bge-reranker-large/resolve/main/model.onnx)
2024-08-01T13:00:55.126364Z  INFO download_artifacts: text_embeddings_backend: backends/src/lib.rs:318: Downloading `onnx/model.onnx`
2024-08-01T13:00:55.166679Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:32: Model artifacts downloaded in 288.133574ms
2024-08-01T13:00:55.783229Z  WARN text_embeddings_router: router/src/lib.rs:195: Could not find a Sentence Transformers config
2024-08-01T13:00:55.783257Z  INFO text_embeddings_router: router/src/lib.rs:199: Maximum number of tokens per request: 512
2024-08-01T13:00:55.784582Z  INFO text_embeddings_core::tokenization: core/src/tokenization.rs:28: Starting 32 tokenization workers
2024-08-01T13:01:02.836291Z  INFO text_embeddings_router: router/src/lib.rs:241: Starting model backend
Error: Could not create backend

Caused by:
    Could not start backend: Failed to create ONNX Runtime session: Deserialize tensor roberta.encoder.layer.13.intermediate.dense.bias failed.GetFileLength for /data/models--BAAI--bge-reranker-large/snapshots/55611d7bca2a7133960a6d3b71e083071bbfc312/onnx/model.onnx_data failed:Invalid fd was supplied: -1
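
The log above shows TEI 1.5 downloading onnx/model.onnx, but no download of onnx/model.onnx_data appears before ONNX Runtime fails to open exactly that file. That points to a missing external-data file, although this is an inference from the log rather than a confirmed root cause. A minimal sketch for checking a model cache for this state, assuming the argument is the host directory mounted as /data in the TEI container (its location depends on the volume mapping in compose.yaml); check_onnx_data is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: list Hugging Face cache snapshots where onnx/model.onnx is present
# but the external-data file onnx/model.onnx_data is not. The argument is the
# host directory mounted as /data in the TEI container (an assumption; check
# the volume mapping in compose.yaml). check_onnx_data is a hypothetical name.
set -euo pipefail

check_onnx_data() {
  local cache_dir="$1" status=0 onnx
  for onnx in "$cache_dir"/models--*/snapshots/*/onnx/model.onnx; do
    [ -e "$onnx" ] || continue   # the glob matched nothing
    if [ ! -e "${onnx}_data" ]; then
      echo "missing: ${onnx}_data"
      status=1
    fi
  done
  return "$status"
}
```

Call it as check_onnx_data /path/to/data || echo "model.onnx_data is missing", where the path is whatever host directory backs /data.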

wsfowler commented 1 month ago

Just tested with the v0.7 release of the repo, which takes the TEI version back to 1.2, and things seem to work. Next I'll build the v0.8 containers and see whether things still work. Also, with the v0.7 release the conversational UI is deployed but not working.

wsfowler commented 1 month ago

Just tested with the v0.8 release of the repo, building the containers myself, and I'm running into the original problem with the TEI 1.5 reranking container.

root@ip-172-31-19-70:/opt/GenAIExamples/ChatQnA/docker/xeon# docker ps -a
CONTAINER ID   IMAGE                                                   COMMAND                  CREATED          STATUS                      PORTS                                                                                  NAMES
e233647b23ef   opea/chatqna-ui:latest                                  "docker-entrypoint.s…"   36 seconds ago   Up 34 seconds               0.0.0.0:5173->5173/tcp, :::5173->5173/tcp                                              chatqna-xeon-ui-server
e253767c1551   opea/chatqna:latest                                     "python chatqna.py"      36 seconds ago   Up 35 seconds               0.0.0.0:8888->8888/tcp, :::8888->8888/tcp                                              chatqna-xeon-backend-server
ef38c496621f   opea/embedding-tei:latest                               "python embedding_te…"   36 seconds ago   Up 35 seconds               0.0.0.0:6000->6000/tcp, :::6000->6000/tcp                                              embedding-tei-server
6b2488c0716c   opea/dataprep-redis:latest                              "python prepare_doc_…"   36 seconds ago   Up 35 seconds               0.0.0.0:6007-6009->6007-6009/tcp, :::6007-6009->6007-6009/tcp                          dataprep-redis-server
d51236335642   opea/retriever-redis:latest                             "python retriever_re…"   36 seconds ago   Up 35 seconds               0.0.0.0:7000->7000/tcp, :::7000->7000/tcp                                              retriever-redis-server
335c1b57fde2   opea/llm-tgi:latest                                     "bash entrypoint.sh"     36 seconds ago   Up 35 seconds               0.0.0.0:9000->9000/tcp, :::9000->9000/tcp                                              llm-tgi-server
257f047b5fea   opea/reranking-tei:latest                               "python reranking_te…"   36 seconds ago   Up 35 seconds               0.0.0.0:8000->8000/tcp, :::8000->8000/tcp                                              reranking-tei-xeon-server
6ad87af9aab0   ghcr.io/huggingface/text-generation-inference:2.1.0     "/tgi-entrypoint.sh …"   36 seconds ago   Up 36 seconds               0.0.0.0:9009->80/tcp, :::9009->80/tcp                                                  tgi-service
2e22f0e71955   ghcr.io/huggingface/text-embeddings-inference:cpu-1.5   "text-embeddings-rou…"   36 seconds ago   Up 36 seconds               0.0.0.0:6006->80/tcp, :::6006->80/tcp                                                  tei-embedding-server
5b16201f676b   ghcr.io/huggingface/text-embeddings-inference:cpu-1.5   "text-embeddings-rou…"   36 seconds ago   Exited (1) 27 seconds ago                                                                                          tei-reranking-server
ad47210bbf87   redis/redis-stack:7.2.0-v9                              "/entrypoint.sh"         36 seconds ago   Up 36 seconds               0.0.0.0:6379->6379/tcp, :::6379->6379/tcp, 0.0.0.0:8001->8001/tcp, :::8001->8001/tcp   redis-vector-db

GenAIExamples Repo:

root@ip-172-31-19-70:/opt/GenAIExamples/ChatQnA/docker/xeon# git status
HEAD detached at a2437e8

GenAIComps Repo:

root@ip-172-31-19-70:/opt/GenAIComps# git status
HEAD detached at f37ed79

chensuyue commented 1 month ago

I did some local tests with v0.8 and didn't find any issues. Did you build the local images with --no-cache? Alternatively, you can run docker system prune --all to clean up the caches before building the images.

mkbhanda commented 1 month ago

@wsfowler will consider building a versioned Terraform script for ChatQnA. Indeed, there was some version mix-up: the GitHub clone was of the latest code (v0.8 and beyond), while the Docker image repository contained v0.7. Given that we are not releasing any fixes until v0.9, a versioned script seems the more practical approach, so that we have a working example sooner rather than later. Note that @wsfowler spotted an issue filed on the Hugging Face site for TEI 1.5.

wsfowler commented 1 month ago

I'll run some more tests using v0.8 with the commands you mention.

@mkbhanda One issue I will run into when creating the versioned Terraform/Ansible script is that the docker compose files currently specify latest for the OPEA containers. For instance, if I switch the repo to the v0.7 release after the v0.8 containers are published and then use the OPEA docker compose file, it will pull down the latest containers rather than the v0.7 ones. So some thought may need to be given to using specific versions for the OPEA containers in the docker compose files and elsewhere.
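
A quick way to see how much of a compose file floats on latest is to list the image references that are either tagged :latest or carry no tag at all (Docker resolves an untagged reference to latest). The function below is a hypothetical, line-based sketch rather than a YAML parser: it assumes one image: reference per line and strips only double quotes.

```shell
#!/usr/bin/env bash
# Sketch: print image references in a compose file that are unpinned, i.e.
# tagged :latest or untagged. Line-based, not a real YAML parser.
# list_unpinned_images is a hypothetical name.
set -euo pipefail

list_unpinned_images() {
  local compose_file="$1"
  awk '
    $1 == "image:" {
      ref = $2
      gsub(/"/, "", ref)              # drop surrounding double quotes
      name = ref
      sub(/.*\//, "", name)           # keep the last path segment (name[:tag])
      if (name !~ /:/ || name ~ /:latest$/) print ref
    }
  ' "$compose_file"
}
```

Running list_unpinned_images compose.yaml from ChatQnA/docker/xeon would show which OPEA images still need an explicit version before a release can be reproduced from a git tag alone.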

wsfowler commented 1 month ago

@chensuyue I just did a clean build of the v0.8 containers again and I'm still seeing the same error. I ran docker system prune --all and made sure to build the containers with the --no-cache option. I used the build commands straight from the README.

mkbhanda commented 1 month ago

@kevinintel and @chensuyue, our use of latest-tagged images in docker-compose files and manifests is convenient, but it causes problems when a release has bugs, as in the case of ChatQnA with the Hugging Face image issue. What if we could pass an argument to docker-compose that lets us pull images with a particular tag? Does such a feature exist?
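
Docker Compose does support this through variable interpolation: a reference such as ${TAG:-latest} in an image line is substituted from the shell environment or from a .env file in the project directory, and falls back to the value after :- when the variable is unset or empty. A minimal sketch, assuming the OPEA images are published under a matching tag (for example v0.8); TAG and compose.tagged.yaml are hypothetical names, and only the reranking service from this thread is shown.

```shell
#!/usr/bin/env bash
# Sketch: let the caller choose the OPEA image tag at deploy time.
# TAG and compose.tagged.yaml are hypothetical names.
set -euo pipefail

# The quoted 'EOF' stops the shell from expanding ${TAG:-latest} here;
# Compose resolves it when the file is loaded.
cat > compose.tagged.yaml <<'EOF'
services:
  reranking:
    image: opea/reranking-tei:${TAG:-latest}
EOF

# Preview the resolved image reference, then deploy with the same tag:
#   TAG=v0.8 docker compose -f compose.tagged.yaml config
#   TAG=v0.8 docker compose -f compose.tagged.yaml up -d
```

Writing TAG=v0.8 into a .env file next to the compose file has the same effect without prefixing every command, and leaving TAG unset keeps today's latest behavior.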

Alternatively, could each release keep the images and their use in sync, so that OPEA v0.9 explicitly uses the v0.9 images of the component microservices (not latest)? Then we could at least use a tagged version of all files, and check out that tag when we git clone.

chensuyue commented 1 month ago

Yes, we have proposed such a solution: freezing all dependency versions and image tags in the release branch, starting from v0.9.