opea-project / GenAIExamples

Generative AI Examples is a collection of GenAI examples such as ChatQnA, Copilot, which illustrate the pipeline capabilities of the Open Platform for Enterprise AI (OPEA) project.
https://opea.dev
Apache License 2.0

Internal Server Error in Embedding service and Megaservice in XPU example #371

Open ritesh-intel opened 2 weeks ago

ritesh-intel commented 2 weeks ago

I'm creating a POC for testing the microservices architecture on a GPU system provided by OPEA, following Build Mega Service of ChatQnA on Xeon.

While running the command docker compose -f docker_compose.yaml up -d, it gives the following output:

devcloud@a4bf01930946:/work_ritesh/GenAIExamples/ChatQnA/docker/xeon$ docker-compose -f docker_compose.yaml up -d
WARNING: The no_proxy variable is not set. Defaulting to a blank string.
WARNING: The LANGCHAIN_API_KEY variable is not set. Defaulting to a blank string.
WARNING: The LANGCHAIN_TRACING_V2 variable is not set. Defaulting to a blank string.
Pulling dataprep-redis-service (opea/dataprep-redis:latest)...
ERROR: The image for the service you're trying to recreate has been removed. If you continue, volume data could be lost. Consider backing up your data before continuing.

Continue with the new image? [yN]y
Pulling dataprep-redis-service (opea/dataprep-redis:latest)...
ERROR: pull access denied for opea/dataprep-redis, repository does not exist or may require 'docker login': denied: requested access to the resource is denied

Where should I do the docker login? Is opea/dataprep-redis available on Docker Hub or in some other private repository?

Below are the details of my CPU and GPU. Could you also recommend how I can use this Intel GPU system? In the documentation I can only find Gaudi as the GPU and Xeon as the CPU. I am using ipex_llm for XPU.

test@test:/nan/GenAIExamples/ChatQnA/docker/xeon$ xpu-smi discovery
+-----------+--------------------------------------------------------------------------------------+
| Device ID | Device Information                                                                   |
+-----------+--------------------------------------------------------------------------------------+
| 0         | Device Name: Intel(R) Data Center GPU Flex 170                                       |
|           | Vendor Name: Intel(R) Corporation                                                    |
|           | UUID: 00000000-0000-0000-26cf-**************                                           |
+-----------+--------------------------------------------------------------------------------------+
test@test:/nan/GenAIExamples/ChatQnA/docker/xeon$ lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         52 bits physical, 57 bits virtual
  Byte Order:            Little Endian
CPU(s):                  224
  On-line CPU(s) list:   0-223
Vendor ID:               GenuineIntel
  Model name:            Intel(R) Xeon(R) Platinum 8480+
    CPU family:          6
    Model:               143
    Thread(s) per core:  2
    Core(s) per socket:  56
    Socket(s):           2
    Stepping:            8
    CPU max MHz:         3800.0000
    CPU min MHz:         800.0000
    BogoMIPS:            4000.00
ritesh-intel commented 1 week ago

It's working now, but one more issue has come up: the embedding service and the megaservice are returning 'Internal Server Error'.

devcloud@_:/work_ritesh$ curl http://${host_ip}:8888/v1/chatqna -H "Content-Type: application/json" -d '{ "messages": "What is the revenue of Nike in 2023?" }'
Internal Server Error
devcloud@_:/work_ritesh$ curl http://${host_ip}:6000/v1/embeddings -X POST -d '{"text":"hello"}' -H 'Content-Type: application/json'
Internal Server Error

letonghan commented 1 week ago

Hi @ritesh-intel, first you should manually build the docker image opea/dataprep-redis on your server, following the instructions here: https://github.com/opea-project/GenAIExamples/tree/main/ChatQnA/docker/xeon#5-build-dataprep-image
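
A minimal sketch of that build, assuming the GenAIComps repository layout referenced by the linked instructions (the Dockerfile path below is an assumption; use the path given in the README for your checkout):

git clone https://github.com/opea-project/GenAIComps.git
cd GenAIComps
# Dockerfile path is an assumption; adjust to the linked build-dataprep instructions
docker build -t opea/dataprep-redis:latest \
  --build-arg http_proxy=$http_proxy --build-arg https_proxy=$https_proxy \
  -f comps/dataprep/redis/docker/Dockerfile .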

About the Internal Server Error: it could be caused by many things. Can you check the service log using docker logs ${container_name}, so we can help you debug further?

ritesh-intel commented 1 week ago

Hi @letonghan

Thanks for replying

Below are my logs after calling the embedding service: docker_logs_while_calling_tei_embeddings.txt

I worked around the opea/dataprep-redis issue; after rebuilding the image it worked. Thanks!

Here are the environment variables I used for configuration:

export host_ip=0.0.0.0
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_API_KEY=ls***
export EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5"
export RERANK_MODEL_ID="BAAI/bge-reranker-base"
export LLM_MODEL_ID="Intel/neural-chat-7b-v3-3"
export TEI_EMBEDDING_ENDPOINT="http://${host_ip}:6006"
export TEI_RERANKING_ENDPOINT="http://${host_ip}:8808"
export TGI_LLM_ENDPOINT="http://${host_ip}:9009"
export REDIS_URL="redis://${host_ip}:6379"
export INDEX_NAME="rag-redis"
export HUGGINGFACEHUB_APITOKEN=hf****
export MEGA_SERVICE_HOST_IP=${host_ip}
export EMBEDDING_SERVICE_HOST_IP=${host_ip}
export RETRIEVER_SERVICE_HOST_IP=${host_ip}
export RERANK_SERVICE_HOST_IP=${host_ip}
export LLM_SERVICE_HOST_IP=${host_ip}
export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8888/v1/chatqna"
export DATAPREP_SERVICE_ENDPOINT="http://${host_ip}:6007/v1/dataprep"
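
For reference, a sketch of populating host_ip with the host's externally reachable address, as the Xeon README expects, rather than 0.0.0.0, which is only a listen address (the exact command is an assumption and depends on your network setup):

export host_ip=$(hostname -I | awk '{print $1}')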

When I was setting http_proxy and https_proxy, it gave an error that it was not able to connect to proxy-dmz.intel.com, so I removed them:

export http_proxy=http://proxy-dmz.intel.com:911
export https_proxy=http://proxy-dmz.intel.com:912
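
If the proxy is needed for outbound pulls, one hedged alternative is to keep it but exclude host-local and service-to-service traffic via no_proxy (values below are illustrative):

export http_proxy=http://proxy-dmz.intel.com:911
export https_proxy=http://proxy-dmz.intel.com:912
# Keep local service calls off the proxy (illustrative list)
export no_proxy=localhost,127.0.0.1,${host_ip}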

letonghan commented 1 week ago

Ok. requests.exceptions.ConnectionError: MaxRetryError("HTTPConnectionPool(host='0.0.0.0', port=6006): Max retries exceeded") means that your tei-embedding service hasn't been started successfully. Can you check its container log?
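
One way to verify the TEI endpoint directly is the curl check from the validate-microservices steps (port 6006 matches TEI_EMBEDDING_ENDPOINT above); a JSON embedding vector in the response indicates the service is reachable:

curl http://${host_ip}:6006/embed \
  -X POST \
  -d '{"inputs":"What is Deep Learning?"}' \
  -H 'Content-Type: application/json'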

ritesh-intel commented 1 week ago

038ba787cb21 ghcr.io/huggingface/text-embeddings-inference:cpu-1.2 "text-embeddings-rou…" 40 minutes ago Up 40 minutes 0.0.0.0:6006->80/tcp, :::6006->80/tcp tei-embedding-server

Sure, here are the logs of the embed service

devcloud@a4bf01930946:/work_ritesh/GenAIExamples/ChatQnA/docker/xeon$ docker logs 038ba787cb21
2024-07-04T09:22:34.198952Z  INFO text_embeddings_router: router/src/main.rs:140: Args { model_id: "BAA*/***-****-**-v1.5", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: true, hf_api_token: None, hostname: "038ba787cb21", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/data"), payload_limit: 2000000, api_key: None, json_output: false, otlp_endpoint: None, cors_allow_origin: None }
2024-07-04T09:22:34.199063Z  INFO hf_hub: /usr/local/cargo/git/checkouts/hf-hub-1aadb4c6e2cbe1ba/b167f69/src/lib.rs:55: Token file not found "/root/.cache/huggingface/token"
2024-07-04T09:22:34.243738Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:20: Starting download
2024-07-04T09:22:34.243849Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:37: Model artifacts downloaded in 112.708µs
2024-07-04T09:22:34.257024Z  INFO text_embeddings_router: router/src/lib.rs:169: Maximum number of tokens per request: 512
2024-07-04T09:22:34.281426Z  INFO text_embeddings_core::tokenization: core/src/tokenization.rs:23: Starting 112 tokenization workers
2024-07-04T09:22:34.665414Z  INFO text_embeddings_router: router/src/lib.rs:194: Starting model backend
2024-07-04T09:22:34.665741Z  INFO text_embeddings_backend_candle: backends/candle/src/lib.rs:124: Starting Bert model on Cpu
2024-07-04T09:22:35.005224Z  WARN text_embeddings_router: router/src/lib.rs:211: Backend does not support a batch size > 4
2024-07-04T09:22:35.005242Z  WARN text_embeddings_router: router/src/lib.rs:212: forcing `max_batch_requests=4`
2024-07-04T09:22:35.005471Z  WARN text_embeddings_router: router/src/lib.rs:263: Invalid hostname, defaulting to 0.0.0.0
2024-07-04T09:22:35.006779Z  INFO text_embeddings_router::http::server: router/src/http/server.rs:1555: Starting HTTP server: 0.0.0.0:80
2024-07-04T09:22:35.006786Z  INFO text_embeddings_router::http::server: router/src/http/server.rs:1556: Ready
2024-07-04T09:22:47.549273Z  INFO embed{total_time="32.073214ms" tokenization_time="5.298659ms" queue_time="303.183µs" inference_time="26.384068ms"}: text_embeddings_router::http::server: router/src/http/server.rs:590: Success
devcloud@a4bf01930946:/work_ritesh/GenAIExamples/ChatQnA/docker/xeon$

Update:

Looking at the logs, it's not able to find the Hugging Face CLI token. But I have set the token; the echo command displays it:

devcloud@a4bf01930946:/work_ritesh/GenAIExamples/ChatQnA/docker/xeon$ echo $HUGGINGFACEHUB_API_TOKEN
hf_*******************************
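
A quick way to check whether the token actually reaches the TEI container environment (container name taken from the docker ps output above; the grep pattern is just illustrative):

docker exec tei-embedding-server env | grep -i -E 'hugging|hf_'
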
ritesh-intel commented 1 week ago

I deleted and purged all Docker images and recreated everything after setting up the environment variables, so that the huggingface-cli token issue could be resolved. 'Internal Server Error' is still coming from the embedding service running on port 6000, with the same connectivity error to the service on port 6006 as above.
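
A sketch of that teardown/rebuild cycle (destructive; the prune flags remove all unused images and volumes, so use with care):

docker compose -f docker_compose.yaml down
docker system prune -a --volumes
docker compose -f docker_compose.yaml up -d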

It's still not detecting the Hugging Face CLI token.

Below are the updated logs; the Hugging Face CLI token issue appears in them as well.

devcloud@a4bf01930946:/work_ritesh/GenAIExamples/ChatQnA/docker/xeon$ docker logs c05ff1b12f96
2024-07-04T10:29:42.614685Z  INFO text_embeddings_router: router/src/main.rs:140: Args { model_id: "BAA*/***-****-**-v1.5", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: true, hf_api_token: None, hostname: "c05ff1b12f96", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/data"), payload_limit: 2000000, api_key: None, json_output: false, otlp_endpoint: None, cors_allow_origin: None }
2024-07-04T10:29:42.614780Z  INFO hf_hub: /usr/local/cargo/git/checkouts/hf-hub-1aadb4c6e2cbe1ba/b167f69/src/lib.rs:55: Token file not found "/root/.cache/huggingface/token"
2024-07-04T10:29:42.660678Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:20: Starting download
2024-07-04T10:29:42.660720Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:37: Model artifacts downloaded in 43.213µs
2024-07-04T10:29:42.673112Z  INFO text_embeddings_router: router/src/lib.rs:169: Maximum number of tokens per request: 512
2024-07-04T10:29:42.695405Z  INFO text_embeddings_core::tokenization: core/src/tokenization.rs:23: Starting 112 tokenization workers
2024-07-04T10:29:43.098276Z  INFO text_embeddings_router: router/src/lib.rs:194: Starting model backend
2024-07-04T10:29:43.098632Z  INFO text_embeddings_backend_candle: backends/candle/src/lib.rs:124: Starting Bert model on Cpu
2024-07-04T10:29:43.414009Z  WARN text_embeddings_router: router/src/lib.rs:211: Backend does not support a batch size > 4
2024-07-04T10:29:43.414025Z  WARN text_embeddings_router: router/src/lib.rs:212: forcing `max_batch_requests=4`
2024-07-04T10:29:43.414278Z  WARN text_embeddings_router: router/src/lib.rs:263: Invalid hostname, defaulting to 0.0.0.0
2024-07-04T10:29:43.415599Z  INFO text_embeddings_router::http::server: router/src/http/server.rs:1555: Starting HTTP server: 0.0.0.0:80
2024-07-04T10:29:43.415605Z  INFO text_embeddings_router::http::server: router/src/http/server.rs:1556: Ready
2024-07-04T10:29:53.866618Z  INFO embed{total_time="16.20496ms" tokenization_time="330.043µs" queue_time="337.179µs" inference_time="15.456032ms"}: text_embeddings_router::http::server: router/src/http/server.rs:590: Success
devcloud@a4bf01930946:/work_ritesh/GenAIExamples/ChatQnA/docker/xeon$
letonghan commented 1 week ago

These logs:

2024-07-04T10:29:43.415605Z INFO text_embeddings_router::http::server: router/src/http/server.rs:1556: Ready
2024-07-04T10:29:53.866618Z INFO embed{total_time="16.20496ms" tokenization_time="330.043µs" queue_time="337.179µs" inference_time="15.456032ms"}: text_embeddings_router::http::server: router/src/http/server.rs:590: Success

show that the TEI embedding service is Ready and that it successfully responded to a request, finishing inference in 16.20496ms. You can check the service status with a curl command to verify the request result: https://github.com/opea-project/GenAIExamples/tree/main/ChatQnA/docker/xeon#validate-microservices

ritesh-intel commented 1 week ago

Yes, that should be the expected behaviour.

From Validate Microservices, I'm calling the API:

curl http://${host_ip}:6000/v1/embeddings\
  -X POST \
  -d '{"text":"hello"}' \
  -H 'Content-Type: application/json'

Below are the logs of the embedding service running on port 6000: docker_logs_tei_embeddings.txt. No new logs appeared in the TEI embedding service, i.e. the call didn't reach that service even though it's up and running fine.
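
One way to confirm where the chain breaks is to follow the embedding microservice's own log while resending the request (container name below is an assumption based on the compose file; use whatever docker ps shows):

docker logs -f embedding-tei-server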

Is there some registry in the megaservice where I can check which services are up and connected to each other, like there is in Kubernetes?

eero-t commented 1 week ago

Please add (long) logs as attachments, not inline. That way the issue is (much) easier to read.

also recommend me how can I use this intel GPU system as in documentation I can only find Gaudi as GPU and Xeon as CPU.

I'm not part of this project, and haven't tested it myself, but...

I think that to use the GPU, you would need TGI v2 (Intel XPU support was added to TGI only in v2.0.2). However, for now the ChatQnA manifests use only TGI v1.4 (see https://github.com/opea-project/GenAIComps/issues/230).

There also seems to be an OpenVINO-based (not TGI) text-generation option, which might support Intel GPU: https://github.com/opea-project/GenAIComps/tree/main/comps/llms/text-generation/vllm-openvino

ritesh-intel commented 1 week ago

Hi @eero-t, I've attached the logs as attachments in the comments above.

Can you please help me with this? I tried a few things, but the issue is still not resolved. The TEI embeddings Docker container is not able to call the embedding service on port 6000.

I tried creating a Flask server running on 0.0.0.0:5000 on my host machine and then calling it from inside a Docker container; it's not able to reach it, and the same MaxRetryError issue occurs.
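
A minimal sketch of that reachability test, under the assumption that a container has to address the host by a routable IP (or the docker bridge/gateway address) rather than 0.0.0.0, which is only a listen address, not a destination:

# On the host: serve something on all interfaces, port 5000 (plain HTTP server stands in for the Flask app)
python3 -m http.server 5000 --bind 0.0.0.0 &

# From a container: replace <host-LAN-IP> with the host's routable address; 0.0.0.0 will not route back to the host
docker run --rm curlimages/curl -s http://<host-LAN-IP>:5000/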