ritesh-intel opened this issue 2 weeks ago
It's working now, but another issue has come up: the embedding service and the mega service are returning 'Internal Server Error'.
devcloud@_:/work_ritesh$ curl http://${host_ip}:8888/v1/chatqna -H "Content-Type: application/json" -d '{ "messages": "What is the revenue of Nike in 2023?" }'
Internal Server Error
devcloud@_:/work_ritesh$ curl http://${host_ip}:6000/v1/embeddings -X POST -d '{"text":"hello"}' -H 'Content-Type: application/json'
Internal Server Error
Hi @ritesh-intel ,
First, you need to manually build the docker image opea/dataprep-redis
on your server, following the instructions here: https://github.com/opea-project/GenAIExamples/tree/main/ChatQnA/docker/xeon#5-build-dataprep-image
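For reference, a minimal sketch of what that build typically looks like. The Dockerfile path below is an assumption and has moved between GenAIComps releases, so follow the linked README for the authoritative steps:

```bash
# Hypothetical build of the opea/dataprep-redis image from the GenAIComps sources.
# The -f path is an assumption; check the linked instructions for the path that
# matches your checkout.
git clone https://github.com/opea-project/GenAIComps.git
cd GenAIComps
docker build -t opea/dataprep-redis:latest \
    --build-arg http_proxy=$http_proxy \
    --build-arg https_proxy=$https_proxy \
    -f comps/dataprep/redis/docker/Dockerfile .
```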
As for the Internal Server Error, it can have many causes. Can you check the service log using docker logs ${container_name}, so that we can help you debug further?
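As a rough illustration (the container names below are assumptions; use whatever `docker ps` reports on your machine):

```bash
# List the running ChatQnA containers, then dump the tail of the logs for the
# services involved in the failing calls. The names are examples only.
docker ps --format 'table {{.Names}}\t{{.Status}}\t{{.Ports}}'

for name in tei-embedding-server embedding-tei-server chatqna-xeon-backend-server; do
    echo "===== ${name} ====="
    docker logs "${name}" 2>&1 | tail -n 50
done
```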
Hi @letonghan
Thanks for replying
Below are my logs after calling the embedding service: docker_logs_while_calling_tei_embeddings.txt
I worked around the opea/dataprep-redis issue, and after rebuilding the image it worked. Thanks!
Here are my environment variables which I used for configuration:
export host_ip=0.0.0.0
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_API_KEY=ls***
export EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5"
export RERANK_MODEL_ID="BAAI/bge-reranker-base"
export LLM_MODEL_ID="Intel/neural-chat-7b-v3-3"
export TEI_EMBEDDING_ENDPOINT="http://${host_ip}:6006"
export TEI_RERANKING_ENDPOINT="http://${host_ip}:8808"
export TGI_LLM_ENDPOINT="http://${host_ip}:9009"
export REDIS_URL="redis://${host_ip}:6379"
export INDEX_NAME="rag-redis"
export HUGGINGFACEHUB_APITOKEN=hf****
export MEGA_SERVICE_HOST_IP=${host_ip}
export EMBEDDING_SERVICE_HOST_IP=${host_ip}
export RETRIEVER_SERVICE_HOST_IP=${host_ip}
export RERANK_SERVICE_HOST_IP=${host_ip}
export LLM_SERVICE_HOST_IP=${host_ip}
export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8888/v1/chatqna"
export DATAPREP_SERVICE_ENDPOINT="http://${host_ip}:6007/v1/dataprep"
When I was setting http_proxy and https_proxy, it was giving an error that it could not connect to proxy-dmz.intel.com, so I removed them:
export http_proxy=http://proxy-dmz.intel.com:911
export https_proxy=http://proxy-dmz.intel.com:912
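If the proxy is still needed for outbound downloads (e.g. pulling models), one common pattern is to keep the proxy variables but add a no_proxy entry so that calls to the local services are not routed through the proxy. This is only an illustrative sketch, not the project's documented configuration:

```bash
# Keep the proxy for internet access, but exclude local addresses so that
# container-to-host calls such as http://${host_ip}:6006 bypass the proxy.
export http_proxy=http://proxy-dmz.intel.com:911
export https_proxy=http://proxy-dmz.intel.com:912
export no_proxy=localhost,127.0.0.1,${host_ip}
```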
Ok.
requests.exceptions.ConnectionError: (MaxRetryError("HTTPConnectionPool(host='0.0.0.0', port=6006): Max retries exceeded
means that your tei-embedding service hasn't been started successfully.
Can you check its container log?
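One quick way to separate a TEI problem from a networking problem is to hit the TEI container directly on its published port (6006 maps to the container's port 80). A sketch, assuming the standard text-embeddings-inference /embed endpoint:

```bash
# If this returns an embedding vector, TEI itself is healthy and the failure is in
# the embedding microservice on port 6000, or in how it reaches port 6006.
curl http://${host_ip}:6006/embed \
    -X POST \
    -d '{"inputs":"What is Deep Learning?"}' \
    -H 'Content-Type: application/json'
```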
038ba787cb21 ghcr.io/huggingface/text-embeddings-inference:cpu-1.2 "text-embeddings-rou…" 40 minutes ago Up 40 minutes 0.0.0.0:6006->80/tcp, :::6006->80/tcp tei-embedding-server
Sure, here are the logs of the embed service
devcloud@a4bf01930946:/work_ritesh/GenAIExamples/ChatQnA/docker/xeon$ docker logs 038ba787cb21
2024-07-04T09:22:34.198952Z INFO text_embeddings_router: router/src/main.rs:140: Args { model_id: "BAA*/***-****-**-v1.5", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: true, hf_api_token: None, hostname: "038ba787cb21", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/data"), payload_limit: 2000000, api_key: None, json_output: false, otlp_endpoint: None, cors_allow_origin: None }
2024-07-04T09:22:34.199063Z INFO hf_hub: /usr/local/cargo/git/checkouts/hf-hub-1aadb4c6e2cbe1ba/b167f69/src/lib.rs:55: Token file not found "/root/.cache/huggingface/token"
2024-07-04T09:22:34.243738Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:20: Starting download
2024-07-04T09:22:34.243849Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:37: Model artifacts downloaded in 112.708µs
2024-07-04T09:22:34.257024Z INFO text_embeddings_router: router/src/lib.rs:169: Maximum number of tokens per request: 512
2024-07-04T09:22:34.281426Z INFO text_embeddings_core::tokenization: core/src/tokenization.rs:23: Starting 112 tokenization workers
2024-07-04T09:22:34.665414Z INFO text_embeddings_router: router/src/lib.rs:194: Starting model backend
2024-07-04T09:22:34.665741Z INFO text_embeddings_backend_candle: backends/candle/src/lib.rs:124: Starting Bert model on Cpu
2024-07-04T09:22:35.005224Z WARN text_embeddings_router: router/src/lib.rs:211: Backend does not support a batch size > 4
2024-07-04T09:22:35.005242Z WARN text_embeddings_router: router/src/lib.rs:212: forcing `max_batch_requests=4`
2024-07-04T09:22:35.005471Z WARN text_embeddings_router: router/src/lib.rs:263: Invalid hostname, defaulting to 0.0.0.0
2024-07-04T09:22:35.006779Z INFO text_embeddings_router::http::server: router/src/http/server.rs:1555: Starting HTTP server: 0.0.0.0:80
2024-07-04T09:22:35.006786Z INFO text_embeddings_router::http::server: router/src/http/server.rs:1556: Ready
2024-07-04T09:22:47.549273Z INFO embed{total_time="32.073214ms" tokenization_time="5.298659ms" queue_time="303.183µs" inference_time="26.384068ms"}: text_embeddings_router::http::server: router/src/http/server.rs:590: Success
devcloud@a4bf01930946:/work_ritesh/GenAIExamples/ChatQnA/docker/xeon$
Update:
Looking at the logs, it's not able to find the Hugging Face CLI token. But I am setting the token; the echo command displays it as well:
devcloud@a4bf01930946:/work_ritesh/GenAIExamples/ChatQnA/docker/xeon$ echo $HUGGINGFACEHUB_API_TOKEN
hf_*******************************
I deleted and purged all docker images and recreated everything after setting the environment variables, so that the huggingface-cli token issue would be resolved. 'Internal Server Error' is still coming from the embedding service running on port 6000, with the same connectivity error as above when reaching the service on port 6006.
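One thing that may help narrow this down is running the same request from inside the embedding microservice container, since that is where the MaxRetryError is raised. A sketch, with the container name as an assumption and assuming curl is available in the image:

```bash
# Run the TEI request from inside the container that listens on port 6000. If this
# fails while the same curl works from the host, the problem is the address the
# container uses to reach TEI, not TEI itself.
# Note: host_ip is exported as 0.0.0.0 above; inside a container that address points
# at the container itself, so it may be worth retrying with the host's real IP.
docker exec -it embedding-tei-server \
    curl http://${host_ip}:6006/embed \
         -X POST \
         -d '{"inputs":"hello"}' \
         -H 'Content-Type: application/json'
```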
It's still not detecting the Hugging Face CLI token. Below are the updated logs, which still show the same Hugging Face CLI token issue.
devcloud@a4bf01930946:/work_ritesh/GenAIExamples/ChatQnA/docker/xeon$ docker logs c05ff1b12f96
2024-07-04T10:29:42.614685Z INFO text_embeddings_router: router/src/main.rs:140: Args { model_id: "BAA*/***-****-**-v1.5", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: true, hf_api_token: None, hostname: "c05ff1b12f96", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/data"), payload_limit: 2000000, api_key: None, json_output: false, otlp_endpoint: None, cors_allow_origin: None }
2024-07-04T10:29:42.614780Z INFO hf_hub: /usr/local/cargo/git/checkouts/hf-hub-1aadb4c6e2cbe1ba/b167f69/src/lib.rs:55: Token file not found "/root/.cache/huggingface/token"
2024-07-04T10:29:42.660678Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:20: Starting download
2024-07-04T10:29:42.660720Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:37: Model artifacts downloaded in 43.213µs
2024-07-04T10:29:42.673112Z INFO text_embeddings_router: router/src/lib.rs:169: Maximum number of tokens per request: 512
2024-07-04T10:29:42.695405Z INFO text_embeddings_core::tokenization: core/src/tokenization.rs:23: Starting 112 tokenization workers
2024-07-04T10:29:43.098276Z INFO text_embeddings_router: router/src/lib.rs:194: Starting model backend
2024-07-04T10:29:43.098632Z INFO text_embeddings_backend_candle: backends/candle/src/lib.rs:124: Starting Bert model on Cpu
2024-07-04T10:29:43.414009Z WARN text_embeddings_router: router/src/lib.rs:211: Backend does not support a batch size > 4
2024-07-04T10:29:43.414025Z WARN text_embeddings_router: router/src/lib.rs:212: forcing `max_batch_requests=4`
2024-07-04T10:29:43.414278Z WARN text_embeddings_router: router/src/lib.rs:263: Invalid hostname, defaulting to 0.0.0.0
2024-07-04T10:29:43.415599Z INFO text_embeddings_router::http::server: router/src/http/server.rs:1555: Starting HTTP server: 0.0.0.0:80
2024-07-04T10:29:43.415605Z INFO text_embeddings_router::http::server: router/src/http/server.rs:1556: Ready
2024-07-04T10:29:53.866618Z INFO embed{total_time="16.20496ms" tokenization_time="330.043µs" queue_time="337.179µs" inference_time="15.456032ms"}: text_embeddings_router::http::server: router/src/http/server.rs:590: Success
devcloud@a4bf01930946:/work_ritesh/GenAIExamples/ChatQnA/docker/xeon$
These logs:
2024-07-04T10:29:43.415605Z INFO text_embeddings_router::http::server: router/src/http/server.rs:1556: Ready
2024-07-04T10:29:53.866618Z INFO embed{total_time="16.20496ms" tokenization_time="330.043µs" queue_time="337.179µs" inference_time="15.456032ms"}: text_embeddings_router::http::server: router/src/http/server.rs:590: Success
show that the TEI embedding service is Ready, and that it successfully responded to the request and finished inference in 16.20496ms.
You can check the service status with a curl command to verify the request result: https://github.com/opea-project/GenAIExamples/tree/main/ChatQnA/docker/xeon#validate-microservices
Yes, that should be the expected behaviour.
From validate microservices I'm calling the API
curl http://${host_ip}:6000/v1/embeddings\
-X POST \
-d '{"text":"hello"}' \
-H 'Content-Type: application/json'
Below are the logs from the embedding service running on port 6000: docker_logs_tei_embeddings.txt. No new logs appeared in the TEI embedding service, i.e. the call never reached that service even though it is up and running fine.
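A simple way to confirm whether the request reaches TEI at all is to follow its log live while issuing the call. A sketch (container name assumed):

```bash
# Terminal 1: follow the TEI container log.
docker logs -f tei-embedding-server

# Terminal 2: call the embedding microservice on port 6000.
curl http://${host_ip}:6000/v1/embeddings \
    -X POST \
    -d '{"text":"hello"}' \
    -H 'Content-Type: application/json'

# If no new "embed{...} Success" line shows up in terminal 1, the microservice never
# reached TEI, which matches the MaxRetryError seen earlier.
```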
Is there some registry in the megaservice where I can check which services are up and connected to each other, like there is in Kubernetes?
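As far as I can tell there is no Kubernetes-style service registry in the docker compose setup; a rough equivalent is to list the compose services and probe each microservice's port. A sketch; the port list is an assumption based on the environment variables earlier in this thread:

```bash
# Show which compose services are up and which ports they publish.
docker compose -f docker_compose.yaml ps

# Probe the HTTP ports used in this thread and print the status code for each
# (any HTTP status means the port is at least reachable; 000 means it is not).
for port in 6006 6000 8808 9009 6007 8888; do
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 3 "http://${host_ip}:${port}")
    echo "port ${port} -> HTTP ${code}"
done
```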
Please add (long) logs as attachments, not inline. That way the issue is (much) easier to read.
Also, can you recommend how I can use this Intel GPU system? In the documentation I can only find Gaudi as GPU and Xeon as CPU.
I'm not part of this project, and haven't tested it myself, but...
I think that to use a GPU, you would need TGI v2 (as Intel XPU support was added to TGI only in v2.02). However, for now the ChatQnA manifests use only TGI v1.4 (see https://github.com/opea-project/GenAIComps/issues/230).
There seems to be another, OpenVINO (not TGI) based text-generation option, which might support Intel GPU: https://github.com/opea-project/GenAIComps/tree/main/comps/llms/text-generation/vllm-openvino ?
Hi @eero-t, I've attached the logs as attachments in the comments above.
Can you please help me with this? I tried a few things, but the issue is still not resolved. The TEI embeddings docker container is not able to call the embeddings service on port 6000.
I tried creating a Flask server running on 0.0.0.0:5000 on my host machine and then calling it from inside the docker container; it's not able to reach it either, and the same MaxRetryError occurs.
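For what it's worth, the same behaviour can be reproduced without Flask, and it points at the address rather than the services: inside a container, 0.0.0.0 (and localhost) refer to the container itself, not the host. A sketch, assuming python3 on the host and curl inside the container; the LAN IP at the end is a placeholder:

```bash
# On the host: serve a trivial HTTP endpoint on port 5000.
python3 -m http.server 5000 --bind 0.0.0.0 &

# From inside a container, 0.0.0.0:5000 resolves to the container's own loopback,
# so this is expected to fail just like the MaxRetryError above.
docker exec tei-embedding-server curl -s --max-time 3 http://0.0.0.0:5000 \
    || echo 'unreachable from inside the container, as expected'

# Using the host's actual LAN IP (placeholder below) normally succeeds, which is why
# host_ip is usually set to the machine's real address rather than 0.0.0.0.
docker exec tei-embedding-server curl -s --max-time 3 http://<host-lan-ip>:5000
```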
I'm creating a POC for testing the microservices architecture on a GPU system, following the OPEA guide "Build Mega Service of ChatQnA on Xeon".
While running the command
docker compose -f docker_compose.yaml up -d
it's giving the following output. Where should I do the docker login? Is opea/dataprep-redis available on Docker Hub or in some other private repository?
Below are the details of my CPU and GPU. Also, please recommend how I can use this Intel GPU system, as in the documentation I can only find Gaudi as GPU and Xeon as CPU. I am using ipex_llm for XPU.