Closed: arun-gupta closed this issue 2 months ago.
The service is still giving the same error after five hours:
ubuntu@ip-172-31-37-13:~$ curl http://${host_ip}:9009/generate -X POST -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}' -H 'Content-Type: application/json'
curl: (7) Failed to connect to 172.31.37.13 port 9009 after 0 ms: Couldn't connect to server
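Before digging into the logs, it is worth confirming whether the tgi-service container is actually up and listening; a connection-refused from curl on port 9009 usually means the container exited or is still starting, not a transient delay. A quick check, assuming the service name tgi-service from the compose file and the same host port used above:
ubuntu@ip-172-31-37-13:~$ sudo docker compose ps tgi-service
ubuntu@ip-172-31-37-13:~$ sudo docker compose logs --tail 50 tgi-service
ubuntu@ip-172-31-37-13:~$ sudo ss -tlnp | grep 9009
If the container shows as restarting or exited, the curl will keep failing no matter how long you wait.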
Here are logs:
ubuntu@ip-172-31-37-13:~$ sudo docker compose logs
WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string.
WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string.
WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string.
WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string.
WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string.
WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string.
WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string.
WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string.
WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string.
WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string.
WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string.
WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string.
WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string.
WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string.
WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string.
WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string.
WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string.
WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string.
WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string.
WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string.
reranking-tei-xeon-server | /home/user/.local/lib/python3.11/site-packages/pydantic/_internal/_fields.py:161: UserWarning: Field "model_name_or_path" has conflict with protected namespace "model_".
tei-embedding-server | 2024-08-30T18:59:12.924208Z INFO text_embeddings_router: router/src/main.rs:175: Args { model_id: "BAA*/***-****-**-v1.5", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: true, default_prompt_name: None, default_prompt: None, hf_api_token: None, hostname: "2f27cdb160ff", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/data"), payload_limit: 2000000, api_key: None, json_output: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", cors_allow_origin: None }
tei-reranking-server | 2024-08-30T18:59:12.958211Z INFO text_embeddings_router: router/src/main.rs:175: Args { model_id: "BAA*/***-********-*ase", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: true, default_prompt_name: None, default_prompt: None, hf_api_token: None, hostname: "7fd37f4b6037", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/data"), payload_limit: 2000000, api_key: None, json_output: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", cors_allow_origin: None }
reranking-tei-xeon-server |
reranking-tei-xeon-server | You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
reranking-tei-xeon-server | warnings.warn(
tei-reranking-server | 2024-08-30T18:59:12.958284Z INFO hf_hub: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/hf-hub-0.3.2/src/lib.rs:55: Token file not found "/root/.cache/huggingface/token"
tei-reranking-server | 2024-08-30T18:59:13.004747Z INFO download_pool_config: text_embeddings_core::download: core/src/download.rs:38: Downloading `1_Pooling/config.json`
tei-embedding-server | 2024-08-30T18:59:12.924290Z INFO hf_hub: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/hf-hub-0.3.2/src/lib.rs:55: Token file not found "/root/.cache/huggingface/token"
reranking-tei-xeon-server | [2024-08-30 18:59:16,271] [ INFO] - Base service - CORS is enabled.
tei-embedding-server | 2024-08-30T18:59:12.973834Z INFO download_pool_config: text_embeddings_core::download: core/src/download.rs:38: Downloading `1_Pooling/config.json`
tei-reranking-server | 2024-08-30T18:59:13.155043Z INFO download_new_st_config: text_embeddings_core::download: core/src/download.rs:62: Downloading `config_sentence_transformers.json`
tei-reranking-server | 2024-08-30T18:59:13.171879Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:21: Starting download
tei-embedding-server | 2024-08-30T18:59:13.069091Z INFO download_new_st_config: text_embeddings_core::download: core/src/download.rs:62: Downloading `config_sentence_transformers.json`
reranking-tei-xeon-server | [2024-08-30 18:59:16,271] [ INFO] - Base service - Setting up HTTP server
reranking-tei-xeon-server | [2024-08-30 18:59:16,272] [ INFO] - Base service - Uvicorn server setup on port 8000
tei-embedding-server | 2024-08-30T18:59:13.138839Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:21: Starting download
tei-embedding-server | 2024-08-30T18:59:13.138853Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:23: Downloading `config.json`
tei-embedding-server | 2024-08-30T18:59:13.166225Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:26: Downloading `tokenizer.json`
tei-embedding-server | 2024-08-30T18:59:13.242409Z INFO download_artifacts: text_embeddings_backend: backends/src/lib.rs:313: Downloading `model.onnx`
tei-embedding-server | 2024-08-30T18:59:13.261180Z WARN download_artifacts: text_embeddings_backend: backends/src/lib.rs:317: Could not download `model.onnx`: request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/BAAI/bge-base-en-v1.5/resolve/main/model.onnx)
tei-embedding-server | 2024-08-30T18:59:13.261201Z INFO download_artifacts: text_embeddings_backend: backends/src/lib.rs:318: Downloading `onnx/model.onnx`
reranking-tei-xeon-server | INFO: Waiting for application startup.
tei-embedding-server | 2024-08-30T18:59:14.089828Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:32: Model artifacts downloaded in 950.976918ms
tei-embedding-server | 2024-08-30T18:59:14.102507Z INFO text_embeddings_router: router/src/lib.rs:199: Maximum number of tokens per request: 512
tgi-service | 2024-08-30T18:59:12.959781Z INFO text_generation_launcher: Args {
tgi-service | model_id: "Intel/neural-chat-7b-v3-3",
tei-embedding-server | 2024-08-30T18:59:14.102781Z INFO text_embeddings_core::tokenization: core/src/tokenization.rs:28: Starting 8 tokenization workers
tei-embedding-server | 2024-08-30T18:59:14.144519Z INFO text_embeddings_router: router/src/lib.rs:241: Starting model backend
tei-embedding-server | 2024-08-30T18:59:15.160364Z WARN text_embeddings_router: router/src/lib.rs:267: Backend does not support a batch size > 8
tei-embedding-server | 2024-08-30T18:59:15.160393Z WARN text_embeddings_router: router/src/lib.rs:268: forcing `max_batch_requests=8`
reranking-tei-xeon-server | INFO: Application startup complete.
reranking-tei-xeon-server | INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
reranking-tei-xeon-server | [2024-08-30 18:59:16,280] [ INFO] - Base service - HTTP server setup successful
tgi-service | revision: None,
tgi-service | validation_workers: 2,
tgi-service | sharded: None,
tgi-service | num_shard: None,
tgi-service | quantize: None,
tgi-service | speculate: None,
chatqna-xeon-ui-server |
chatqna-xeon-ui-server | > sveltekit-auth-example@0.0.1 preview
tei-embedding-server | 2024-08-30T18:59:15.160518Z WARN text_embeddings_router: router/src/lib.rs:319: Invalid hostname, defaulting to 0.0.0.0
tei-embedding-server | 2024-08-30T18:59:15.162024Z INFO text_embeddings_router::http::server: router/src/http/server.rs:1778: Starting HTTP server: 0.0.0.0:80
tei-embedding-server | 2024-08-30T18:59:15.162038Z INFO text_embeddings_router::http::server: router/src/http/server.rs:1779: Ready
tei-embedding-server | 2024-08-30T19:03:19.614297Z INFO embed{total_time="10.586548ms" tokenization_time="185.813µs" queue_time="499.042µs" inference_time="9.826847ms"}: text_embeddings_router::http::server: router/src/http/server.rs:706: Success
chatqna-xeon-ui-server | > vite preview --port 5173 --host 0.0.0.0
chatqna-xeon-ui-server |
chatqna-xeon-ui-server |
chatqna-xeon-ui-server | ➜ Local: http://localhost:5173/
chatqna-xeon-ui-server | ➜ Network: http://172.18.0.12:5173/
redis-vector-db | 9:C 30 Aug 2024 18:59:12.941 # WARNING Memory overcommit must be enabled! Without it, a background save or replication may fail under low memory condition. Being disabled, it can also cause failures without low memory condition, see https://github.com/jemalloc/jemalloc/issues/1328. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
redis-vector-db | 9:C 30 Aug 2024 18:59:12.941 * oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
redis-vector-db | 9:C 30 Aug 2024 18:59:12.941 * Redis version=7.2.4, bits=64, commit=00000000, modified=0, pid=9, just started
redis-vector-db | 9:C 30 Aug 2024 18:59:12.941 * Configuration loaded
redis-vector-db | 9:M 30 Aug 2024 18:59:12.941 * monotonic clock: POSIX clock_gettime
tgi-service | dtype: None,
tgi-service | trust_remote_code: false,
tgi-service | max_concurrent_requests: 128,
tgi-service | max_best_of: 2,
tgi-service | max_stop_sequences: 4,
embedding-tei-server | /usr/local/lib/python3.11/site-packages/pydantic/_internal/_fields.py:184: UserWarning: Field name "downstream_black_list" shadows an attribute in parent "TopologyInfo";
tei-reranking-server | 2024-08-30T18:59:13.171889Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:23: Downloading `config.json`
redis-vector-db | 9:M 30 Aug 2024 18:59:12.942 * Running mode=standalone, port=6379.
redis-vector-db | 9:M 30 Aug 2024 18:59:12.942 * Module 'RedisCompat' loaded from /opt/redis-stack/lib/rediscompat.so
redis-vector-db | 9:M 30 Aug 2024 18:59:12.944 * <search> Redis version found by RedisSearch : 7.2.4 - oss
embedding-tei-server | warnings.warn(
embedding-tei-server | /usr/local/lib/python3.11/site-packages/pydantic/_internal/_fields.py:149: UserWarning: Field "model_name_or_path" has conflict with protected namespace "model_".
embedding-tei-server |
embedding-tei-server | You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
embedding-tei-server | warnings.warn(
embedding-tei-server | [2024-08-30 18:59:16,190] [ INFO] - Base service - CORS is enabled.
embedding-tei-server | [2024-08-30 18:59:16,191] [ INFO] - Base service - Setting up HTTP server
embedding-tei-server | [2024-08-30 18:59:16,191] [ INFO] - Base service - Uvicorn server setup on port 6000
embedding-tei-server | INFO: Waiting for application startup.
retriever-redis-server | /usr/local/lib/python3.11/site-packages/pydantic/_internal/_fields.py:184: UserWarning: Field name "downstream_black_list" shadows an attribute in parent "TopologyInfo";
embedding-tei-server | INFO: Application startup complete.
retriever-redis-server | warnings.warn(
retriever-redis-server | /usr/local/lib/python3.11/site-packages/pydantic/_internal/_fields.py:149: UserWarning: Field "model_name_or_path" has conflict with protected namespace "model_".
retriever-redis-server |
retriever-redis-server | You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
retriever-redis-server | warnings.warn(
retriever-redis-server | [2024-08-30 18:59:16,210] [ INFO] - Base service - CORS is enabled.
retriever-redis-server | [2024-08-30 18:59:16,211] [ INFO] - Base service - Setting up HTTP server
retriever-redis-server | [2024-08-30 18:59:16,212] [ INFO] - Base service - Uvicorn server setup on port 7000
tei-reranking-server | 2024-08-30T18:59:13.225855Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:26: Downloading `tokenizer.json`
tei-reranking-server | 2024-08-30T18:59:13.582628Z INFO download_artifacts: text_embeddings_backend: backends/src/lib.rs:313: Downloading `model.onnx`
tgi-service | max_top_n_tokens: 5,
llm-tgi-server | Defaulting to user installation because normal site-packages is not writeable
llm-tgi-server | Collecting langserve (from -r requirements-runtime.txt (line 1))
tgi-service | max_input_tokens: None,
tgi-service | max_input_length: None,
tgi-service | max_total_tokens: None,
tgi-service | waiting_served_ratio: 0.3,
tgi-service | max_batch_prefill_tokens: None,
tgi-service | max_batch_total_tokens: None,
tgi-service | max_waiting_tokens: 20,
tgi-service | max_batch_size: None,
tgi-service | cuda_graphs: Some(
tei-reranking-server | 2024-08-30T18:59:13.602241Z WARN download_artifacts: text_embeddings_backend: backends/src/lib.rs:317: Could not download `model.onnx`: request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/BAAI/bge-reranker-base/resolve/main/model.onnx)
tei-reranking-server | 2024-08-30T18:59:13.602265Z INFO download_artifacts: text_embeddings_backend: backends/src/lib.rs:318: Downloading `onnx/model.onnx`
tei-reranking-server | 2024-08-30T18:59:14.691879Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:32: Model artifacts downloaded in 1.519996915s
tei-reranking-server | 2024-08-30T18:59:15.270436Z WARN text_embeddings_router: router/src/lib.rs:195: Could not find a Sentence Transformers config
tei-reranking-server | 2024-08-30T18:59:15.270455Z INFO text_embeddings_router: router/src/lib.rs:199: Maximum number of tokens per request: 512
tei-reranking-server | 2024-08-30T18:59:15.270766Z INFO text_embeddings_core::tokenization: core/src/tokenization.rs:28: Starting 8 tokenization workers
tei-reranking-server | 2024-08-30T18:59:17.123235Z INFO text_embeddings_router: router/src/lib.rs:241: Starting model backend
tei-reranking-server | 2024-08-30T18:59:18.915992Z WARN text_embeddings_router: router/src/lib.rs:267: Backend does not support a batch size > 8
tei-reranking-server | 2024-08-30T18:59:18.916011Z WARN text_embeddings_router: router/src/lib.rs:268: forcing `max_batch_requests=8`
chatqna-xeon-backend-server | /usr/local/lib/python3.11/site-packages/pydantic/_internal/_fields.py:161: UserWarning: Field "model_name_or_path" has conflict with protected namespace "model_".
tgi-service | [
chatqna-xeon-backend-server |
tgi-service | 0,
llm-tgi-server | Downloading langserve-0.2.2-py3-none-any.whl.metadata (39 kB)
chatqna-xeon-backend-server | You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
llm-tgi-server | Requirement already satisfied: httpx>=0.23.0 in /home/user/.local/lib/python3.11/site-packages (from langserve->-r requirements-runtime.txt (line 1)) (0.27.0)
llm-tgi-server | Requirement already satisfied: langchain-core<0.3,>=0.1 in /usr/local/lib/python3.11/site-packages (from langserve->-r requirements-runtime.txt (line 1)) (0.1.7)
tei-reranking-server | 2024-08-30T18:59:18.916130Z WARN text_embeddings_router: router/src/lib.rs:319: Invalid hostname, defaulting to 0.0.0.0
chatqna-xeon-backend-server | warnings.warn(
chatqna-xeon-backend-server | [2024-08-30 18:59:14,995] [ INFO] - Base service - CORS is enabled.
chatqna-xeon-backend-server | [2024-08-30 18:59:14,996] [ INFO] - Base service - Setting up HTTP server
chatqna-xeon-backend-server | [2024-08-30 18:59:14,997] [ INFO] - Base service - Uvicorn server setup on port 8888
chatqna-xeon-backend-server | INFO: Waiting for application startup.
chatqna-xeon-backend-server | INFO: Application startup complete.
chatqna-xeon-backend-server | INFO: Uvicorn running on http://0.0.0.0:8888 (Press CTRL+C to quit)
chatqna-xeon-backend-server | [2024-08-30 18:59:15,010] [ INFO] - Base service - HTTP server setup successful
llm-tgi-server | Requirement already satisfied: orjson>=2 in /home/user/.local/lib/python3.11/site-packages (from langserve->-r requirements-runtime.txt (line 1)) (3.10.7)
llm-tgi-server | Requirement already satisfied: pydantic>=1 in /usr/local/lib/python3.11/site-packages (from langserve->-r requirements-runtime.txt (line 1)) (2.5.3)
retriever-redis-server | INFO: Waiting for application startup.
tgi-service | ],
retriever-redis-server | INFO: Application startup complete.
retriever-redis-server | INFO: Uvicorn running on http://0.0.0.0:7000 (Press CTRL+C to quit)
retriever-redis-server | [2024-08-30 18:59:16,214] [ INFO] - Base service - HTTP server setup successful
dataprep-redis-server | /home/user/.local/lib/python3.11/site-packages/pydantic/_internal/_fields.py:161: UserWarning: Field "model_name_or_path" has conflict with protected namespace "model_".
dataprep-redis-server |
dataprep-redis-server | You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
dataprep-redis-server | warnings.warn(
dataprep-redis-server | /home/user/.local/lib/python3.11/site-packages/langchain/__init__.py:30: UserWarning: Importing LLMChain from langchain root module is no longer supported. Please use langchain.chains.LLMChain instead.
dataprep-redis-server | warnings.warn(
dataprep-redis-server | /home/user/.local/lib/python3.11/site-packages/langchain/__init__.py:30: UserWarning: Importing PromptTemplate from langchain root module is no longer supported. Please use langchain_core.prompts.PromptTemplate instead.
dataprep-redis-server | warnings.warn(
dataprep-redis-server | [2024-08-30 18:59:17,957] [ INFO] - Base service - CORS is enabled.
tgi-service | ),
tgi-service | hostname: "a0a208e32895",
tgi-service | port: 80,
tgi-service | shard_uds_path: "/tmp/text-generation-server",
tgi-service | master_addr: "localhost",
tgi-service | master_port: 29500,
llm-tgi-server | Collecting pyproject-toml<0.0.11,>=0.0.10 (from langserve->-r requirements-runtime.txt (line 1))
redis-vector-db | 9:M 30 Aug 2024 18:59:12.944 * <search> RediSearch version 2.8.12 (Git=2.8-32fdaca)
redis-vector-db | 9:M 30 Aug 2024 18:59:12.944 * <search> Low level api version 1 initialized successfully
redis-vector-db | 9:M 30 Aug 2024 18:59:12.944 * <search> concurrent writes: OFF, gc: ON, prefix min length: 2, prefix max expansions: 200, query timeout (ms): 500, timeout policy: return, cursor read size: 1000, cursor max idle (ms): 300000, max doctable size: 1000000, max number of search results: 10000, search pool size: 20, index pool size: 8,
redis-vector-db | 9:M 30 Aug 2024 18:59:12.944 * <search> Initialized thread pools!
redis-vector-db | 9:M 30 Aug 2024 18:59:12.944 * <search> Enabled role change notification
redis-vector-db | 9:M 30 Aug 2024 18:59:12.944 * Module 'search' loaded from /opt/redis-stack/lib/redisearch.so
redis-vector-db | 9:M 30 Aug 2024 18:59:12.945 * <timeseries> RedisTimeSeries version 11011, git_sha=0299ac12a6bf298028859c41ba0f4d8dc842726b
redis-vector-db | 9:M 30 Aug 2024 18:59:12.945 * <timeseries> Redis version found by RedisTimeSeries : 7.2.4 - oss
redis-vector-db | 9:M 30 Aug 2024 18:59:12.945 * <timeseries> loaded default CHUNK_SIZE_BYTES policy: 4096
redis-vector-db | 9:M 30 Aug 2024 18:59:12.945 * <timeseries> loaded server DUPLICATE_POLICY: block
redis-vector-db | 9:M 30 Aug 2024 18:59:12.945 * <timeseries> Setting default series ENCODING to: compressed
redis-vector-db | 9:M 30 Aug 2024 18:59:12.945 * <timeseries> Detected redis oss
redis-vector-db | 9:M 30 Aug 2024 18:59:12.945 * Module 'timeseries' loaded from /opt/redis-stack/lib/redistimeseries.so
tei-reranking-server | 2024-08-30T18:59:18.917526Z INFO text_embeddings_router::http::server: router/src/http/server.rs:1778: Starting HTTP server: 0.0.0.0:80
tgi-service | huggingface_hub_cache: Some(
embedding-tei-server | INFO: Uvicorn running on http://0.0.0.0:6000 (Press CTRL+C to quit)
embedding-tei-server | [2024-08-30 18:59:16,194] [ INFO] - Base service - HTTP server setup successful
redis-vector-db | 9:M 30 Aug 2024 18:59:12.946 * <ReJSON> Created new data type 'ReJSON-RL'
redis-vector-db | 9:M 30 Aug 2024 18:59:12.946 * <ReJSON> version: 20609 git sha: unknown branch: unknown
redis-vector-db | 9:M 30 Aug 2024 18:59:12.946 * <ReJSON> Exported RedisJSON_V1 API
tgi-service | "/data",
tgi-service | ),
tgi-service | weights_cache_override: None,
tgi-service | disable_custom_kernels: false,
tgi-service | cuda_memory_fraction: 1.0,
tgi-service | rope_scaling: None,
tgi-service | rope_factor: None,
dataprep-redis-server | [2024-08-30 18:59:17,958] [ INFO] - Base service - Setting up HTTP server
llm-tgi-server | Downloading pyproject_toml-0.0.10-py3-none-any.whl.metadata (642 bytes)
llm-tgi-server | Requirement already satisfied: anyio in /usr/local/lib/python3.11/site-packages (from httpx>=0.23.0->langserve->-r requirements-runtime.txt (line 1)) (4.2.0)
llm-tgi-server | Requirement already satisfied: certifi in /usr/local/lib/python3.11/site-packages (from httpx>=0.23.0->langserve->-r requirements-runtime.txt (line 1)) (2023.11.17)
llm-tgi-server | Requirement already satisfied: httpcore==1.* in /home/user/.local/lib/python3.11/site-packages (from httpx>=0.23.0->langserve->-r requirements-runtime.txt (line 1)) (1.0.5)
llm-tgi-server | Requirement already satisfied: idna in /usr/local/lib/python3.11/site-packages (from httpx>=0.23.0->langserve->-r requirements-runtime.txt (line 1)) (3.6)
llm-tgi-server | Requirement already satisfied: sniffio in /usr/local/lib/python3.11/site-packages (from httpx>=0.23.0->langserve->-r requirements-runtime.txt (line 1)) (1.3.0)
llm-tgi-server | Requirement already satisfied: h11<0.15,>=0.13 in /home/user/.local/lib/python3.11/site-packages (from httpcore==1.*->httpx>=0.23.0->langserve->-r requirements-runtime.txt (line 1)) (0.14.0)
llm-tgi-server | Requirement already satisfied: PyYAML>=5.3 in /usr/local/lib/python3.11/site-packages (from langchain-core<0.3,>=0.1->langserve->-r requirements-runtime.txt (line 1)) (6.0.1)
llm-tgi-server | Requirement already satisfied: jsonpatch<2.0,>=1.33 in /usr/local/lib/python3.11/site-packages (from langchain-core<0.3,>=0.1->langserve->-r requirements-runtime.txt (line 1)) (1.33)
llm-tgi-server | Requirement already satisfied: langsmith<0.1.0,>=0.0.63 in /usr/local/lib/python3.11/site-packages (from langchain-core<0.3,>=0.1->langserve->-r requirements-runtime.txt (line 1)) (0.0.77)
llm-tgi-server | Requirement already satisfied: packaging<24.0,>=23.2 in /usr/local/lib/python3.11/site-packages (from langchain-core<0.3,>=0.1->langserve->-r requirements-runtime.txt (line 1)) (23.2)
llm-tgi-server | Requirement already satisfied: requests<3,>=2 in /usr/local/lib/python3.11/site-packages (from langchain-core<0.3,>=0.1->langserve->-r requirements-runtime.txt (line 1)) (2.31.0)
llm-tgi-server | Requirement already satisfied: tenacity<9.0.0,>=8.1.0 in /usr/local/lib/python3.11/site-packages (from langchain-core<0.3,>=0.1->langserve->-r requirements-runtime.txt (line 1)) (8.2.3)
llm-tgi-server | Requirement already satisfied: annotated-types>=0.4.0 in /usr/local/lib/python3.11/site-packages (from pydantic>=1->langserve->-r requirements-runtime.txt (line 1)) (0.6.0)
llm-tgi-server | Requirement already satisfied: pydantic-core==2.14.6 in /usr/local/lib/python3.11/site-packages (from pydantic>=1->langserve->-r requirements-runtime.txt (line 1)) (2.14.6)
llm-tgi-server | Requirement already satisfied: typing-extensions>=4.6.1 in /usr/local/lib/python3.11/site-packages (from pydantic>=1->langserve->-r requirements-runtime.txt (line 1)) (4.9.0)
llm-tgi-server | Requirement already satisfied: setuptools>=42 in /usr/local/lib/python3.11/site-packages (from pyproject-toml<0.0.11,>=0.0.10->langserve->-r requirements-runtime.txt (line 1)) (65.5.1)
llm-tgi-server | Requirement already satisfied: wheel in /usr/local/lib/python3.11/site-packages (from pyproject-toml<0.0.11,>=0.0.10->langserve->-r requirements-runtime.txt (line 1)) (0.42.0)
llm-tgi-server | Collecting toml (from pyproject-toml<0.0.11,>=0.0.10->langserve->-r requirements-runtime.txt (line 1))
llm-tgi-server | Downloading toml-0.10.2-py2.py3-none-any.whl.metadata (7.1 kB)
llm-tgi-server | Requirement already satisfied: jsonschema in /home/user/.local/lib/python3.11/site-packages (from pyproject-toml<0.0.11,>=0.0.10->langserve->-r requirements-runtime.txt (line 1)) (4.23.0)
llm-tgi-server | Requirement already satisfied: jsonpointer>=1.9 in /usr/local/lib/python3.11/site-packages (from jsonpatch<2.0,>=1.33->langchain-core<0.3,>=0.1->langserve->-r requirements-runtime.txt (line 1)) (2.4)
llm-tgi-server | Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.11/site-packages (from requests<3,>=2->langchain-core<0.3,>=0.1->langserve->-r requirements-runtime.txt (line 1)) (3.3.2)
llm-tgi-server | Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.11/site-packages (from requests<3,>=2->langchain-core<0.3,>=0.1->langserve->-r requirements-runtime.txt (line 1)) (2.1.0)
llm-tgi-server | Requirement already satisfied: attrs>=22.2.0 in /usr/local/lib/python3.11/site-packages (from jsonschema->pyproject-toml<0.0.11,>=0.0.10->langserve->-r requirements-runtime.txt (line 1)) (23.2.0)
llm-tgi-server | Requirement already satisfied: jsonschema-specifications>=2023.03.6 in /home/user/.local/lib/python3.11/site-packages (from jsonschema->pyproject-toml<0.0.11,>=0.0.10->langserve->-r requirements-runtime.txt (line 1)) (2023.12.1)
llm-tgi-server | Requirement already satisfied: referencing>=0.28.4 in /home/user/.local/lib/python3.11/site-packages (from jsonschema->pyproject-toml<0.0.11,>=0.0.10->langserve->-r requirements-runtime.txt (line 1)) (0.35.1)
llm-tgi-server | Requirement already satisfied: rpds-py>=0.7.1 in /home/user/.local/lib/python3.11/site-packages (from jsonschema->pyproject-toml<0.0.11,>=0.0.10->langserve->-r requirements-runtime.txt (line 1)) (0.20.0)
llm-tgi-server | Downloading langserve-0.2.2-py3-none-any.whl (1.2 MB)
llm-tgi-server | ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 114.0 MB/s eta 0:00:00
llm-tgi-server | Downloading pyproject_toml-0.0.10-py3-none-any.whl (6.9 kB)
llm-tgi-server | Downloading toml-0.10.2-py2.py3-none-any.whl (16 kB)
llm-tgi-server | Installing collected packages: toml, pyproject-toml, langserve
llm-tgi-server | Successfully installed langserve-0.2.2 pyproject-toml-0.0.10 toml-0.10.2
llm-tgi-server | /usr/local/lib/python3.11/site-packages/pydantic/_internal/_fields.py:184: UserWarning: Field name "downstream_black_list" shadows an attribute in parent "TopologyInfo";
llm-tgi-server | warnings.warn(
llm-tgi-server | /usr/local/lib/python3.11/site-packages/pydantic/_internal/_fields.py:149: UserWarning: Field "model_name_or_path" has conflict with protected namespace "model_".
tgi-service | json_output: false,
tgi-service | otlp_endpoint: None,
redis-vector-db | 9:M 30 Aug 2024 18:59:12.946 * <ReJSON> Exported RedisJSON_V2 API
tgi-service | otlp_service_name: "text-generation-inference.router",
redis-vector-db | 9:M 30 Aug 2024 18:59:12.946 * <ReJSON> Exported RedisJSON_V3 API
redis-vector-db | 9:M 30 Aug 2024 18:59:12.946 * <ReJSON> Exported RedisJSON_V4 API
redis-vector-db | 9:M 30 Aug 2024 18:59:12.946 * <ReJSON> Exported RedisJSON_V5 API
redis-vector-db | 9:M 30 Aug 2024 18:59:12.946 * <ReJSON> Enabled diskless replication
redis-vector-db | 9:M 30 Aug 2024 18:59:12.946 * Module 'ReJSON' loaded from /opt/redis-stack/lib/rejson.so
redis-vector-db | 9:M 30 Aug 2024 18:59:12.946 * <search> Acquired RedisJSON_V5 API
redis-vector-db | 9:M 30 Aug 2024 18:59:12.946 * <bf> RedisBloom version 2.6.12 (Git=unknown)
redis-vector-db | 9:M 30 Aug 2024 18:59:12.946 * Module 'bf' loaded from /opt/redis-stack/lib/redisbloom.so
redis-vector-db | 9:M 30 Aug 2024 18:59:12.946 * <redisgears_2> Created new data type 'GearsType'
redis-vector-db | 9:M 30 Aug 2024 18:59:12.946 * <redisgears_2> Detected redis oss
redis-vector-db | 9:M 30 Aug 2024 18:59:12.947 # <redisgears_2> could not initialize RedisAI_InitError
redis-vector-db |
redis-vector-db | 9:M 30 Aug 2024 18:59:12.947 * <redisgears_2> Failed loading RedisAI API.
redis-vector-db | 9:M 30 Aug 2024 18:59:12.947 * <redisgears_2> RedisGears v2.0.19, sha='671030bbcb7de4582d00575a0902f826da3efe73', build_type='release', built_for='Linux-ubuntu22.04.x86_64'.
redis-vector-db | 9:M 30 Aug 2024 18:59:12.947 * <redisgears_2> Registered backend: js.
redis-vector-db | 9:M 30 Aug 2024 18:59:12.947 * Module 'redisgears_2' loaded from /opt/redis-stack/lib/redisgears.so
redis-vector-db | 9:M 30 Aug 2024 18:59:12.948 * Server initialized
redis-vector-db | 9:M 30 Aug 2024 18:59:12.948 * Ready to accept connections tcp
dataprep-redis-server | [2024-08-30 18:59:17,959] [ INFO] - Base service - Uvicorn server setup on port 6007
dataprep-redis-server | INFO: Waiting for application startup.
tei-reranking-server | 2024-08-30T18:59:18.917532Z INFO text_embeddings_router::http::server: router/src/http/server.rs:1779: Ready
retriever-redis-server | /home/user/.local/lib/python3.11/site-packages/langchain_core/_api/deprecation.py:141: LangChainDeprecationWarning: The class `HuggingFaceHubEmbeddings` was deprecated in LangChain 0.2.2 and will be removed in 0.3.0. An updated version of the class exists in the langchain-huggingface package and should be used instead. To use it run `pip install -U langchain-huggingface` and import as `from langchain_huggingface import HuggingFaceEndpointEmbeddings`.
retriever-redis-server | warn_deprecated(
embedding-tei-server | [2024-08-30 18:59:16,248] [ INFO] - embedding_tei_langchain - TEI Gaudi Embedding initialized.
embedding-tei-server | INFO: 172.31.37.13:48430 - "POST /v1/embeddings HTTP/1.1" 200 OK
dataprep-redis-server | INFO: Application startup complete.
dataprep-redis-server | INFO: Uvicorn running on http://0.0.0.0:6007 (Press CTRL+C to quit)
dataprep-redis-server | [2024-08-30 18:59:17,961] [ INFO] - Base service - HTTP server setup successful
tgi-service | cors_allow_origin: [],
llm-tgi-server |
llm-tgi-server | You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
llm-tgi-server | warnings.warn(
llm-tgi-server | [2024-08-30 18:59:16,096] [ INFO] - Base service - CORS is enabled.
llm-tgi-server | [2024-08-30 18:59:16,097] [ INFO] - Base service - Setting up HTTP server
llm-tgi-server | [2024-08-30 18:59:16,097] [ INFO] - Base service - Uvicorn server setup on port 9000
llm-tgi-server | INFO: Waiting for application startup.
llm-tgi-server | INFO: Application startup complete.
llm-tgi-server | INFO: Uvicorn running on http://0.0.0.0:9000 (Press CTRL+C to quit)
llm-tgi-server | [2024-08-30 18:59:16,100] [ INFO] - Base service - HTTP server setup successful
retriever-redis-server | INFO: 172.31.37.13:43390 - "POST /v1/retrieval HTTP/1.1" 200 OK
tgi-service | api_key: None,
tgi-service | watermark_gamma: None,
tgi-service | watermark_delta: None,
tgi-service | ngrok: false,
tgi-service | ngrok_authtoken: None,
tgi-service | ngrok_edge: None,
tgi-service | tokenizer_config_path: None,
tgi-service | disable_grammar_support: false,
tgi-service | env: false,
tgi-service | max_client_batch_size: 4,
tgi-service | lora_adapters: None,
tgi-service | usage_stats: On,
tgi-service | }
tgi-service | 2024-08-30T18:59:12.959986Z INFO hf_hub: Token file not found "/root/.cache/huggingface/token"
tgi-service | 2024-08-30T18:59:13.012501Z INFO text_generation_launcher: Model supports up to 32768 but tgi will now set its default to 4096 instead. This is to save VRAM by refusing large prompts in order to allow more users on the same hardware. You can increase that size using `--max-batch-prefill-tokens=32818 --max-total-tokens=32768 --max-input-tokens=32767`.
tgi-service | 2024-08-30T18:59:13.012520Z INFO text_generation_launcher: Default `max_input_tokens` to 4095
tgi-service | 2024-08-30T18:59:13.012522Z INFO text_generation_launcher: Default `max_total_tokens` to 4096
tgi-service | 2024-08-30T18:59:13.012523Z INFO text_generation_launcher: Default `max_batch_prefill_tokens` to 4145
tgi-service | 2024-08-30T18:59:13.012616Z INFO download: text_generation_launcher: Starting check and download process for Intel/neural-chat-7b-v3-3
tgi-service | 2024-08-30T18:59:17.123445Z WARN text_generation_launcher: No safetensors weights found for model Intel/neural-chat-7b-v3-3 at revision None. Downloading PyTorch weights.
tgi-service | 2024-08-30T18:59:17.156988Z INFO text_generation_launcher: Download file: pytorch_model-00001-of-00002.bin
tgi-service | 2024-08-30T18:59:46.974574Z INFO text_generation_launcher: Downloaded /data/models--Intel--neural-chat-7b-v3-3/snapshots/bdd31cf498d13782cc7497cba5896996ce429f91/pytorch_model-00001-of-00002.bin in 0:00:29.
tgi-service | 2024-08-30T18:59:46.974598Z INFO text_generation_launcher: Download: [1/2] -- ETA: 0:00:29
tgi-service | 2024-08-30T18:59:46.974779Z INFO text_generation_launcher: Download file: pytorch_model-00002-of-00002.bin
tgi-service | 2024-08-30T19:00:17.058207Z INFO text_generation_launcher: Downloaded /data/models--Intel--neural-chat-7b-v3-3/snapshots/bdd31cf498d13782cc7497cba5896996ce429f91/pytorch_model-00002-of-00002.bin in 0:00:30.
tgi-service | 2024-08-30T19:00:17.058225Z INFO text_generation_launcher: Download: [2/2] -- ETA: 0
tgi-service | 2024-08-30T19:00:17.058238Z WARN text_generation_launcher: 🚨🚨BREAKING CHANGE in 2.0🚨🚨: Safetensors conversion is disabled without `--trust-remote-code` because Pickle files are unsafe and can essentially contain remote code execution!Please check for more information here: https://huggingface.co/docs/text-generation-inference/basic_tutorials/safety
tgi-service | 2024-08-30T19:00:17.058243Z WARN text_generation_launcher: No safetensors weights found for model Intel/neural-chat-7b-v3-3 at revision None. Converting PyTorch weights to safetensors.
tgi-service | Error: DownloadError
tgi-service | 2024-08-30T19:01:00.144114Z ERROR download: text_generation_launcher: Download encountered an error:
tgi-service | The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.
tgi-service | 2024-08-30 18:59:16.457 | INFO | text_generation_server.utils.import_utils:<module>:75 - Detected system ipex
tgi-service | /opt/conda/lib/python3.10/site-packages/text_generation_server/utils/sgmv.py:18: UserWarning: Could not import SGMV kernel from Punica, falling back to loop.
tgi-service | warnings.warn("Could not import SGMV kernel from Punica, falling back to loop.")
tgi-service | ╭───────────────────── Traceback (most recent call last) ──────────────────────╮
tgi-service | │ /opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py:324 in │
tgi-service | │ download_weights │
tgi-service | │ │
tgi-service | │ 321 │ │ except Exception: │
tgi-service | │ 322 │ │ │ discard_names = [] │
tgi-service | │ 323 │ │ # Convert pytorch weights to safetensors │
tgi-service | │ ❱ 324 │ │ utils.convert_files(local_pt_files, local_st_files, discard_na │
tgi-service | │ 325 │
tgi-service | │ 326 │
tgi-service | │ 327 @app.command() │
tgi-service | │ │
tgi-service | │ ╭───────────────────────────────── locals ─────────────────────────────────╮ │
tgi-service | │ │ architecture = 'MistralForCausalLM' │ │
tgi-service | │ │ auto_convert = True │ │
tgi-service | │ │ base_model_id = None │ │
tgi-service | │ │ class_ = <class │ │
tgi-service | │ │ 'transformers.models.mistral.modeling_mistral.Mistr… │ │
tgi-service | │ │ config = { │ │
tgi-service | │ │ │ '_name_or_path': './neural-chat-7b-v3-9', │ │
tgi-service | │ │ │ 'architectures': ['MistralForCausalLM'], │ │
tgi-service | │ │ │ 'bos_token_id': 1, │ │
tgi-service | │ │ │ 'eos_token_id': 2, │ │
tgi-service | │ │ │ 'hidden_act': 'silu', │ │
tgi-service | │ │ │ 'hidden_size': 4096, │ │
tgi-service | │ │ │ 'initializer_range': 0.02, │ │
tgi-service | │ │ │ 'intermediate_size': 14336, │ │
tgi-service | │ │ │ 'max_position_embeddings': 32768, │ │
tgi-service | │ │ │ 'model_type': 'mistral', │ │
tgi-service | │ │ │ ... +11 │ │
tgi-service | │ │ } │ │
tgi-service | │ │ config_filename = '/data/models--Intel--neural-chat-7b-v3-3/snapshots… │ │
tgi-service | │ │ discard_names = ['lm_head.weight'] │ │
tgi-service | │ │ extension = '.safetensors' │ │
tgi-service | │ │ f = <_io.TextIOWrapper │ │
tgi-service | │ │ name='/data/models--Intel--neural-chat-7b-v3-3/snap… │ │
tgi-service | │ │ mode='r' encoding='UTF-8'> │ │
tgi-service | │ │ is_local_model = False │ │
tgi-service | │ │ json = <module 'json' from │ │
tgi-service | │ │ '/opt/conda/lib/python3.10/json/__init__.py'> │ │
tgi-service | │ │ json_output = True │ │
tgi-service | │ │ local_pt_files = [ │ │
tgi-service | │ │ │ │ │
tgi-service | │ │ PosixPath('/data/models--Intel--neural-chat-7b-v3-3… │ │
tgi-service | │ │ │ │ │
tgi-service | │ │ PosixPath('/data/models--Intel--neural-chat-7b-v3-3… │ │
tgi-service | │ │ ] │ │
tgi-service | │ │ local_st_files = [ │ │
tgi-service | │ │ │ │ │
error from daemon in stream: Error grabbing logs: unexpected EOF
Hi @arun-gupta, the ChatQnA pipeline, including the tgi-service, starts successfully on our Xeon server.
The root cause of your issue is the Transformers v4.22.0 model-file cache migration, which interrupts the weight download and safetensors conversion in tgi-service. Please check the error messages below, remove the cached model files, and try again.
2024-08-30T19:00:17.058243Z WARN text_generation_launcher: No safetensors weights found for model Intel/neural-chat-7b-v3-3 at revision None. Converting PyTorch weights to safetensors.
tgi-service | Error: DownloadError
tgi-service | 2024-08-30T19:01:00.144114Z ERROR download: text_generation_launcher: Download encountered an error:
tgi-service | The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.
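The cache in question lives in the host directory that the compose file bind-mounts into tgi-service at /data (the path visible in the download logs above). A minimal cleanup sketch, assuming the common layout where that volume is ./data next to the compose file; adjust the path if your compose file mounts a different host directory:
sudo docker compose down
sudo rm -rf ./data/models--Intel--neural-chat-7b-v3-3   # assumed host path for the tgi-service /data volume
sudo docker compose up -d
sudo docker compose logs -f tgi-service   # watch until the download and safetensors conversion finish
Removing only the models--Intel--neural-chat-7b-v3-3 directory forces a clean re-download of the LLM weights without touching the embedding and reranker caches.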
@letonghan my steps are available at https://gist.github.com/arun-gupta/7e9f080feff664fbab878b26d13d83d7. I can only use the published Docker images. What should I do differently?
Tried with v0.8 of the Docker images and got a similar error. Here are detailed logs:
ubuntu@ip-172-31-73-49:~$ sudo docker compose logs
WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string.
WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string.
WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string.
WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string.
WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string.
WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string.
WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string.
WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string.
WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string.
WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string.
WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string.
WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string.
WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string.
WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string.
WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string.
WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string.
WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string.
WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string.
WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string.
WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string.
chatqna-xeon-backend-server | /usr/local/lib/python3.11/site-packages/pydantic/_internal/_fields.py:161: UserWarning: Field "model_name_or_path" has conflict with protected namespace "model_".
chatqna-xeon-backend-server |
chatqna-xeon-backend-server | You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
chatqna-xeon-backend-server | warnings.warn(
chatqna-xeon-backend-server | [2024-09-03 16:35:14,437] [ INFO] - Base service - CORS is enabled.
chatqna-xeon-backend-server | [2024-09-03 16:35:14,438] [ INFO] - Base service - Setting up HTTP server
chatqna-xeon-backend-server | [2024-09-03 16:35:14,438] [ INFO] - Base service - Uvicorn server setup on port 8888
chatqna-xeon-backend-server | INFO: Waiting for application startup.
chatqna-xeon-backend-server | INFO: Application startup complete.
chatqna-xeon-backend-server | INFO: Uvicorn running on http://0.0.0.0:8888 (Press CTRL+C to quit)
chatqna-xeon-backend-server | [2024-09-03 16:35:14,447] [ INFO] - Base service - HTTP server setup successful
tgi-service | 2024-09-03T16:35:12.583742Z INFO text_generation_launcher: Args {
tgi-service | model_id: "Intel/neural-chat-7b-v3-3",
tgi-service | revision: None,
tgi-service | validation_workers: 2,
tgi-service | sharded: None,
tgi-service | num_shard: None,
tgi-service | quantize: None,
tgi-service | speculate: None,
tgi-service | dtype: None,
tgi-service | trust_remote_code: false,
tgi-service | max_concurrent_requests: 128,
tgi-service | max_best_of: 2,
tgi-service | max_stop_sequences: 4,
tgi-service | max_top_n_tokens: 5,
tgi-service | max_input_tokens: None,
retriever-redis-server | /usr/local/lib/python3.11/site-packages/pydantic/_internal/_fields.py:184: UserWarning: Field name "downstream_black_list" shadows an attribute in parent "TopologyInfo";
retriever-redis-server | warnings.warn(
retriever-redis-server | /usr/local/lib/python3.11/site-packages/pydantic/_internal/_fields.py:149: UserWarning: Field "model_name_or_path" has conflict with protected namespace "model_".
retriever-redis-server |
tgi-service | max_input_length: None,
dataprep-redis-server | /home/user/.local/lib/python3.11/site-packages/pydantic/_internal/_fields.py:161: UserWarning: Field "model_name_or_path" has conflict with protected namespace "model_".
tgi-service | max_total_tokens: None,
tgi-service | waiting_served_ratio: 0.3,
tgi-service | max_batch_prefill_tokens: None,
tgi-service | max_batch_total_tokens: None,
tgi-service | max_waiting_tokens: 20,
tgi-service | max_batch_size: None,
tgi-service | cuda_graphs: Some(
tgi-service | [
tgi-service | 0,
tgi-service | ],
tgi-service | ),
tgi-service | hostname: "1b133a5060d8",
tgi-service | port: 80,
tgi-service | shard_uds_path: "/tmp/text-generation-server",
retriever-redis-server | You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
dataprep-redis-server |
retriever-redis-server | warnings.warn(
embedding-tei-server | /usr/local/lib/python3.11/site-packages/pydantic/_internal/_fields.py:184: UserWarning: Field name "downstream_black_list" shadows an attribute in parent "TopologyInfo";
redis-vector-db | 9:C 03 Sep 2024 16:35:12.563 # WARNING Memory overcommit must be enabled! Without it, a background save or replication may fail under low memory condition. Being disabled, it can also cause failures without low memory condition, see https://github.com/jemalloc/jemalloc/issues/1328. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
redis-vector-db | 9:C 03 Sep 2024 16:35:12.563 * oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
redis-vector-db | 9:C 03 Sep 2024 16:35:12.563 * Redis version=7.2.4, bits=64, commit=00000000, modified=0, pid=9, just started
redis-vector-db | 9:C 03 Sep 2024 16:35:12.563 * Configuration loaded
redis-vector-db | 9:M 03 Sep 2024 16:35:12.564 * monotonic clock: POSIX clock_gettime
redis-vector-db | 9:M 03 Sep 2024 16:35:12.564 * Running mode=standalone, port=6379.
redis-vector-db | 9:M 03 Sep 2024 16:35:12.564 * Module 'RedisCompat' loaded from /opt/redis-stack/lib/rediscompat.so
redis-vector-db | 9:M 03 Sep 2024 16:35:12.565 * <search> Redis version found by RedisSearch : 7.2.4 - oss
redis-vector-db | 9:M 03 Sep 2024 16:35:12.565 * <search> RediSearch version 2.8.12 (Git=2.8-32fdaca)
chatqna-xeon-ui-server |
chatqna-xeon-ui-server | > sveltekit-auth-example@0.0.1 preview
chatqna-xeon-ui-server | > vite preview --port 5173 --host 0.0.0.0
redis-vector-db | 9:M 03 Sep 2024 16:35:12.565 * <search> Low level api version 1 initialized successfully
redis-vector-db | 9:M 03 Sep 2024 16:35:12.565 * <search> concurrent writes: OFF, gc: ON, prefix min length: 2, prefix max expansions: 200, query timeout (ms): 500, timeout policy: return, cursor read size: 1000, cursor max idle (ms): 300000, max doctable size: 1000000, max number of search results: 10000, search pool size: 20, index pool size: 8,
redis-vector-db | 9:M 03 Sep 2024 16:35:12.566 * <search> Initialized thread pools!
redis-vector-db | 9:M 03 Sep 2024 16:35:12.566 * <search> Enabled role change notification
tei-reranking-server | 2024-09-03T16:35:12.584082Z INFO text_embeddings_router: router/src/main.rs:175: Args { model_id: "BAA*/***-********-*ase", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: true, default_prompt_name: None, default_prompt: None, hf_api_token: None, hostname: "429aafe43aba", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/data"), payload_limit: 2000000, api_key: None, json_output: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", cors_allow_origin: None }
llm-tgi-server | Defaulting to user installation because normal site-packages is not writeable
llm-tgi-server | Collecting langserve (from -r requirements-runtime.txt (line 1))
tei-reranking-server | 2024-09-03T16:35:12.584166Z INFO hf_hub: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/hf-hub-0.3.2/src/lib.rs:55: Token file not found "/root/.cache/huggingface/token"
tei-reranking-server | 2024-09-03T16:35:12.644558Z INFO download_pool_config: text_embeddings_core::download: core/src/download.rs:38: Downloading `1_Pooling/config.json`
chatqna-xeon-ui-server |
chatqna-xeon-ui-server |
chatqna-xeon-ui-server | ➜ Local: http://localhost:5173/
chatqna-xeon-ui-server | ➜ Network: http://172.18.0.12:5173/
embedding-tei-server | warnings.warn(
embedding-tei-server | /usr/local/lib/python3.11/site-packages/pydantic/_internal/_fields.py:149: UserWarning: Field "model_name_or_path" has conflict with protected namespace "model_".
reranking-tei-xeon-server | /home/user/.local/lib/python3.11/site-packages/pydantic/_internal/_fields.py:161: UserWarning: Field "model_name_or_path" has conflict with protected namespace "model_".
reranking-tei-xeon-server |
reranking-tei-xeon-server | You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
reranking-tei-xeon-server | warnings.warn(
reranking-tei-xeon-server | [2024-09-03 16:35:16,214] [ INFO] - CORS is enabled.
reranking-tei-xeon-server | [2024-09-03 16:35:16,215] [ INFO] - Setting up HTTP server
reranking-tei-xeon-server | [2024-09-03 16:35:16,216] [ INFO] - Uvicorn server setup on port 8000
reranking-tei-xeon-server | INFO: Waiting for application startup.
embedding-tei-server |
llm-tgi-server | Downloading langserve-0.2.3-py3-none-any.whl.metadata (39 kB)
tei-reranking-server | 2024-09-03T16:35:12.838269Z INFO download_new_st_config: text_embeddings_core::download: core/src/download.rs:62: Downloading `config_sentence_transformers.json`
redis-vector-db | 9:M 03 Sep 2024 16:35:12.566 * Module 'search' loaded from /opt/redis-stack/lib/redisearch.so
embedding-tei-server | You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
embedding-tei-server | warnings.warn(
embedding-tei-server | [2024-09-03 16:35:16,068] [ INFO] - CORS is enabled.
tgi-service | master_addr: "localhost",
embedding-tei-server | [2024-09-03 16:35:16,069] [ INFO] - Setting up HTTP server
embedding-tei-server | [2024-09-03 16:35:16,069] [ INFO] - Uvicorn server setup on port 6000
embedding-tei-server | INFO: Waiting for application startup.
llm-tgi-server | Requirement already satisfied: httpx>=0.23.0 in /home/user/.local/lib/python3.11/site-packages (from langserve->-r requirements-runtime.txt (line 1)) (0.27.0)
llm-tgi-server | Requirement already satisfied: langchain-core<0.3,>=0.1 in /usr/local/lib/python3.11/site-packages (from langserve->-r requirements-runtime.txt (line 1)) (0.1.7)
dataprep-redis-server | You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
reranking-tei-xeon-server | INFO: Application startup complete.
dataprep-redis-server | warnings.warn(
reranking-tei-xeon-server | INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
reranking-tei-xeon-server | [2024-09-03 16:35:16,219] [ INFO] - HTTP server setup successful
reranking-tei-xeon-server | INFO: 172.31.73.49:47394 - "POST /v1/reranking HTTP/1.1" 200 OK
tgi-service | master_port: 29500,
embedding-tei-server | INFO: Application startup complete.
tgi-service | huggingface_hub_cache: Some(
embedding-tei-server | INFO: Uvicorn running on http://0.0.0.0:6000 (Press CTRL+C to quit)
tgi-service | "/data",
retriever-redis-server | [2024-09-03 16:35:16,011] [ INFO] - CORS is enabled.
embedding-tei-server | [2024-09-03 16:35:16,078] [ INFO] - HTTP server setup successful
retriever-redis-server | [2024-09-03 16:35:16,012] [ INFO] - Setting up HTTP server
redis-vector-db | 9:M 03 Sep 2024 16:35:12.567 * <timeseries> RedisTimeSeries version 11011, git_sha=0299ac12a6bf298028859c41ba0f4d8dc842726b
tgi-service | ),
redis-vector-db | 9:M 03 Sep 2024 16:35:12.567 * <timeseries> Redis version found by RedisTimeSeries : 7.2.4 - oss
retriever-redis-server | [2024-09-03 16:35:16,013] [ INFO] - Uvicorn server setup on port 7000
redis-vector-db | 9:M 03 Sep 2024 16:35:12.567 * <timeseries> loaded default CHUNK_SIZE_BYTES policy: 4096
retriever-redis-server | INFO: Waiting for application startup.
retriever-redis-server | INFO: Application startup complete.
retriever-redis-server | INFO: Uvicorn running on http://0.0.0.0:7000 (Press CTRL+C to quit)
retriever-redis-server | [2024-09-03 16:35:16,021] [ INFO] - HTTP server setup successful
retriever-redis-server | INFO: 172.31.73.49:50014 - "POST /v1/retrieval HTTP/1.1" 200 OK
llm-tgi-server | Requirement already satisfied: orjson>=2 in /home/user/.local/lib/python3.11/site-packages (from langserve->-r requirements-runtime.txt (line 1)) (3.10.7)
tei-embedding-server | 2024-09-03T16:35:12.584463Z INFO text_embeddings_router: router/src/main.rs:175: Args { model_id: "BAA*/***-****-**-v1.5", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: true, default_prompt_name: None, default_prompt: None, hf_api_token: None, hostname: "b562c4d7638f", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/data"), payload_limit: 2000000, api_key: None, json_output: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", cors_allow_origin: None }
tei-embedding-server | 2024-09-03T16:35:12.584558Z INFO hf_hub: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/hf-hub-0.3.2/src/lib.rs:55: Token file not found "/root/.cache/huggingface/token"
llm-tgi-server | Requirement already satisfied: pydantic>=1 in /usr/local/lib/python3.11/site-packages (from langserve->-r requirements-runtime.txt (line 1)) (2.5.3)
embedding-tei-server | TEI Gaudi Embedding initialized.
embedding-tei-server | INFO: 172.31.73.49:47900 - "POST /v1/embeddings HTTP/1.1" 200 OK
tgi-service | weights_cache_override: None,
tei-embedding-server | 2024-09-03T16:35:12.636839Z INFO download_pool_config: text_embeddings_core::download: core/src/download.rs:38: Downloading `1_Pooling/config.json`
tei-embedding-server | 2024-09-03T16:35:12.732054Z INFO download_new_st_config: text_embeddings_core::download: core/src/download.rs:62: Downloading `config_sentence_transformers.json`
tei-embedding-server | 2024-09-03T16:35:12.763631Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:21: Starting download
tei-embedding-server | 2024-09-03T16:35:12.763643Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:23: Downloading `config.json`
tei-embedding-server | 2024-09-03T16:35:12.809072Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:26: Downloading `tokenizer.json`
tei-embedding-server | 2024-09-03T16:35:12.850503Z INFO download_artifacts: text_embeddings_backend: backends/src/lib.rs:313: Downloading `model.onnx`
tei-embedding-server | 2024-09-03T16:35:12.866721Z WARN download_artifacts: text_embeddings_backend: backends/src/lib.rs:317: Could not download `model.onnx`: request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/BAAI/bge-base-en-v1.5/resolve/main/model.onnx)
tei-embedding-server | 2024-09-03T16:35:12.866734Z INFO download_artifacts: text_embeddings_backend: backends/src/lib.rs:318: Downloading `onnx/model.onnx`
tei-embedding-server | 2024-09-03T16:35:14.488973Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:32: Model artifacts downloaded in 1.725340147s
tei-embedding-server | 2024-09-03T16:35:14.500268Z INFO text_embeddings_router: router/src/lib.rs:199: Maximum number of tokens per request: 512
tei-embedding-server | 2024-09-03T16:35:14.500593Z INFO text_embeddings_core::tokenization: core/src/tokenization.rs:28: Starting 8 tokenization workers
tei-embedding-server | 2024-09-03T16:35:14.539989Z INFO text_embeddings_router: router/src/lib.rs:241: Starting model backend
tei-embedding-server | 2024-09-03T16:35:15.530685Z WARN text_embeddings_router: router/src/lib.rs:267: Backend does not support a batch size > 8
tgi-service | disable_custom_kernels: false,
tgi-service | cuda_memory_fraction: 1.0,
tgi-service | rope_scaling: None,
tgi-service | rope_factor: None,
tgi-service | json_output: false,
tgi-service | otlp_endpoint: None,
tgi-service | otlp_service_name: "text-generation-inference.router",
tgi-service | cors_allow_origin: [],
dataprep-redis-server | /home/user/.local/lib/python3.11/site-packages/langchain/__init__.py:30: UserWarning: Importing LLMChain from langchain root module is no longer supported. Please use langchain.chains.LLMChain instead.
llm-tgi-server | Collecting pyproject-toml<0.0.11,>=0.0.10 (from langserve->-r requirements-runtime.txt (line 1))
redis-vector-db | 9:M 03 Sep 2024 16:35:12.567 * <timeseries> loaded server DUPLICATE_POLICY: block
redis-vector-db | 9:M 03 Sep 2024 16:35:12.567 * <timeseries> Setting default series ENCODING to: compressed
tei-reranking-server | 2024-09-03T16:35:12.859642Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:21: Starting download
tei-reranking-server | 2024-09-03T16:35:12.859654Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:23: Downloading `config.json`
tei-reranking-server | 2024-09-03T16:35:12.901154Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:26: Downloading `tokenizer.json`
tei-reranking-server | 2024-09-03T16:35:13.064387Z INFO download_artifacts: text_embeddings_backend: backends/src/lib.rs:313: Downloading `model.onnx`
tei-reranking-server | 2024-09-03T16:35:13.081150Z WARN download_artifacts: text_embeddings_backend: backends/src/lib.rs:317: Could not download `model.onnx`: request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/BAAI/bge-reranker-base/resolve/main/model.onnx)
tei-reranking-server | 2024-09-03T16:35:13.081171Z INFO download_artifacts: text_embeddings_backend: backends/src/lib.rs:318: Downloading `onnx/model.onnx`
tei-reranking-server | 2024-09-03T16:35:16.162648Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:32: Model artifacts downloaded in 3.303005165s
tei-reranking-server | 2024-09-03T16:35:16.639814Z WARN text_embeddings_router: router/src/lib.rs:195: Could not find a Sentence Transformers config
redis-vector-db | 9:M 03 Sep 2024 16:35:12.567 * <timeseries> Detected redis oss
redis-vector-db | 9:M 03 Sep 2024 16:35:12.567 * Module 'timeseries' loaded from /opt/redis-stack/lib/redistimeseries.so
redis-vector-db | 9:M 03 Sep 2024 16:35:12.567 * <ReJSON> Created new data type 'ReJSON-RL'
dataprep-redis-server | warnings.warn(
tei-reranking-server | 2024-09-03T16:35:16.639830Z INFO text_embeddings_router: router/src/lib.rs:199: Maximum number of tokens per request: 512
tei-reranking-server | 2024-09-03T16:35:16.640052Z INFO text_embeddings_core::tokenization: core/src/tokenization.rs:28: Starting 8 tokenization workers
tei-reranking-server | 2024-09-03T16:35:18.474694Z INFO text_embeddings_router: router/src/lib.rs:241: Starting model backend
tei-reranking-server | 2024-09-03T16:35:20.263187Z WARN text_embeddings_router: router/src/lib.rs:267: Backend does not support a batch size > 8
tei-reranking-server | 2024-09-03T16:35:20.263205Z WARN text_embeddings_router: router/src/lib.rs:268: forcing `max_batch_requests=8`
redis-vector-db | 9:M 03 Sep 2024 16:35:12.567 * <ReJSON> version: 20609 git sha: unknown branch: unknown
tei-embedding-server | 2024-09-03T16:35:15.530759Z WARN text_embeddings_router: router/src/lib.rs:268: forcing `max_batch_requests=8`
tgi-service | api_key: None,
tgi-service | watermark_gamma: None,
tgi-service | watermark_delta: None,
tei-reranking-server | 2024-09-03T16:35:20.263314Z WARN text_embeddings_router: router/src/lib.rs:319: Invalid hostname, defaulting to 0.0.0.0
tei-reranking-server | 2024-09-03T16:35:20.264715Z INFO text_embeddings_router::http::server: router/src/http/server.rs:1778: Starting HTTP server: 0.0.0.0:80
tei-reranking-server | 2024-09-03T16:35:20.264722Z INFO text_embeddings_router::http::server: router/src/http/server.rs:1779: Ready
tei-reranking-server | 2024-09-03T16:41:33.132290Z INFO rerank{total_time="20.070554ms" tokenization_time="576.473µs" queue_time="891.339µs" inference_time="18.483195ms"}: text_embeddings_router::http::server: router/src/http/server.rs:455: Success
tei-reranking-server | 2024-09-03T16:41:45.597461Z INFO rerank{total_time="25.077603ms" tokenization_time="288.427µs" queue_time="6.776727ms" inference_time="12.031357ms"}: text_embeddings_router::http::server: router/src/http/server.rs:455: Success
redis-vector-db | 9:M 03 Sep 2024 16:35:12.567 * <ReJSON> Exported RedisJSON_V1 API
redis-vector-db | 9:M 03 Sep 2024 16:35:12.568 * <ReJSON> Exported RedisJSON_V2 API
redis-vector-db | 9:M 03 Sep 2024 16:35:12.568 * <ReJSON> Exported RedisJSON_V3 API
redis-vector-db | 9:M 03 Sep 2024 16:35:12.568 * <ReJSON> Exported RedisJSON_V4 API
redis-vector-db | 9:M 03 Sep 2024 16:35:12.568 * <ReJSON> Exported RedisJSON_V5 API
redis-vector-db | 9:M 03 Sep 2024 16:35:12.568 * <ReJSON> Enabled diskless replication
redis-vector-db | 9:M 03 Sep 2024 16:35:12.568 * Module 'ReJSON' loaded from /opt/redis-stack/lib/rejson.so
dataprep-redis-server | /home/user/.local/lib/python3.11/site-packages/langchain/__init__.py:30: UserWarning: Importing PromptTemplate from langchain root module is no longer supported. Please use langchain_core.prompts.PromptTemplate instead.
dataprep-redis-server | warnings.warn(
dataprep-redis-server | [2024-09-03 16:35:17,740] [ INFO] - CORS is enabled.
llm-tgi-server | Downloading pyproject_toml-0.0.10-py3-none-any.whl.metadata (642 bytes)
llm-tgi-server | Requirement already satisfied: anyio in /usr/local/lib/python3.11/site-packages (from httpx>=0.23.0->langserve->-r requirements-runtime.txt (line 1)) (4.2.0)
dataprep-redis-server | [2024-09-03 16:35:17,741] [ INFO] - Setting up HTTP server
dataprep-redis-server | [2024-09-03 16:35:17,742] [ INFO] - Uvicorn server setup on port 6007
dataprep-redis-server | INFO: Waiting for application startup.
dataprep-redis-server | INFO: Application startup complete.
dataprep-redis-server | INFO: Uvicorn running on http://0.0.0.0:6007 (Press CTRL+C to quit)
dataprep-redis-server | [2024-09-03 16:35:17,744] [ INFO] - HTTP server setup successful
dataprep-redis-server | [2024-09-03 16:35:17,751] [ INFO] - CORS is enabled.
dataprep-redis-server | [2024-09-03 16:35:17,751] [ INFO] - CORS is enabled.
dataprep-redis-server | [2024-09-03 16:35:17,752] [ INFO] - Setting up HTTP server
dataprep-redis-server | [2024-09-03 16:35:17,752] [ INFO] - Setting up HTTP server
dataprep-redis-server | [2024-09-03 16:35:17,752] [ INFO] - Uvicorn server setup on port 6008
dataprep-redis-server | [2024-09-03 16:35:17,752] [ INFO] - Uvicorn server setup on port 6008
dataprep-redis-server | INFO: Waiting for application startup.
dataprep-redis-server | INFO: Application startup complete.
llm-tgi-server | Requirement already satisfied: certifi in /usr/local/lib/python3.11/site-packages (from httpx>=0.23.0->langserve->-r requirements-runtime.txt (line 1)) (2023.11.17)
llm-tgi-server | Requirement already satisfied: httpcore==1.* in /home/user/.local/lib/python3.11/site-packages (from httpx>=0.23.0->langserve->-r requirements-runtime.txt (line 1)) (1.0.5)
llm-tgi-server | Requirement already satisfied: idna in /usr/local/lib/python3.11/site-packages (from httpx>=0.23.0->langserve->-r requirements-runtime.txt (line 1)) (3.6)
llm-tgi-server | Requirement already satisfied: sniffio in /usr/local/lib/python3.11/site-packages (from httpx>=0.23.0->langserve->-r requirements-runtime.txt (line 1)) (1.3.0)
llm-tgi-server | Requirement already satisfied: h11<0.15,>=0.13 in /home/user/.local/lib/python3.11/site-packages (from httpcore==1.*->httpx>=0.23.0->langserve->-r requirements-runtime.txt (line 1)) (0.14.0)
redis-vector-db | 9:M 03 Sep 2024 16:35:12.568 * <search> Acquired RedisJSON_V5 API
tei-embedding-server | 2024-09-03T16:35:15.530888Z WARN text_embeddings_router: router/src/lib.rs:319: Invalid hostname, defaulting to 0.0.0.0
tei-embedding-server | 2024-09-03T16:35:15.532414Z INFO text_embeddings_router::http::server: router/src/http/server.rs:1778: Starting HTTP server: 0.0.0.0:80
tei-embedding-server | 2024-09-03T16:35:15.532453Z INFO text_embeddings_router::http::server: router/src/http/server.rs:1779: Ready
tei-embedding-server | 2024-09-03T16:39:35.319143Z INFO embed{total_time="12.188539ms" tokenization_time="720.001µs" queue_time="640.359µs" inference_time="10.74443ms"}: text_embeddings_router::http::server: router/src/http/server.rs:706: Success
tei-embedding-server | 2024-09-03T16:41:11.040334Z INFO embed{total_time="9.505833ms" tokenization_time="401.51µs" queue_time="499.94µs" inference_time="8.531757ms"}: text_embeddings_router::http::server: router/src/http/server.rs:706: Success
tgi-service | ngrok: false,
tgi-service | ngrok_authtoken: None,
llm-tgi-server | Requirement already satisfied: PyYAML>=5.3 in /usr/local/lib/python3.11/site-packages (from langchain-core<0.3,>=0.1->langserve->-r requirements-runtime.txt (line 1)) (6.0.1)
llm-tgi-server | Requirement already satisfied: jsonpatch<2.0,>=1.33 in /usr/local/lib/python3.11/site-packages (from langchain-core<0.3,>=0.1->langserve->-r requirements-runtime.txt (line 1)) (1.33)
llm-tgi-server | Requirement already satisfied: langsmith<0.1.0,>=0.0.63 in /usr/local/lib/python3.11/site-packages (from langchain-core<0.3,>=0.1->langserve->-r requirements-runtime.txt (line 1)) (0.0.77)
llm-tgi-server | Requirement already satisfied: packaging<24.0,>=23.2 in /usr/local/lib/python3.11/site-packages (from langchain-core<0.3,>=0.1->langserve->-r requirements-runtime.txt (line 1)) (23.2)
tgi-service | ngrok_edge: None,
tgi-service | tokenizer_config_path: None,
tgi-service | disable_grammar_support: false,
tgi-service | env: false,
tgi-service | max_client_batch_size: 4,
tgi-service | lora_adapters: None,
tgi-service | usage_stats: On,
tgi-service | }
tgi-service | 2024-09-03T16:35:12.583910Z INFO hf_hub: Token file not found "/root/.cache/huggingface/token"
llm-tgi-server | Requirement already satisfied: requests<3,>=2 in /usr/local/lib/python3.11/site-packages (from langchain-core<0.3,>=0.1->langserve->-r requirements-runtime.txt (line 1)) (2.31.0)
redis-vector-db | 9:M 03 Sep 2024 16:35:12.568 * <bf> RedisBloom version 2.6.12 (Git=unknown)
redis-vector-db | 9:M 03 Sep 2024 16:35:12.568 * Module 'bf' loaded from /opt/redis-stack/lib/redisbloom.so
redis-vector-db | 9:M 03 Sep 2024 16:35:12.568 * <redisgears_2> Created new data type 'GearsType'
redis-vector-db | 9:M 03 Sep 2024 16:35:12.568 * <redisgears_2> Detected redis oss
redis-vector-db | 9:M 03 Sep 2024 16:35:12.568 # <redisgears_2> could not initialize RedisAI_InitError
redis-vector-db |
redis-vector-db | 9:M 03 Sep 2024 16:35:12.568 * <redisgears_2> Failed loading RedisAI API.
redis-vector-db | 9:M 03 Sep 2024 16:35:12.568 * <redisgears_2> RedisGears v2.0.19, sha='671030bbcb7de4582d00575a0902f826da3efe73', build_type='release', built_for='Linux-ubuntu22.04.x86_64'.
redis-vector-db | 9:M 03 Sep 2024 16:35:12.569 * <redisgears_2> Registered backend: js.
redis-vector-db | 9:M 03 Sep 2024 16:35:12.569 * Module 'redisgears_2' loaded from /opt/redis-stack/lib/redisgears.so
redis-vector-db | 9:M 03 Sep 2024 16:35:12.569 * Server initialized
redis-vector-db | 9:M 03 Sep 2024 16:35:12.570 * Ready to accept connections tcp
tgi-service | 2024-09-03T16:35:12.637897Z INFO text_generation_launcher: Model supports up to 32768 but tgi will now set its default to 4096 instead. This is to save VRAM by refusing large prompts in order to allow more users on the same hardware. You can increase that size using `--max-batch-prefill-tokens=32818 --max-total-tokens=32768 --max-input-tokens=32767`.
dataprep-redis-server | INFO: Uvicorn running on http://0.0.0.0:6008 (Press CTRL+C to quit)
llm-tgi-server | Requirement already satisfied: tenacity<9.0.0,>=8.1.0 in /usr/local/lib/python3.11/site-packages (from langchain-core<0.3,>=0.1->langserve->-r requirements-runtime.txt (line 1)) (8.2.3)
tgi-service | 2024-09-03T16:35:12.637917Z INFO text_generation_launcher: Default `max_input_tokens` to 4095
llm-tgi-server | Requirement already satisfied: annotated-types>=0.4.0 in /usr/local/lib/python3.11/site-packages (from pydantic>=1->langserve->-r requirements-runtime.txt (line 1)) (0.6.0)
tgi-service | 2024-09-03T16:35:12.637920Z INFO text_generation_launcher: Default `max_total_tokens` to 4096
tgi-service | 2024-09-03T16:35:12.637922Z INFO text_generation_launcher: Default `max_batch_prefill_tokens` to 4145
tgi-service | 2024-09-03T16:35:12.638062Z INFO download: text_generation_launcher: Starting check and download process for Intel/neural-chat-7b-v3-3
tgi-service | 2024-09-03T16:35:16.967885Z WARN text_generation_launcher: No safetensors weights found for model Intel/neural-chat-7b-v3-3 at revision None. Downloading PyTorch weights.
tgi-service | 2024-09-03T16:35:16.999570Z INFO text_generation_launcher: Download file: pytorch_model-00001-of-00002.bin
tgi-service | 2024-09-03T16:35:44.225639Z INFO text_generation_launcher: Downloaded /data/models--Intel--neural-chat-7b-v3-3/snapshots/bdd31cf498d13782cc7497cba5896996ce429f91/pytorch_model-00001-of-00002.bin in 0:00:27.
tgi-service | 2024-09-03T16:35:44.225659Z INFO text_generation_launcher: Download: [1/2] -- ETA: 0:00:27
tgi-service | 2024-09-03T16:35:44.225982Z INFO text_generation_launcher: Download file: pytorch_model-00002-of-00002.bin
tgi-service | 2024-09-03T16:36:10.045527Z INFO text_generation_launcher: Downloaded /data/models--Intel--neural-chat-7b-v3-3/snapshots/bdd31cf498d13782cc7497cba5896996ce429f91/pytorch_model-00002-of-00002.bin in 0:00:25.
tgi-service | 2024-09-03T16:36:10.045548Z INFO text_generation_launcher: Download: [2/2] -- ETA: 0
tgi-service | 2024-09-03T16:36:10.045563Z WARN text_generation_launcher: 🚨🚨BREAKING CHANGE in 2.0🚨🚨: Safetensors conversion is disabled without `--trust-remote-code` because Pickle files are unsafe and can essentially contain remote code execution!Please check for more information here: https://huggingface.co/docs/text-generation-inference/basic_tutorials/safety
tgi-service | 2024-09-03T16:36:10.045793Z WARN text_generation_launcher: No safetensors weights found for model Intel/neural-chat-7b-v3-3 at revision None. Converting PyTorch weights to safetensors.
tgi-service | Error: DownloadError
tgi-service | 2024-09-03T16:37:00.978463Z ERROR download: text_generation_launcher: Download encountered an error:
tgi-service | The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.
tgi-service | 2024-09-03 16:35:16.253 | INFO | text_generation_server.utils.import_utils:<module>:75 - Detected system ipex
tgi-service | /opt/conda/lib/python3.10/site-packages/text_generation_server/utils/sgmv.py:18: UserWarning: Could not import SGMV kernel from Punica, falling back to loop.
tgi-service | warnings.warn("Could not import SGMV kernel from Punica, falling back to loop.")
tgi-service | ╭───────────────────── Traceback (most recent call last) ─────────────────────╮
tgi-service | │ /opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py:324 in │
tgi-service | │ download_weights │
tgi-service | │ │
tgi-service | │ 321 │ │ except Exception: │
tgi-service | │ 322 │ │ │ discard_names = [] │
tgi-service | │ 323 │ │ # Convert pytorch weights to safetensors │
llm-tgi-server | Requirement already satisfied: pydantic-core==2.14.6 in /usr/local/lib/python3.11/site-packages (from pydantic>=1->langserve->-r requirements-runtime.txt (line 1)) (2.14.6)
llm-tgi-server | Requirement already satisfied: typing-extensions>=4.6.1 in /usr/local/lib/python3.11/site-packages (from pydantic>=1->langserve->-r requirements-runtime.txt (line 1)) (4.9.0)
llm-tgi-server | Requirement already satisfied: setuptools>=42 in /usr/local/lib/python3.11/site-packages (from pyproject-toml<0.0.11,>=0.0.10->langserve->-r requirements-runtime.txt (line 1)) (65.5.1)
llm-tgi-server | Requirement already satisfied: wheel in /usr/local/lib/python3.11/site-packages (from pyproject-toml<0.0.11,>=0.0.10->langserve->-r requirements-runtime.txt (line 1)) (0.42.0)
llm-tgi-server | Collecting toml (from pyproject-toml<0.0.11,>=0.0.10->langserve->-r requirements-runtime.txt (line 1))
llm-tgi-server | Downloading toml-0.10.2-py2.py3-none-any.whl.metadata (7.1 kB)
llm-tgi-server | Requirement already satisfied: jsonschema in /home/user/.local/lib/python3.11/site-packages (from pyproject-toml<0.0.11,>=0.0.10->langserve->-r requirements-runtime.txt (line 1)) (4.23.0)
llm-tgi-server | Requirement already satisfied: jsonpointer>=1.9 in /usr/local/lib/python3.11/site-packages (from jsonpatch<2.0,>=1.33->langchain-core<0.3,>=0.1->langserve->-r requirements-runtime.txt (line 1)) (2.4)
llm-tgi-server | Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.11/site-packages (from requests<3,>=2->langchain-core<0.3,>=0.1->langserve->-r requirements-runtime.txt (line 1)) (3.3.2)
llm-tgi-server | Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.11/site-packages (from requests<3,>=2->langchain-core<0.3,>=0.1->langserve->-r requirements-runtime.txt (line 1)) (2.1.0)
llm-tgi-server | Requirement already satisfied: attrs>=22.2.0 in /usr/local/lib/python3.11/site-packages (from jsonschema->pyproject-toml<0.0.11,>=0.0.10->langserve->-r requirements-runtime.txt (line 1)) (23.2.0)
llm-tgi-server | Requirement already satisfied: jsonschema-specifications>=2023.03.6 in /home/user/.local/lib/python3.11/site-packages (from jsonschema->pyproject-toml<0.0.11,>=0.0.10->langserve->-r requirements-runtime.txt (line 1)) (2023.12.1)
llm-tgi-server | Requirement already satisfied: referencing>=0.28.4 in /home/user/.local/lib/python3.11/site-packages (from jsonschema->pyproject-toml<0.0.11,>=0.0.10->langserve->-r requirements-runtime.txt (line 1)) (0.35.1)
llm-tgi-server | Requirement already satisfied: rpds-py>=0.7.1 in /home/user/.local/lib/python3.11/site-packages (from jsonschema->pyproject-toml<0.0.11,>=0.0.10->langserve->-r requirements-runtime.txt (line 1)) (0.20.0)
llm-tgi-server | Downloading langserve-0.2.3-py3-none-any.whl (1.2 MB)
llm-tgi-server | ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 64.8 MB/s eta 0:00:00
llm-tgi-server | Downloading pyproject_toml-0.0.10-py3-none-any.whl (6.9 kB)
llm-tgi-server | Downloading toml-0.10.2-py2.py3-none-any.whl (16 kB)
llm-tgi-server | Installing collected packages: toml, pyproject-toml, langserve
llm-tgi-server | Successfully installed langserve-0.2.3 pyproject-toml-0.0.10 toml-0.10.2
llm-tgi-server | /usr/local/lib/python3.11/site-packages/pydantic/_internal/_fields.py:184: UserWarning: Field name "downstream_black_list" shadows an attribute in parent "TopologyInfo";
llm-tgi-server | warnings.warn(
llm-tgi-server | /usr/local/lib/python3.11/site-packages/pydantic/_internal/_fields.py:149: UserWarning: Field "model_name_or_path" has conflict with protected namespace "model_".
llm-tgi-server |
llm-tgi-server | You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
dataprep-redis-server | [2024-09-03 16:35:17,753] [ INFO] - HTTP server setup successful
dataprep-redis-server | [2024-09-03 16:35:17,753] [ INFO] - HTTP server setup successful
dataprep-redis-server | [2024-09-03 16:35:17,753] [ INFO] - CORS is enabled.
dataprep-redis-server | [2024-09-03 16:35:17,753] [ INFO] - CORS is enabled.
dataprep-redis-server | [2024-09-03 16:35:17,753] [ INFO] - CORS is enabled.
dataprep-redis-server | [2024-09-03 16:35:17,754] [ INFO] - Setting up HTTP server
dataprep-redis-server | [2024-09-03 16:35:17,754] [ INFO] - Setting up HTTP server
dataprep-redis-server | [2024-09-03 16:35:17,754] [ INFO] - Setting up HTTP server
dataprep-redis-server | [2024-09-03 16:35:17,754] [ INFO] - Uvicorn server setup on port 6009
dataprep-redis-server | [2024-09-03 16:35:17,754] [ INFO] - Uvicorn server setup on port 6009
dataprep-redis-server | [2024-09-03 16:35:17,754] [ INFO] - Uvicorn server setup on port 6009
dataprep-redis-server | INFO: Waiting for application startup.
dataprep-redis-server | INFO: Application startup complete.
dataprep-redis-server | INFO: Uvicorn running on http://0.0.0.0:6009 (Press CTRL+C to quit)
dataprep-redis-server | [2024-09-03 16:35:17,755] [ INFO] - HTTP server setup successful
tgi-service | │ ❱ 324 │ │ utils.convert_files(local_pt_files, local_st_files, discard_na │
tgi-service | │ 325 │
tgi-service | │ 326 │
tgi-service | │ 327 @app.command() │
tgi-service | │ │
tgi-service | │ ╭───────────────────────────────── locals ──────────────────────────────────╮ │
tgi-service | │ │ architecture = 'MistralForCausalLM' │ │
tgi-service | │ │ auto_convert = True │ │
tgi-service | │ │ base_model_id = None │ │
tgi-service | │ │ class_ = <class │ │
tgi-service | │ │ 'transformers.models.mistral.modeling_mistral.Mistr… │ │
tgi-service | │ │ config = { │ │
tgi-service | │ │ │ '_name_or_path': './neural-chat-7b-v3-9', │ │
tgi-service | │ │ │ 'architectures': ['MistralForCausalLM'], │ │
tgi-service | │ │ │ 'bos_token_id': 1, │ │
tgi-service | │ │ │ 'eos_token_id': 2, │ │
tgi-service | │ │ │ 'hidden_act': 'silu', │ │
llm-tgi-server | warnings.warn(
llm-tgi-server | [2024-09-03 16:35:15,842] [ INFO] - CORS is enabled.
llm-tgi-server | [2024-09-03 16:35:15,842] [ INFO] - Setting up HTTP server
llm-tgi-server | [2024-09-03 16:35:15,843] [ INFO] - Uvicorn server setup on port 9000
llm-tgi-server | INFO: Waiting for application startup.
llm-tgi-server | INFO: Application startup complete.
llm-tgi-server | INFO: Uvicorn running on http://0.0.0.0:9000 (Press CTRL+C to quit)
llm-tgi-server | [2024-09-03 16:35:15,852] [ INFO] - HTTP server setup successful
dataprep-redis-server | [2024-09-03 16:35:17,755] [ INFO] - HTTP server setup successful
dataprep-redis-server | [2024-09-03 16:35:17,755] [ INFO] - HTTP server setup successful
tgi-service | │ │ │ 'hidden_size': 4096, │ │
tgi-service | │ │ │ 'initializer_range': 0.02, │ │
tgi-service | │ │ │ 'intermediate_size': 14336, │ │
tgi-service | │ │ │ 'max_position_embeddings': 32768, │ │
tgi-service | │ │ │ 'model_type': 'mistral', │ │
tgi-service | │ │ │ ... +11 │ │
tgi-service | │ │ } │ │
tgi-service | │ │ config_filename = '/data/models--Intel--neural-chat-7b-v3-3/snapshots… │ │
tgi-service | │ │ discard_names = ['lm_head.weight'] │ │
tgi-service | │ │ extension = '.safetensors' │ │
tgi-service | │ │ f = <_io.TextIOWrapper │ │
tgi-service | │ │ name='/data/models--Intel--neural-chat-7b-v3-3/snap… │ │
tgi-service | │ │ mode='r' encoding='UTF-8'> │ │
tgi-service | │ │ is_local_model = False │ │
tgi-service | │ │ json = <module 'json' from │ │
tgi-service | │ │ '/opt/conda/lib/python3.10/json/__init__.py'> │ │
tgi-service | │ │ json_output = True │ │
tgi-service | │ │ local_pt_files = [ │ │
tgi-service | │ │ │ │ │
tgi-service | │ │ PosixPath('/data/models--Intel--neural-chat-7b-v3-3… │ │
tgi-service | │ │ │ │ │
error from daemon in stream: Error grabbing logs: unexpected EOF
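Note that the tail of the log above shows tgi-service failing while converting the downloaded PyTorch weights to safetensors (Error: DownloadError), which would explain why nothing ever listens on the mapped LLM port. A quick way to confirm which container exited is sketched below, using standard docker compose commands run from the same compose directory:

# List the state of every service; an exited tgi-service explains the refused connection
sudo docker compose ps

# Re-read only the TGI launcher's output to see the DownloadError and its traceback
sudo docker compose logs --tail=100 tgi-service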
This error only occurs with Ubuntu 24.04 on AWS. I tested with both the 0.8 and 0.9 Docker images. It worked fine on the Amazon Linux 2023 AMI.
Hi @arun-gupta, since we don't have an AWS environment, it's currently hard to find the root cause of this issue. If you're willing to share your AWS environment, we can help debug the problem!
@letonghan sure, let me set up a time with you offline.
I tried this again on AWS Ubuntu 24.04 and it is working fine. It also worked with Ubuntu 24.04 on GCP with the latest images. The GCP instructions are available at https://gist.github.com/arun-gupta/564c5334c62cf4ada3cbd3124a2defb7. The bug can be closed.
Ok, will close this issue.
Priority
Undecided
OS type
Ubuntu
Hardware type
Xeon-SPR
Installation method
Deploy method
Running nodes
Single Node
What's the version?
0.9
Description
The instructions at https://github.com/opea-project/GenAIExamples/tree/main/ChatQnA/docker/xeon need a better user experience.
Testing the LLM service says:
The container has been running for four hours now, and connecting to the service still gives the following error:
There should be a clear indication of how the developer would know that the model download has finished. Also, the container name is tgi-service, so that should be specified.
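For reference, one way to tell whether the download has finished and the service is ready is to follow just that container's logs until the launcher reports that its webserver is up, and then hit TGI's health route. This is only a sketch, not part of the official instructions; the 9009 host port mirrors the compose mapping used here, so adjust it if your configuration differs:

# Follow only the TGI container; the download/conversion phase is finished once
# the launcher reports that the webserver has started
sudo docker compose logs -f tgi-service

# TGI exposes a health route; a 200 response means it is ready to serve requests
curl -s -o /dev/null -w "%{http_code}\n" http://${host_ip}:9009/health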
Reproduce steps
The steps are documented at https://gist.github.com/arun-gupta/7e9f080feff664fbab878b26d13d83d7
Raw log