grafail opened this issue 1 month ago (status: Open)
Using this version of the image: https://github.com/huggingface/text-generation-inference/pkgs/container/text-generation-inference/222969763?tag=latest-intel

GPUs used:
| Device ID | Device Name | Vendor Name | SOC UUID | PCI BDF Address | DRM Device | Function Type |
|---|---|---|---|---|---|---|
| 0 | Intel(R) Data Center GPU Max 1100 | Intel(R) Corporation | 00000000-0000-000f-0000-002f0bda8086 | 0000:0f:00.0 | /dev/dri/card1 | physical |
| 1 | Intel(R) Data Center GPU Max 1100 | Intel(R) Corporation | 00000000-0000-0016-0000-002f0bda8086 | 0000:16:00.0 | /dev/dri/card2 | physical |
| 2 | Intel(R) Data Center GPU Max 1100 | Intel(R) Corporation | 00000000-0000-001a-0000-002f0bda8086 | 0000:1a:00.0 | /dev/dri/card3 | physical |
| 3 | Intel(R) Data Center GPU Max 1100 | Intel(R) Corporation | 00000000-0000-001e-0000-002f0bda8086 | 0000:1e:00.0 | /dev/dri/card4 | physical |
| 4 | Intel(R) Data Center GPU Max 1100 | Intel(R) Corporation | 00000000-0000-008a-0000-002f0bda8086 | 0000:8a:00.0 | /dev/dri/card5 | physical |
| 5 | Intel(R) Data Center GPU Max 1100 | Intel(R) Corporation | 00000000-0000-008e-0000-002f0bda8086 | 0000:8e:00.0 | /dev/dri/card6 | physical |
| 6 | Intel(R) Data Center GPU Max 1100 | Intel(R) Corporation | 00000000-0000-00c0-0000-002f0bda8086 | 0000:c0:00.0 | /dev/dri/card7 | physical |
| 7 | Intel(R) Data Center GPU Max 1100 | Intel(R) Corporation | 00000000-0000-00c4-0000-002f0bda8086 | 0000:c4:00.0 | /dev/dri/card8 | physical |
The deployment runs inside the Intel Developer Cloud.
Docker compose file used:
```yaml
version: "3.9"
services:
  llama3_8b:
    image: ghcr.io/huggingface/text-generation-inference:latest-intel
    command: --model-id meta-llama/Meta-Llama-3-8B-Instruct --sharded false --max-total-tokens 8000
    environment:
      - HF_TOKEN=<token>
    shm_size: 1g
    ports:
      - "8083:80"
    volumes:
      - ./hf-data:/data
    devices:
      - /dev/dri:/dev/dri
    privileged: true
```
Output:
```
mistral_7b-1 | 2024-05-30T15:29:08.302655Z  INFO text_generation_launcher: Args {
mistral_7b-1 |     model_id: "meta-llama/Meta-Llama-3-8B-Instruct",
mistral_7b-1 |     revision: None,
mistral_7b-1 |     validation_workers: 2,
mistral_7b-1 |     sharded: Some(
mistral_7b-1 |         false,
mistral_7b-1 |     ),
mistral_7b-1 |     num_shard: None,
mistral_7b-1 |     quantize: None,
mistral_7b-1 |     speculate: None,
mistral_7b-1 |     dtype: None,
mistral_7b-1 |     trust_remote_code: false,
mistral_7b-1 |     max_concurrent_requests: 128,
mistral_7b-1 |     max_best_of: 2,
mistral_7b-1 |     max_stop_sequences: 4,
mistral_7b-1 |     max_top_n_tokens: 5,
mistral_7b-1 |     max_input_tokens: None,
mistral_7b-1 |     max_input_length: None,
mistral_7b-1 |     max_total_tokens: Some(
mistral_7b-1 |         8000,
mistral_7b-1 |     ),
mistral_7b-1 |     waiting_served_ratio: 0.3,
mistral_7b-1 |     max_batch_prefill_tokens: None,
mistral_7b-1 |     max_batch_total_tokens: None,
mistral_7b-1 |     max_waiting_tokens: 20,
mistral_7b-1 |     max_batch_size: None,
mistral_7b-1 |     cuda_graphs: None,
mistral_7b-1 |     hostname: "17c9ca27bb21",
mistral_7b-1 |     port: 80,
mistral_7b-1 |     shard_uds_path: "/tmp/text-generation-server",
mistral_7b-1 |     master_addr: "localhost",
mistral_7b-1 |     master_port: 29500,
mistral_7b-1 |     huggingface_hub_cache: Some(
mistral_7b-1 |         "/data",
mistral_7b-1 |     ),
mistral_7b-1 |     weights_cache_override: None,
mistral_7b-1 |     disable_custom_kernels: false,
mistral_7b-1 |     cuda_memory_fraction: 1.0,
mistral_7b-1 |     rope_scaling: None,
mistral_7b-1 |     rope_factor: None,
mistral_7b-1 |     json_output: false,
mistral_7b-1 |     otlp_endpoint: None,
mistral_7b-1 |     cors_allow_origin: [],
mistral_7b-1 |     watermark_gamma: None,
mistral_7b-1 |     watermark_delta: None,
mistral_7b-1 |     ngrok: false,
mistral_7b-1 |     ngrok_authtoken: None,
mistral_7b-1 |     ngrok_edge: None,
mistral_7b-1 |     tokenizer_config_path: None,
mistral_7b-1 |     disable_grammar_support: false,
mistral_7b-1 |     env: false,
mistral_7b-1 |     max_client_batch_size: 4,
mistral_7b-1 | }
mistral_7b-1 | 2024-05-30T15:29:08.302905Z  INFO hf_hub: Token file not found "/root/.cache/huggingface/token"
mistral_7b-1 | 2024-05-30T15:29:08.395883Z  INFO text_generation_launcher: Default `max_input_tokens` to 4095
mistral_7b-1 | 2024-05-30T15:29:08.395923Z  INFO text_generation_launcher: Default `max_batch_prefill_tokens` to 4145
mistral_7b-1 | 2024-05-30T15:29:08.395931Z  INFO text_generation_launcher: Using default cuda graphs [1, 2, 4, 8, 16, 32]
mistral_7b-1 | 2024-05-30T15:29:08.396191Z  INFO download: text_generation_launcher: Starting download process.
mistral_7b-1 | 2024-05-30T15:29:11.400965Z ERROR download: text_generation_launcher: Download encountered an error:
mistral_7b-1 | /usr/local/lib/python3.10/dist-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: 'libpng16.so.16: cannot open shared object file: No such file or directory' If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
mistral_7b-1 |   warn(
mistral_7b-1 | Traceback (most recent call last):
mistral_7b-1 |
mistral_7b-1 |   File "/usr/local/bin/text-generation-server", line 8, in <module>
mistral_7b-1 |     sys.exit(app())
mistral_7b-1 |
mistral_7b-1 | Error: DownloadError
mistral_7b-1 |   File "/usr/local/lib/python3.10/dist-packages/text_generation_server/cli.py", line 126, in download_weights
mistral_7b-1 |     from text_generation_server import utils
mistral_7b-1 |
mistral_7b-1 |   File "/usr/local/lib/python3.10/dist-packages/text_generation_server/utils/__init__.py", line 3, in <module>
mistral_7b-1 |     from text_generation_server.utils.weights import Weights
mistral_7b-1 |
mistral_7b-1 |   File "/usr/local/lib/python3.10/dist-packages/text_generation_server/utils/weights.py", line 10, in <module>
mistral_7b-1 |     from text_generation_server.layers.exl2 import Exl2Weight
mistral_7b-1 |
mistral_7b-1 |   File "/usr/local/lib/python3.10/dist-packages/text_generation_server/layers/__init__.py", line 1, in <module>
mistral_7b-1 |     from text_generation_server.layers.tensor_parallel import (
mistral_7b-1 |
mistral_7b-1 |   File "/usr/local/lib/python3.10/dist-packages/text_generation_server/layers/tensor_parallel.py", line 4, in <module>
mistral_7b-1 |     from text_generation_server.layers.linear import get_linear, FastLinear
mistral_7b-1 |
mistral_7b-1 |   File "/usr/local/lib/python3.10/dist-packages/text_generation_server/layers/linear.py", line 6, in <module>
mistral_7b-1 |     from text_generation_server.layers.gptq import GPTQWeight
mistral_7b-1 |
mistral_7b-1 |   File "/usr/local/lib/python3.10/dist-packages/text_generation_server/layers/gptq/__init__.py", line 61, in <module>
mistral_7b-1 |     from text_generation_server.layers.gptq.quant_linear import QuantLinear
mistral_7b-1 |
mistral_7b-1 |   File "/usr/local/lib/python3.10/dist-packages/text_generation_server/layers/gptq/quant_linear.py", line 7, in <module>
mistral_7b-1 |     import triton
mistral_7b-1 |
mistral_7b-1 | ModuleNotFoundError: No module named 'triton'
mistral_7b-1 exited with code 1
```
The image should start correctly. Most likely, `triton` should not be imported at all when running on Intel XPUs, since it is only needed for the CUDA/ROCm GPTQ kernels.
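The traceback shows the crash happens at import time, before any GPTQ layer is used. A minimal sketch of the kind of guard that would avoid this (hypothetical, not the actual TGI patch; the `HAS_TRITON` flag and the placeholder `QuantLinear` class are illustrative only):

```python
# Sketch: make the triton dependency optional, failing only when a GPTQ
# layer is actually constructed rather than when the module is imported.
try:
    import triton  # present in the CUDA/ROCm images, absent in the Intel one
    HAS_TRITON = True
except ImportError:
    triton = None
    HAS_TRITON = False


class QuantLinear:
    """Placeholder standing in for the GPTQ QuantLinear layer."""

    def __init__(self, *args, **kwargs):
        if not HAS_TRITON:
            # Deferred failure: non-GPTQ models on Intel XPU never hit this.
            raise ImportError(
                "GPTQ QuantLinear requires triton, which is unavailable on this platform"
            )
```

With a guard like this, downloading and serving an unquantized model such as Meta-Llama-3-8B-Instruct would not depend on `triton` being installed.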
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.