huggingface / text-generation-inference

Large Language Model Text Generation Inference
http://hf.co/docs/text-generation-inference
Apache License 2.0

Intel XPU Docker image import error on start #1983

Status: Open (opened by grafail 1 month ago)

grafail commented 1 month ago

System Info

Using this version of the image: https://github.com/huggingface/text-generation-inference/pkgs/container/text-generation-inference/222969763?tag=latest-intel

GPUs used:

+-----------+--------------------------------------------------------------------------------------+
| Device ID | Device Information                                                                   |
+-----------+--------------------------------------------------------------------------------------+
| 0         | Device Name: Intel(R) Data Center GPU Max 1100                                       |
|           | Vendor Name: Intel(R) Corporation                                                    |
|           | SOC UUID: 00000000-0000-000f-0000-002f0bda8086                                       |
|           | PCI BDF Address: 0000:0f:00.0                                                        |
|           | DRM Device: /dev/dri/card1                                                           |
|           | Function Type: physical                                                              |
+-----------+--------------------------------------------------------------------------------------+
| 1         | Device Name: Intel(R) Data Center GPU Max 1100                                       |
|           | Vendor Name: Intel(R) Corporation                                                    |
|           | SOC UUID: 00000000-0000-0016-0000-002f0bda8086                                       |
|           | PCI BDF Address: 0000:16:00.0                                                        |
|           | DRM Device: /dev/dri/card2                                                           |
|           | Function Type: physical                                                              |
+-----------+--------------------------------------------------------------------------------------+
| 2         | Device Name: Intel(R) Data Center GPU Max 1100                                       |
|           | Vendor Name: Intel(R) Corporation                                                    |
|           | SOC UUID: 00000000-0000-001a-0000-002f0bda8086                                       |
|           | PCI BDF Address: 0000:1a:00.0                                                        |
|           | DRM Device: /dev/dri/card3                                                           |
|           | Function Type: physical                                                              |
+-----------+--------------------------------------------------------------------------------------+
| 3         | Device Name: Intel(R) Data Center GPU Max 1100                                       |
|           | Vendor Name: Intel(R) Corporation                                                    |
|           | SOC UUID: 00000000-0000-001e-0000-002f0bda8086                                       |
|           | PCI BDF Address: 0000:1e:00.0                                                        |
|           | DRM Device: /dev/dri/card4                                                           |
|           | Function Type: physical                                                              |
+-----------+--------------------------------------------------------------------------------------+
| 4         | Device Name: Intel(R) Data Center GPU Max 1100                                       |
|           | Vendor Name: Intel(R) Corporation                                                    |
|           | SOC UUID: 00000000-0000-008a-0000-002f0bda8086                                       |
|           | PCI BDF Address: 0000:8a:00.0                                                        |
|           | DRM Device: /dev/dri/card5                                                           |
|           | Function Type: physical                                                              |
+-----------+--------------------------------------------------------------------------------------+
| 5         | Device Name: Intel(R) Data Center GPU Max 1100                                       |
|           | Vendor Name: Intel(R) Corporation                                                    |
|           | SOC UUID: 00000000-0000-008e-0000-002f0bda8086                                       |
|           | PCI BDF Address: 0000:8e:00.0                                                        |
|           | DRM Device: /dev/dri/card6                                                           |
|           | Function Type: physical                                                              |
+-----------+--------------------------------------------------------------------------------------+
| 6         | Device Name: Intel(R) Data Center GPU Max 1100                                       |
|           | Vendor Name: Intel(R) Corporation                                                    |
|           | SOC UUID: 00000000-0000-00c0-0000-002f0bda8086                                       |
|           | PCI BDF Address: 0000:c0:00.0                                                        |
|           | DRM Device: /dev/dri/card7                                                           |
|           | Function Type: physical                                                              |
+-----------+--------------------------------------------------------------------------------------+
| 7         | Device Name: Intel(R) Data Center GPU Max 1100                                       |
|           | Vendor Name: Intel(R) Corporation                                                    |
|           | SOC UUID: 00000000-0000-00c4-0000-002f0bda8086                                       |
|           | PCI BDF Address: 0000:c4:00.0                                                        |
|           | DRM Device: /dev/dri/card8                                                           |
|           | Function Type: physical                                                              |
+-----------+--------------------------------------------------------------------------------------+

The deployment runs inside the Intel Developer Cloud.

Reproduction

Docker compose file used:

version: "3.9"
services:
  llama3_8b:
    image: ghcr.io/huggingface/text-generation-inference:latest-intel
    command: --model-id meta-llama/Meta-Llama-3-8B-Instruct --sharded false --max-total-tokens 8000
    environment:
      - HF_TOKEN=<token>
    shm_size: 1g
    ports:
      - "8083:80"
    volumes:
      - ./hf-data:/data
    devices:
      - /dev/dri:/dev/dri
    privileged: true
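To check whether the server actually comes up with this compose file, one option is to poll TGI's `/health` route on the published host port (8083, from the `8083:80` mapping). The following is a minimal sketch using only the Python standard library. `health_url` and `wait_until_healthy` are hypothetical helper names, and the 600 s default timeout is an arbitrary assumption meant to leave room for the model download.

```python
import time
import urllib.request


def health_url(host: str = "localhost", port: int = 8083) -> str:
    """Build the health-check URL for the host port published in the compose file."""
    return f"http://{host}:{port}/health"


def wait_until_healthy(url: str, timeout_s: float = 600.0, interval_s: float = 5.0) -> bool:
    """Poll `url` until it answers HTTP 200 or `timeout_s` elapses.

    Returns True once the server reports healthy and False on timeout.
    Connection errors are expected while the container is still starting
    (or after it has exited) and are simply retried.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval_s) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError, HTTPError, refused connections and socket timeouts
            # are all OSError subclasses; treat them as "not ready yet".
            pass
        time.sleep(interval_s)
    return False
```

For example, `wait_until_healthy(health_url())` would return False for the run below, since the container exits with code 1 before the server ever starts listening.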

Output:

mistral_7b-1  | 2024-05-30T15:29:08.302655Z  INFO text_generation_launcher: Args {
mistral_7b-1  |     model_id: "meta-llama/Meta-Llama-3-8B-Instruct",
mistral_7b-1  |     revision: None,
mistral_7b-1  |     validation_workers: 2,
mistral_7b-1  |     sharded: Some(
mistral_7b-1  |         false,
mistral_7b-1  |     ),
mistral_7b-1  |     num_shard: None,
mistral_7b-1  |     quantize: None,
mistral_7b-1  |     speculate: None,
mistral_7b-1  |     dtype: None,
mistral_7b-1  |     trust_remote_code: false,
mistral_7b-1  |     max_concurrent_requests: 128,
mistral_7b-1  |     max_best_of: 2,
mistral_7b-1  |     max_stop_sequences: 4,
mistral_7b-1  |     max_top_n_tokens: 5,
mistral_7b-1  |     max_input_tokens: None,
mistral_7b-1  |     max_input_length: None,
mistral_7b-1  |     max_total_tokens: Some(
mistral_7b-1  |         8000,
mistral_7b-1  |     ),
mistral_7b-1  |     waiting_served_ratio: 0.3,
mistral_7b-1  |     max_batch_prefill_tokens: None,
mistral_7b-1  |     max_batch_total_tokens: None,
mistral_7b-1  |     max_waiting_tokens: 20,
mistral_7b-1  |     max_batch_size: None,
mistral_7b-1  |     cuda_graphs: None,
mistral_7b-1  |     hostname: "17c9ca27bb21",
mistral_7b-1  |     port: 80,
mistral_7b-1  |     shard_uds_path: "/tmp/text-generation-server",
mistral_7b-1  |     master_addr: "localhost",
mistral_7b-1  |     master_port: 29500,
mistral_7b-1  |     huggingface_hub_cache: Some(
mistral_7b-1  |         "/data",
mistral_7b-1  |     ),
mistral_7b-1  |     weights_cache_override: None,
mistral_7b-1  |     disable_custom_kernels: false,
mistral_7b-1  |     cuda_memory_fraction: 1.0,
mistral_7b-1  |     rope_scaling: None,
mistral_7b-1  |     rope_factor: None,
mistral_7b-1  |     json_output: false,
mistral_7b-1  |     otlp_endpoint: None,
mistral_7b-1  |     cors_allow_origin: [],
mistral_7b-1  |     watermark_gamma: None,
mistral_7b-1  |     watermark_delta: None,
mistral_7b-1  |     ngrok: false,
mistral_7b-1  |     ngrok_authtoken: None,
mistral_7b-1  |     ngrok_edge: None,
mistral_7b-1  |     tokenizer_config_path: None,
mistral_7b-1  |     disable_grammar_support: false,
mistral_7b-1  |     env: false,
mistral_7b-1  |     max_client_batch_size: 4,
mistral_7b-1  | }
mistral_7b-1  | 2024-05-30T15:29:08.302905Z  INFO hf_hub: Token file not found "/root/.cache/huggingface/token"    
mistral_7b-1  | 2024-05-30T15:29:08.395883Z  INFO text_generation_launcher: Default `max_input_tokens` to 4095
mistral_7b-1  | 2024-05-30T15:29:08.395923Z  INFO text_generation_launcher: Default `max_batch_prefill_tokens` to 4145
mistral_7b-1  | 2024-05-30T15:29:08.395931Z  INFO text_generation_launcher: Using default cuda graphs [1, 2, 4, 8, 16, 32]
mistral_7b-1  | 2024-05-30T15:29:08.396191Z  INFO download: text_generation_launcher: Starting download process.
mistral_7b-1  | 2024-05-30T15:29:11.400965Z ERROR download: text_generation_launcher: Download encountered an error: 
mistral_7b-1  | /usr/local/lib/python3.10/dist-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: 'libpng16.so.16: cannot open shared object file: No such file or directory'If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
mistral_7b-1  |   warn(
mistral_7b-1  | Traceback (most recent call last):
mistral_7b-1  | 
mistral_7b-1  |   File "/usr/local/bin/text-generation-server", line 8, in <module>
mistral_7b-1  |     sys.exit(app())
mistral_7b-1  | 
mistral_7b-1  | Error: DownloadError
mistral_7b-1  |   File "/usr/local/lib/python3.10/dist-packages/text_generation_server/cli.py", line 126, in download_weights
mistral_7b-1  |     from text_generation_server import utils
mistral_7b-1  | 
mistral_7b-1  |   File "/usr/local/lib/python3.10/dist-packages/text_generation_server/utils/__init__.py", line 3, in <module>
mistral_7b-1  |     from text_generation_server.utils.weights import Weights
mistral_7b-1  | 
mistral_7b-1  |   File "/usr/local/lib/python3.10/dist-packages/text_generation_server/utils/weights.py", line 10, in <module>
mistral_7b-1  |     from text_generation_server.layers.exl2 import Exl2Weight
mistral_7b-1  | 
mistral_7b-1  |   File "/usr/local/lib/python3.10/dist-packages/text_generation_server/layers/__init__.py", line 1, in <module>
mistral_7b-1  |     from text_generation_server.layers.tensor_parallel import (
mistral_7b-1  | 
mistral_7b-1  |   File "/usr/local/lib/python3.10/dist-packages/text_generation_server/layers/tensor_parallel.py", line 4, in <module>
mistral_7b-1  |     from text_generation_server.layers.linear import get_linear, FastLinear
mistral_7b-1  | 
mistral_7b-1  |   File "/usr/local/lib/python3.10/dist-packages/text_generation_server/layers/linear.py", line 6, in <module>
mistral_7b-1  |     from text_generation_server.layers.gptq import GPTQWeight
mistral_7b-1  | 
mistral_7b-1  |   File "/usr/local/lib/python3.10/dist-packages/text_generation_server/layers/gptq/__init__.py", line 61, in <module>
mistral_7b-1  |     from text_generation_server.layers.gptq.quant_linear import QuantLinear
mistral_7b-1  | 
mistral_7b-1  |   File "/usr/local/lib/python3.10/dist-packages/text_generation_server/layers/gptq/quant_linear.py", line 7, in <module>
mistral_7b-1  |     import triton
mistral_7b-1  | 
mistral_7b-1  | ModuleNotFoundError: No module named 'triton'
mistral_7b-1  | 
mistral_7b-1 exited with code 1
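The traceback ends in `ModuleNotFoundError: No module named 'triton'`, raised while `download-weights` imports `text_generation_server.utils`. One way to confirm which relevant packages the image actually ships is a small check based on `importlib.util.find_spec`, which locates a module without executing it. This is a sketch under assumptions: the module list is illustrative, and `intel_extension_for_pytorch` is only a guess at what an XPU build might include.

```python
import importlib.util

# Illustrative list; "intel_extension_for_pytorch" is an assumption about
# what an XPU build of the image might contain. Use top-level names only:
# find_spec on a dotted name imports its parent packages.
MODULES = ("torch", "intel_extension_for_pytorch", "triton")


def module_report(names=MODULES) -> dict:
    """Map each top-level module name to whether it can be found on sys.path."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


for name, present in module_report().items():
    print(f"{name:30s} {'found' if present else 'MISSING'}")
```

Running this inside the image (for instance by overriding the container entrypoint with `python3` and mounting the script, assuming `python3` is on the image's PATH) should, given the traceback above, list `triton` as MISSING.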

Expected behavior

The image should start correctly. Most likely, Triton should not be imported at all when running on Intel XPUs.
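According to the traceback, `text_generation_server/layers/gptq/quant_linear.py` runs `import triton` unconditionally (line 7). That import is reached through `utils` → `weights` → `layers` → `linear` → `gptq`, even though this run uses no quantization (`quantize: None` in the launcher args). The suggestion above can be sketched as a guarded import that defers the failure until a Triton-backed path is actually requested. This is a hedged illustration, not the project's actual fix: `optional_import`, `HAS_TRITON` and `select_linear_backend` are hypothetical names, and the returned backend strings are placeholders.

```python
import importlib
import importlib.util
from types import ModuleType
from typing import Optional


def optional_import(name: str) -> Optional[ModuleType]:
    """Import a top-level module if it is installed, otherwise return None."""
    if importlib.util.find_spec(name) is None:
        return None
    return importlib.import_module(name)


# Resolved once at import time; a missing Triton no longer aborts the import chain.
triton = optional_import("triton")
HAS_TRITON = triton is not None


def select_linear_backend(quantize: Optional[str]) -> str:
    """Hypothetical dispatcher: in this sketch only the GPTQ path needs Triton.

    With quantize=None (as in the launcher args above) Triton is never
    required, so its absence only becomes an error when GPTQ is requested.
    """
    if quantize == "gptq":
        if not HAS_TRITON:
            raise RuntimeError(
                "quantize='gptq' needs the Triton kernels, but 'triton' is not installed"
            )
        return "triton-gptq"
    return "dense"
```

The design choice is to fail lazily and with a specific message: an XPU image without Triton can still download and serve unquantized weights, and only a GPTQ request surfaces the missing dependency.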

github-actions[bot] commented 5 days ago

This issue is stale because it has been open 30 days with no activity. Remove the stale label or comment, or this will be closed in 5 days.