
v0.2.3 docker can't recognize 4090 gpu #2013

Closed DeoLeung closed 9 months ago

DeoLeung commented 9 months ago

Hi, I'm trying the official image with config

  vllm:
    <<: *default-gpu
    image: vllm/vllm-openai:v0.2.3
    command: --model Qwen/Qwen-7B-Chat --trust-remote-code
    environment:
      #NCCL_P2P_DISABLE: 1
      HF_HUB_OFFLINE: 0
      CUDA_VISIBLE_DEVICES: 6
    shm_size: 10.24gb

and it fails with

docker-vllm-1  | INFO 12-11 09:14:16 api_server.py:638] args: Namespace(host=None, port=8000, allow_credentials=False, allowed_origins=['*'], allowed_methods=['*'], allowed_headers=['*'], served_model_name=None, model='Qwen/Qwen-7B-Chat', tokenizer=None, revision=None, tokenizer_revision=None, tokenizer_mode='auto', trust_remote_code=True, download_dir=None, load_format='auto', dtype='auto', max_model_len=None, worker_use_ray=False, pipeline_parallel_size=1, tensor_parallel_size=1, block_size=16, seed=0, swap_space=4, gpu_memory_utilization=0.9, max_num_batched_tokens=None, max_num_seqs=256, max_paddings=256, disable_log_stats=False, quantization=None, engine_use_ray=False, disable_log_requests=False, max_log_len=None)
docker-vllm-1  | INFO 12-11 09:14:18 llm_engine.py:72] Initializing an LLM engine with config: model='Qwen/Qwen-7B-Chat', tokenizer='Qwen/Qwen-7B-Chat', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.float16, max_seq_len=8192, download_dir=None, load_format=auto, tensor_parallel_size=1, quantization=None, seed=0)
docker-vllm-1  | WARNING 12-11 09:14:19 tokenizer.py:66] Using a slow tokenizer. This might cause a significant slowdown. Consider using a fast tokenizer instead.
docker-vllm-1  | /usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py:138: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 804: forward compatibility was attempted on non supported HW (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
docker-vllm-1  |   return torch._C._cuda_getDeviceCount() > 0
docker-vllm-1  | Traceback (most recent call last):
docker-vllm-1  |   File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
docker-vllm-1  |     return _run_code(code, main_globals, None,
docker-vllm-1  |   File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
docker-vllm-1  |     exec(code, run_globals)
docker-vllm-1  |   File "/workspace/vllm/entrypoints/openai/api_server.py", line 646, in <module>
docker-vllm-1  |     engine = AsyncLLMEngine.from_engine_args(engine_args)
docker-vllm-1  |   File "/workspace/vllm/engine/async_llm_engine.py", line 486, in from_engine_args
docker-vllm-1  |     engine = cls(parallel_config.worker_use_ray,
docker-vllm-1  |   File "/workspace/vllm/engine/async_llm_engine.py", line 269, in __init__
docker-vllm-1  |     self.engine = self._init_engine(*args, **kwargs)
docker-vllm-1  |   File "/workspace/vllm/engine/async_llm_engine.py", line 305, in _init_engine
docker-vllm-1  |     return engine_class(*args, **kwargs)
docker-vllm-1  |   File "/workspace/vllm/engine/llm_engine.py", line 110, in __init__
docker-vllm-1  |     self._init_workers(distributed_init_method)
docker-vllm-1  |   File "/workspace/vllm/engine/llm_engine.py", line 142, in _init_workers
docker-vllm-1  |     self._run_workers(
docker-vllm-1  |   File "/workspace/vllm/engine/llm_engine.py", line 700, in _run_workers
docker-vllm-1  |     output = executor(*args, **kwargs)
docker-vllm-1  |   File "/workspace/vllm/worker/worker.py", line 60, in init_model
docker-vllm-1  |     torch.cuda.set_device(self.device)
docker-vllm-1  |   File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 404, in set_device
docker-vllm-1  |     torch._C._cuda_setDevice(device)
docker-vllm-1  |   File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 298, in _lazy_init
docker-vllm-1  |     torch._C._cuda_init()
docker-vllm-1  | RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 804: forward compatibility was attempted on non supported HW
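
To rule out the compose setup, the same init failure can be checked with a plain docker run (assuming the NVIDIA container toolkit is installed; --gpus all here stands in for the compose GPU config, and the image's entrypoint has to be overridden since it normally launches the API server):

# Prints the CUDA runtime version the image ships and whether torch can
# initialize it; under Error 804 the second value is False, with the same
# UserWarning as in the log above.
docker run --rm --gpus all --entrypoint python3 vllm/vllm-openai:v0.2.3 \
    -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"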

My nvidia-smi output is as follows:

# NVIDIA GeForce RTX 4090
nvidia-smi                                                                           
Mon Dec 11 17:23:21 2023                                                                                                  
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:4F:00.0 Off |                  Off |
| 32%   19C    P8    21W / 450W |      5MiB / 24564MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:52:00.0 Off |                  Off |
| 32%   21C    P8    20W / 450W |      5MiB / 24564MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA GeForce ...  Off  | 00000000:56:00.0 Off |                  Off |
| 31%   20C    P8    22W / 450W |      5MiB / 24564MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA GeForce ...  Off  | 00000000:57:00.0 Off |                  Off |
| 32%   21C    P8    19W / 450W |  14056MiB / 24564MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   4  NVIDIA GeForce ...  Off  | 00000000:CE:00.0 Off |                  Off |
| 30%   23C    P2    52W / 450W |  18484MiB / 24564MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   5  NVIDIA GeForce ...  Off  | 00000000:D1:00.0 Off |                  Off |
| 32%   24C    P2    53W / 450W |  18324MiB / 24564MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   6  NVIDIA GeForce ...  Off  | 00000000:D5:00.0 Off |                  Off |
| 31%   22C    P8    15W / 450W |      5MiB / 24564MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   7  NVIDIA GeForce ...  Off  | 00000000:D6:00.0 Off |                  Off |
| 30%   22C    P8    15W / 450W |      5MiB / 24564MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

Do I need to upgrade something for vLLM to work?

DeoLeung commented 9 months ago

Rebuilding the image manually from a lower CUDA base image works, e.g. nvidia/cuda:12.0.0-devel-ubuntu22.04 in the Dockerfile; that matches the CUDA 12.0 cap that nvidia-smi reports for the 525 driver.
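
The change is just the CUDA base image tag in vLLM's Dockerfile before building, e.g. (the 12.1.0 source tag here is an assumption, check what your checkout actually pins):

# Hypothetical one-liner; adjust the source tag to whatever your Dockerfile uses.
sed -i 's|nvidia/cuda:12.1.0|nvidia/cuda:12.0.0|g' Dockerfile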

gameveloster commented 9 months ago

@DeoLeung Can you share the docker build command? I tried using a lower CUDA base image (nvidia/cuda:12.0.0-devel-ubuntu22.04) but it's not detecting my GPU during the docker build.

My host is using Driver Version: 525.147.05, CUDA Version: 12.0

DeoLeung commented 9 months ago

> @DeoLeung Can you share the docker build command? I tried using a lower CUDA base image (nvidia/cuda:12.0.0-devel-ubuntu22.04) but it's not detecting my GPU during the docker build.
>
> My host is using Driver Version: 525.147.05, CUDA Version: 12.0

DOCKER_BUILDKIT=1 docker build . --target vllm-openai --tag vllm/vllm-openai:v0.2.4-mdt --build-arg max_jobs=32 --build-arg nvcc_threads=16 --build-arg torch_cuda_arch_list=8.9

Change the arch according to your GPU's compute capability (8.9 for the RTX 4090).
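
If you're not sure of the value, the compute capability can be queried directly; either of these should print it (e.g. 8.9 on an RTX 4090):

# Via the driver (needs a reasonably recent driver version):
nvidia-smi --query-gpu=compute_cap --format=csv,noheader
# Or via torch, if it's installed on the host:
python3 -c "import torch; print(torch.cuda.get_device_capability())"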