vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Bug]: custom docker Error #6717

Open ciaoyizhen opened 3 months ago

ciaoyizhen commented 3 months ago

Your current environment

The output of `python collect_env.py`

Differences between Docker and the host

In Docker:

CUDA runtime version: Could not collect
cuDNN version: 9.0.0

On the host:

CUDA runtime version: 12.3.52
cuDNN version: 8.5.0

4 × V100 16GB

🐛 Describe the bug

On the host it works fine, but it errors inside Docker.

Base Docker image: nvidia/cuda:12.3.2-cudnn9-runtime-centos7
Code: https://github.com/THUDM/GLM-4/blob/main/basic_demo/openai_api_server.py
Modification: tensor_parallel_size=1 changed to tensor_parallel_size=2, with CUDA_VISIBLE_DEVICES=0,1

Requirements: torch==2.3.0, fastapi==0.111.0, transformers==4.41.2, vllm==0.4.3, sse-starlette==2.1.0
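
A minimal sketch of the same tensor-parallel setting passed to vLLM directly, with the FastAPI server stripped out (the model id below is an assumption; substitute the actual GLM-4 checkpoint path):

```python
# Hypothetical minimal reproduction: same tensor_parallel_size=2 setting,
# but without the openai_api_server.py wrapper.
from vllm import LLM, SamplingParams

llm = LLM(
    model="THUDM/glm-4-9b-chat",  # assumed model id; replace with the local path
    tensor_parallel_size=2,       # the change from 1 to 2 described above
    trust_remote_code=True,       # GLM checkpoints ship custom modeling code
)
print(llm.generate(["Hello"], SamplingParams(max_tokens=16)))
```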

The error happens in init_device; I found that NCCL_CHECK returns 2 instead of 0.

How do I fix this bug? In docker_run.sh the container is started like this: `docker run --runtime nvidia --gpus all --rm --shm-size=30gb --privileged -e CUDA_VISIBLE_DEVICES=0,1 xxxxx`
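
To check whether NCCL itself works inside the container, independent of vLLM, a standalone two-GPU all_reduce sketch like the following can help (assumes two visible GPUs and that port 29500 is free):

```python
# Hypothetical NCCL sanity check: one all_reduce across 2 GPUs inside the container.
import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp


def worker(rank: int, world_size: int) -> None:
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"
    torch.cuda.set_device(rank)
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    t = torch.ones(1, device=f"cuda:{rank}")
    dist.all_reduce(t)  # sum of ones: every rank should print world_size
    print(f"rank {rank}: {t.item()}")
    dist.destroy_process_group()


if __name__ == "__main__":
    mp.spawn(worker, args=(2,), nprocs=2)
```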

ciaoyizhen commented 3 months ago

BTW, running the same model with transformers inside Docker works fine; only vLLM reports the error.

ciaoyizhen commented 3 months ago

Sorry, I said that incorrectly earlier. NCCL_CHECK returning 2 is only reported when --shm-size=30gb is not used. When that parameter is used, it gets stuck instead: the terminal shows it stops at "Using XFormers backend".

Under normal circumstances this should be followed by load_model.

ciaoyizhen commented 3 months ago

I ran it again and waited a long time. The error happens in self.run_worker_outputs("load_model") and finally surfaces at ray_worker_outputs = ray.get(ray_worker_outputs):

ray.exceptions.ActorDiedError: The actor died unexpectedly before finishing this task
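
Since the failure surfaces in ray.get(...), one way to check whether Ray GPU actors work at all inside the container, independent of vLLM, is a sketch like this (class and method names are just illustrative):

```python
# Hypothetical Ray sanity check: one GPU actor per device, queried via ray.get().
import ray
import torch


@ray.remote(num_gpus=1)
class GpuProbe:
    def device_name(self) -> str:
        return torch.cuda.get_device_name(0)


if __name__ == "__main__":
    ray.init()
    probes = [GpuProbe.remote() for _ in range(2)]
    print(ray.get([p.device_name.remote() for p in probes]))
    ray.shutdown()
```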

youkaichao commented 3 months ago

Seems like a Ray error. cc @rkooo567

BTW, you are using an old version of vLLM. Update to a recent version, which does not use Ray; hopefully that avoids the error.
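
For example, on a recent release the multiprocessing executor can be requested explicitly instead of Ray; a sketch, assuming the distributed_executor_backend engine argument and a placeholder model id:

```python
# Sketch assuming a recent vLLM release where the executor backend is selectable;
# "mp" runs tensor-parallel workers as local processes instead of Ray actors.
from vllm import LLM

llm = LLM(
    model="THUDM/glm-4-9b-chat",        # placeholder model id
    tensor_parallel_size=2,
    trust_remote_code=True,
    distributed_executor_backend="mp",  # bypass Ray entirely
)
```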

ciaoyizhen commented 3 months ago

Thank you for the reply! @youkaichao I'd like to know: does tensor_parallel_size=1 also use Ray? Because when I tried a small GLM-3 model with tensor_parallel_size=1, it also fails in Docker (works locally).

I downloaded vLLM 0.5.2 earlier, but when I run it locally it gives me an error: Error in calling custom op rms_norm: '_OpNamespace' '_C' object has no attribute 'rms_norm'.

ciaoyizhen commented 2 months ago

OK, it's because my Python is 3.10 but vLLM 0.5.2 does not support 3.10. I tried vLLM 0.5.0 and got an error again.

ERROR multiproc_worker_utils.py:120 Worker vLLMworkerProcess pid 2182 died, exit code: -9
INFO multiproc_worker_utils.py:123 Killing local vLLM worker processes
Killed

This is the only error; it waits a long time and then reports this.
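
For reference, exit code -9 means the worker received SIGKILL, which inside a container usually points at the out-of-memory killer. A stdlib-only sketch (standard Linux paths, nothing vLLM-specific) to see how much memory and /dev/shm the container actually has:

```python
# Hypothetical diagnostic: print /dev/shm size and memory visible to the container.
import os


def gib(n_bytes: float) -> float:
    return n_bytes / (1024 ** 3)


shm = os.statvfs("/dev/shm")
print(f"/dev/shm size: {gib(shm.f_frsize * shm.f_blocks):.1f} GiB")

with open("/proc/meminfo") as f:
    meminfo = {line.split(":")[0]: line.split()[1] for line in f}  # values in kB
print(f"MemTotal:     {int(meminfo['MemTotal']) / (1024 ** 2):.1f} GiB")
print(f"MemAvailable: {int(meminfo.get('MemAvailable', 0)) / (1024 ** 2):.1f} GiB")
```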