ciaoyizhen opened this issue 3 months ago
BTW, this works fine when using transformers in Docker; the error only appears with vLLM.
Sorry, I stated that wrong. The NCCL_CHECK return 2 error is reported when not using --shm-size=30gb. When that parameter is used, it gets stuck instead: the terminal shows it stopped at "Using XFormers backend".
Under normal circumstances this should be followed by load_model.
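For context: NCCL's intra-node transport uses POSIX shared memory under /dev/shm, and Docker's default /dev/shm is only 64 MB, which is why --shm-size changes the behavior. A minimal diagnostic sketch, assuming a Linux container; the 1 GiB threshold is illustrative, not a value from this thread:

```python
# Sketch: check whether the container's /dev/shm is big enough for NCCL.
# Docker's default /dev/shm is 64 MB; the 1 GiB threshold below is only
# illustrative, not a value from this thread.
import shutil

total, used, free = shutil.disk_usage("/dev/shm")
print(f"/dev/shm: {total / 2**30:.2f} GiB total, {free / 2**30:.2f} GiB free")
if free < 2**30:
    print("Warning: /dev/shm is small; NCCL's shared-memory transport may "
          "fail. Start the container with a larger --shm-size or --ipc=host.")
```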
I ran it again and waited a long time. The error happened in self.run_worker_outputs("load_model"), and finally in ray_worker_outputs = ray.get(ray_worker_outputs):
ray.exceptions.ActorDiedError: The actor died unexpectedly before finishing this task
Seems like a Ray error. cc @rkooo567
BTW, you are using an old version of vLLM. Update to a recent version; it does not use Ray by default, so hopefully that avoids the error.
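For reference, a minimal sketch of pinning the executor explicitly on a recent vLLM; the distributed_executor_backend argument and its "mp" value exist only in newer releases, and the model name here is just an example:

```python
# Sketch: explicitly select the multiprocessing executor instead of Ray.
# Assumes a vLLM version new enough to expose distributed_executor_backend;
# the model name is just an example.
from vllm import LLM

llm = LLM(
    model="THUDM/glm-4-9b-chat",
    tensor_parallel_size=2,
    distributed_executor_backend="mp",  # "mp" = multiprocessing workers, no Ray
)
```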
Thank you for the reply! @youkaichao
I want to know: does tensor_parallel_size=1 use Ray? Because I tried a small GLM-3 model with tensor_parallel_size=1 and it also fails in Docker (locally it is OK).
I downloaded vLLM 0.5.2 earlier, but when I run it locally, it gives me this error:
Error in calling custom op rms_norm: '_OpNamespace' '_C' object has no attribute 'rms_norm'
OK. Because my Python is 3.10 and vLLM 0.5.2 does not support 3.10, I tried vLLM 0.5.0, and it errors again:
ERROR multiproc_worker_utils.py:120 Worker vLLMworkerProcess pid 2182 died, exit code: -9
INFO multiproc_worker_utils.py:123 Killing local vLLM worker processes Killed
Only this error; it waits a long time and then reports this.
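Side note: exit code -9 means the worker received SIGKILL, which inside a container is usually the kernel OOM killer. A quick sketch to see how much RAM the container actually has (illustrative, Linux-only, not from the thread):

```python
# Sketch: report total/available RAM inside the container. A worker exiting
# with code -9 (SIGKILL) usually means the kernel OOM killer ended it.
import os

page = os.sysconf("SC_PAGE_SIZE")
total_gib = page * os.sysconf("SC_PHYS_PAGES") / 2**30
avail_gib = page * os.sysconf("SC_AVPHYS_PAGES") / 2**30
print(f"RAM: {total_gib:.1f} GiB total, {avail_gib:.1f} GiB available")
```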
Your current environment
Differences between docker and local
in docker:
in host:
V100 16G * 4
🐛 Describe the bug
On the host it works fine, but it errors in Docker.
base docker: nvidia/cuda:12.3.2-cudnn9-runtime-centos7
code: https://github.com/THUDM/GLM-4/blob/main/basic_demo/openai_api_server.py
modification: tensor_parallel_size=1 changed to tensor_parallel_size=2, run with CUDA_VISIBLE_DEVICES=0,1
requirements:
torch==2.3.0
fastapi==0.111.0
transformers==4.41.2
vllm==0.4.3
sse-starlette==2.1.0
The error occurs in init_device.
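The described modification amounts to roughly the following sketch, based on how the GLM-4 demo builds vLLM's async engine; the model path and trust_remote_code are assumptions for illustration, not taken from this thread:

```python
# Sketch of the modification described above: run the GLM-4 demo's engine
# with tensor parallelism over two GPUs. Based on the vLLM 0.4.x async
# engine API; the model path is an example.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"

from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.engine.async_llm_engine import AsyncLLMEngine

engine_args = AsyncEngineArgs(
    model="THUDM/glm-4-9b-chat",
    tensor_parallel_size=2,   # was 1 in the original demo
    trust_remote_code=True,
)
engine = AsyncLLMEngine.from_engine_args(engine_args)
```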
I found that NCCL_CHECK returns 2 instead of 0. How do I fix this bug? My docker_run.sh runs it like this:
docker run --runtime nvidia --gpus all --rm --shm-size=30gb --privileged -e CUDA_VISIBLE_DEVICES=0,1 xxxxx
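One way to see why NCCL initialization fails inside the container is to enable NCCL's own logging before the engine starts; NCCL_DEBUG and NCCL_DEBUG_SUBSYS are standard NCCL environment variables, and the model name below is just an example. For what it's worth, vLLM's Docker docs also suggest --ipc=host as an alternative to a large --shm-size.

```python
# Sketch: turn on NCCL's initialization logging to diagnose the failure.
# Must run before torch/vLLM initializes NCCL; the model name is an example.
import os
os.environ["NCCL_DEBUG"] = "INFO"
os.environ["NCCL_DEBUG_SUBSYS"] = "INIT"

from vllm import LLM
llm = LLM(model="THUDM/glm-4-9b-chat", tensor_parallel_size=2)
```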