vecorro closed this issue 4 months ago.
It works with the container tag v0.3.3. It does not work with either v0.4.1 or v0.4.2.
We see the same symptom. We ran functional tests on a single A100 using vLLM 0.3.3, but after upgrading to vLLM 0.4 (and 0.4.2 as well), it won't run at all and fails with the same error message.
@youkaichao Do you have any insights on this?
FWIW, 0.4.1 works for me with the custom all-reduce operation disabled (see the sketch below), but 0.4.2 does expose some issues as well.
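For anyone who wants to try the same workaround, here is a minimal sketch of disabling the custom all-reduce kernel via the engine flag; the model name and tensor-parallel size are placeholders, not my exact setup:

```bash
# Sketch only: launch the OpenAI-compatible server with vLLM's custom
# all-reduce kernel disabled so tensor-parallel communication falls back
# to NCCL. Model name and parallel size are placeholders.
python -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Meta-Llama-3-8B-Instruct \
    --tensor-parallel-size 2 \
    --disable-custom-all-reduce
```

The same `--disable-custom-all-reduce` flag can also be appended to the Docker invocation, since the container entrypoint forwards its arguments to the server.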
The error trace points to PyTorch distributed, which is outside what I know; I think the failure is inside PyTorch itself.
The problem looks strange, because you only have 1 GPU while PyTorch tries to read the p2p status between 0 and 0 (essentially the GPU itself).
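If it helps to isolate the failure, the same peer-to-peer query can be exercised outside vLLM. This is just an illustrative check of the degenerate 0-to-0 case on a single-GPU box, not the exact call site in the stack trace:

```bash
# Illustrative: ask PyTorch whether device 0 can peer-access device 0.
# On a healthy setup this simply returns False; with a buggy driver the
# underlying CUDA query itself may fail.
python -c "import torch; print(torch.cuda.can_device_access_peer(0, 0))"
```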
One educated guess: maybe you can try upgrading the driver version. I remember several issues were solved by upgrading to driver 540; 535 seems to be buggy.
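To confirm which driver is actually installed before and after the upgrade, a query like this is enough:

```bash
# Print only the installed NVIDIA driver version for each GPU.
nvidia-smi --query-gpu=driver_version --format=csv,noheader
```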
Thank you, @youkaichao. You're right. After upgrading the NVIDIA driver to v550.x, vLLM 0.4.2 worked properly.
Your current environment
🐛 Describe the bug
I'm trying to run Llama3 using Docker this way:
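(The command below is a representative invocation of the official vllm/vllm-openai image rather than a verbatim copy; the tag, token, and model name are placeholders.)

```bash
# Representative example only, not the exact original command.
# Serves a Llama 3 model with the OpenAI-compatible server from the
# vllm/vllm-openai image; tag, token, and model name are placeholders.
docker run --runtime nvidia --gpus all \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HUGGING_FACE_HUB_TOKEN=<your_token>" \
    -p 8000:8000 --ipc=host \
    vllm/vllm-openai:v0.4.2 \
    --model meta-llama/Meta-Llama-3-8B-Instruct
```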
My GPU is properly configured:
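(The nvidia-smi output is not shown here; the standard sanity check that the GPU and driver are visible from inside a container is:)

```bash
# Verify that the NVIDIA container runtime exposes the GPU and driver
# inside a container; the ubuntu image is just a placeholder base image.
docker run --rm --runtime nvidia --gpus all ubuntu nvidia-smi
```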
but I get the following error: