Closed: wyooyw closed this issue 2 months ago
Hello @wyooyw. I notice that your environment information shows that you have 2 GPUs:
```
GPU models and configuration:
GPU 0: NVIDIA H800
GPU 1: NVIDIA H800
Nvidia driver version: 525.105.17
```
Using tensor parallelism with TP=4 requires 4 GPUs, so the error "No CUDA GPUs are available" seems accurate.
When it worked without Ray, do you also mean without TP=4?
Sorry, that was a typo: we have two GPUs and use TP=2. Here are more details. We have two test cases; one passes, and the other fails with "No CUDA GPUs are available" as described above.
passed:

```python
self.llm = vllm.LLM(*args, **kwargs)
```

failed:

```python
# MyMegatronLoader is my customized model loader
from rlhf.vllm_generation.vlm.model_loader import MyMegatronLoader

self.llm = vllm.LLM(*args, load_format=MyMegatronLoader, **kwargs)
```
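One quick way to narrow this down (a hedged diagnostic sketch, not from the thread) is to log GPU visibility inside the failing worker process, right before the engine is constructed:

```python
import os

import torch

# Hypothetical diagnostic: run at the top of the customized loader (or just
# before vllm.LLM(...)) to see which devices this process can actually see.
print("CUDA_VISIBLE_DEVICES =", os.environ.get("CUDA_VISIBLE_DEVICES"))
print("torch.cuda.is_available() =", torch.cuda.is_available())
print("torch.cuda.device_count() =", torch.cuda.device_count())
```

If `CUDA_VISIBLE_DEVICES` comes back as an empty string in that process, Ray has assigned it no GPUs, which would explain the error independently of the loader.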
Ah, that makes more sense.
The second test fails even when it is the only test run in the Python session, right? If it only fails when running after the first test, then it may just be a problem with garbage collection not fully releasing the first test's hold on the GPUs.
If the second test fails on its own while the first one passes, then the issue is likely coming from your customized model loader.
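If the failure only appears when the tests run back to back, an explicit teardown between tests can help rule out the garbage-collection theory. A minimal sketch, assuming the first test holds the engine as `self.llm` (the helper name is hypothetical):

```python
import gc

import torch

def teardown_llm(holder):
    # Hypothetical inter-test cleanup: drop the previous LLM instance and
    # release cached GPU memory before the next test builds its own engine.
    del holder.llm
    gc.collect()
    torch.cuda.empty_cache()
```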
Related to #7013?
The latest main branch already fixes this, via the upgrade to PyTorch 2.4.
Your current environment
🐛 Describe the bug
We used vLLM to run the Qwen2 model with TP=4 inside a Ray actor, but one of the four processes reported the error "No CUDA GPUs are available".
When running vLLM on its own, without the Ray actor wrapper, it works normally.
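For reference, a minimal sketch of the setup being described, assuming a single Ray actor owns the engine (the class name, model name, and GPU count are illustrative, not taken from the report):

```python
import ray
import vllm

@ray.remote(num_gpus=4)  # reserve the GPUs the TP=4 engine is expected to use
class VllmActor:
    def __init__(self, model: str):
        # Note: with tensor_parallel_size > 1, vLLM launches its own worker
        # processes, which must also be able to see the reserved GPUs.
        self.llm = vllm.LLM(model=model, tensor_parallel_size=4)

    def generate(self, prompts):
        return self.llm.generate(prompts)

# Usage (hypothetical model name):
# actor = VllmActor.remote("Qwen/Qwen2-7B-Instruct")
# outputs = ray.get(actor.generate.remote(["Hello, world"]))
```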