Describe the bug
What the bug is and how to reproduce it, ideally with screenshots.
I am using get_vllm_engine to load Qwen2.5 72B Int4, and I have specified multiple CUDA devices via os.environ. However, I noticed that the model is still being loaded on a single GPU.
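For reference, a minimal sketch of the usual vLLM multi-GPU setup (this is not the reporter's actual code, and the model path and device list below are placeholder assumptions): setting CUDA_VISIBLE_DEVICES only restricts which GPUs are visible, so the engine still needs tensor_parallel_size to shard the weights instead of placing them all on one card. The variable must also be set before vllm/torch is imported.

```python
import os

# Assumption: CUDA_VISIBLE_DEVICES merely limits visibility; without
# tensor_parallel_size, vLLM loads the full model onto a single GPU.
# Set this before importing vllm or torch.
os.environ.setdefault('CUDA_VISIBLE_DEVICES', '0,1,2,3')

def gpu_count(devices: str) -> int:
    """Count the GPU ids in a CUDA_VISIBLE_DEVICES-style string."""
    return len([d for d in devices.split(',') if d.strip()])

# Hypothetical usage sketch: pass the visible-GPU count as
# tensor_parallel_size so the 72B Int4 weights are sharded across cards.
# from vllm import LLM
# llm = LLM(model='Qwen/Qwen2.5-72B-Instruct-GPTQ-Int4',
#           tensor_parallel_size=gpu_count(os.environ['CUDA_VISIBLE_DEVICES']))
```

If get_vllm_engine exposes an equivalent tensor-parallel argument, the same count would be passed there instead.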
Your hardware and system info
Write your system info here, e.g. CUDA version, operating system, GPU model, and torch version.
GPU: H100
CUDA version: 12.2
vllm: 0.6.1.post2
transformers: 4.44.2
torch: 2.4.0
Additional context
Add any other context about the problem here.