how can I report / ban @jia6214876 ?
Definitely malware - it's also happening on other repos like Unsloth: https://github.com/unslothai/unsloth/issues/960
This seems to be spreading a bit. I'm guessing the vLLM maintainers are working overtime to report and block them.
first of all, notice that:

```text
WARNING 08-27 02:59:41 cuda.py:22] You are using a deprecated `pynvml` package. Please install `nvidia-ml-py` instead. See https://pypi.org/project/pynvml for more information.
```

please uninstall `pynvml`.
second, you can try to place some debug print statements in:

```text
(ServeController pid=874997) File "/paolovic/vllm/vllm/platforms/cuda.py", line 86, in device_id_to_physical_device_id
```

it looks very strange that you get this error.
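for example, a throwaway debug line like this (just a sketch; drop it inside `device_id_to_physical_device_id` in your checkout of `vllm/platforms/cuda.py`) would show what the replica process actually sees:

```python
# Sketch: temporary debug output for device_id_to_physical_device_id.
import os

print(f"[debug] CUDA_VISIBLE_DEVICES={os.environ.get('CUDA_VISIBLE_DEVICES')!r}",
      flush=True)
```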
Hi @youkaichao ,

thank you very much for your support, again! FYI: I have built vllm from source.

I removed `pynvml`, but it didn't help.

So I am logging `logger.info(f"CUDA_VISIBLE_DEVICES: {os.environ['CUDA_VISIBLE_DEVICES']}")` in /paolovic/vllm/vllm/platforms/cuda.py and it returns an empty environment variable:

```text
(ServeReplica:default:VLLMDeployment pid=1016724) INFO 08-27 13:30:03 cuda.py:83] CUDA_VISIBLE_DEVICES:
```

In fact, I adapted the code in `cuda.py` like so:
```python
def device_id_to_physical_device_id(device_id: int) -> int:
    logger.info(f"CUDA_VISIBLE_DEVICES: {os.environ['CUDA_VISIBLE_DEVICES']}")
    import ipdb; ipdb.set_trace()
    os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2"
    if "CUDA_VISIBLE_DEVICES" in os.environ:
        device_ids = os.environ["CUDA_VISIBLE_DEVICES"].split(",")
        physical_device_id = device_ids[device_id]
        return int(physical_device_id)
    else:
        return device_id
```
To be able to continue for now, I hardcoded `os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2"`. Hardcoding gets me past this for the development phase, but it clearly cannot be used in production, so I have to find the root cause.

Furthermore, I run out of CUDA memory with the current hardcoded CUDA_VISIBLE_DEVICES setup. Accordingly, I wanted to pass enforce_eager to reduce the memory consumption:

```text
serve run llm:build_app model="/models/llama-3-70b-instruct-awq-main-4bit/" tensor-parallel-size=2 quantization=awq enforce_eager=True
```
FYI, I had to filter out the "True" value via `arg_strings = [x for x in arg_strings if x != "True"]` in `def parse_vllm_args(cli_args: Dict[str, str]):` of `llm.py`.
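Roughly, the idea of that workaround looks like this (just a sketch; `build_arg_strings` is only an illustrative helper name, and I am assuming the tutorial's `parse_vllm_args` builds `arg_strings` from the CLI key/value pairs):

```python
# Sketch: treat "True"-valued CLI options as store_true flags so that
# "enforce_eager=True" becomes "--enforce_eager" instead of "--enforce_eager True".
# build_arg_strings is an illustrative name; the tutorial builds the list inline.
from typing import Dict, List

def build_arg_strings(cli_args: Dict[str, str]) -> List[str]:
    arg_strings: List[str] = []
    for key, value in cli_args.items():
        if value == "True":
            # Flags take no value on the CLI.
            arg_strings.append(f"--{key}")
        else:
            arg_strings.extend([f"--{key}", str(value)])
    return arg_strings
```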
But without the hardcoded `os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2"` in `cuda.py` it fails -.-
I have to fix this; hardcoding is not an option.
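An alternative to hardcoding that might be worth trying is forwarding the variable to the replica processes through a `runtime_env`, roughly like this (only a sketch, not verified; Ray may still rewrite `CUDA_VISIBLE_DEVICES` for GPU actors, and `EnvProbe` is just a throwaway deployment for checking what a replica sees):

```python
# Sketch: pass the variable to Serve replicas via runtime_env instead of
# hardcoding it in cuda.py; purely a debugging experiment, not a confirmed fix.
import os
from ray import serve

@serve.deployment(
    ray_actor_options={
        "num_gpus": 1,
        "runtime_env": {"env_vars": {"CUDA_VISIBLE_DEVICES": "0,1,2"}},
    }
)
class EnvProbe:
    def __init__(self) -> None:
        # Log what the replica process actually sees at startup.
        print("replica sees CUDA_VISIBLE_DEVICES =",
              repr(os.environ.get("CUDA_VISIBLE_DEVICES")), flush=True)

    async def __call__(self, request) -> str:
        return repr(os.environ.get("CUDA_VISIBLE_DEVICES"))

app = EnvProbe.bind()
```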
for `enforce_eager`, it is a flag. please use `--enforce_eager`.
> So I am logging `logger.info(f"CUDA_VISIBLE_DEVICES: {os.environ['CUDA_VISIBLE_DEVICES']}")` in /paolovic/vllm/vllm/platforms/cuda.py and it returns an empty environment variable: (ServeReplica:default:VLLMDeployment pid=1016724) INFO 08-27 13:30:03 cuda.py:83] CUDA_VISIBLE_DEVICES:
your problem arises from the fact that `CUDA_VISIBLE_DEVICES` is set to an empty string, which means CUDA devices are disabled. you need to check which part of the code changes this.

https://github.com/vllm-project/vllm/pull/7924 will give you a clear error message.

but still, I think the problem does not come from the vllm side. you can keep investigating which part of your code leads to this problem.
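one crude way to find the culprit (just a sketch, purely a debugging aid) is to patch the setter of `os.environ` as early as possible in the replica process, so that every write to the variable prints a stack trace:

```python
# Sketch: trace writes to CUDA_VISIBLE_DEVICES. Import this as early as
# possible (e.g. at the very top of llm.py); debugging aid only.
import os
import traceback

_original_setitem = os.environ.__class__.__setitem__

def _traced_setitem(self, key, value):
    if key == "CUDA_VISIBLE_DEVICES":
        print(f"[trace] CUDA_VISIBLE_DEVICES set to {value!r} by:", flush=True)
        traceback.print_stack()
    _original_setitem(self, key, value)

os.environ.__class__.__setitem__ = _traced_setitem
```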
> for `enforce_eager`, it is a flag. please use `--enforce_eager`.

Using a flag leads to an error, therefore my workaround.
thank you very much @youkaichao
Your current environment
The output of `python collect_env.py`
```text
Collecting environment information...
WARNING 08-27 02:59:41 cuda.py:22] You are using a deprecated `pynvml` package. Please install `nvidia-ml-py` instead. See https://pypi.org/project/pynvml for more information.
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Red Hat Enterprise Linux release 8.10 (Ootpa) (x86_64)
GCC version: (GCC) 8.5.0 20210514 (Red Hat 8.5.0-22)
Clang version: Could not collect
CMake version: version 3.29.0
Libc version: glibc-2.28

Python version: 3.11.9 (main, Jun 19 2024, 10:02:06) [GCC 8.5.0 20210514 (Red Hat 8.5.0-22)] (64-bit runtime)
Python platform: Linux-4.18.0-553.8.1.el8_10.x86_64-x86_64-with-glibc2.28
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA L40S-48C
GPU 1: NVIDIA L40S-48C
GPU 2: NVIDIA L40S-48C

Nvidia driver version: 535.129.03
cuDNN version: Probably one of the following:
/usr/lib64/libcudnn.so.9.3.0
/usr/lib64/libcudnn_adv.so.9.3.0
/usr/lib64/libcudnn_cnn.so.9.3.0
/usr/lib64/libcudnn_engines_precompiled.so.9.3.0
/usr/lib64/libcudnn_engines_runtime_compiled.so.9.3.0
/usr/lib64/libcudnn_graph.so.9.3.0
/usr/lib64/libcudnn_heuristic.so.9.3.0
/usr/lib64/libcudnn_ops.so.9.3.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              12
On-line CPU(s) list: 0-11
Thread(s) per core:  1
Core(s) per socket:  12
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               143
Model name:          Intel(R) Xeon(R) Platinum 8462Y+
Stepping:            8
CPU MHz:             2799.999
BogoMIPS:            5599.99
Hypervisor vendor:   VMware
Virtualization type: full
L1d cache:           48K
L1i cache:           32K
L2 cache:            2048K
L3 cache:            61440K
NUMA node0 CPU(s):   0-11
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology tsc_reliable nonstop_tsc cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves avx512_bf16 wbnoinvd arat avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid cldemote movdiri movdir64b fsrm md_clear flush_l1d arch_capabilities

Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] nvidia-cublas-cu12==12.1.3.1
[pip3] nvidia-cuda-cupti-cu12==12.1.105
[pip3] nvidia-cuda-nvrtc-cu12==12.1.105
[pip3] nvidia-cuda-runtime-cu12==12.1.105
[pip3] nvidia-cudnn-cu12==9.1.0.70
[pip3] nvidia-cufft-cu12==11.0.2.54
[pip3] nvidia-curand-cu12==10.3.2.106
[pip3] nvidia-cusolver-cu12==11.4.5.107
[pip3] nvidia-cusparse-cu12==12.1.0.106
[pip3] nvidia-ml-py==12.555.43
[pip3] nvidia-nccl-cu12==2.20.5
[pip3] nvidia-nvjitlink-cu12==12.4.127
[pip3] nvidia-nvtx-cu12==12.1.105
[pip3] pynvml==11.5.0
[pip3] pyzmq==26.2.0
[pip3] sentence-transformers==2.5.1
[pip3] torch==2.4.0
[pip3] torchvision==0.19.0
[pip3] transformers==4.44.2
[pip3] triton==3.0.0
[pip3] vllm_nccl_cu12==2.18.1.0.4.0
[conda] Could not collect
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.5.5@e397b92f84b7771cfd04b8fbb87894e9ec95f873
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
        GPU0    GPU1    GPU2    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      PIX     PIX     0-11            0               N/A
GPU1    PIX      X      PIX     0-11            0               N/A
GPU2    PIX     PIX      X      0-11            0               N/A

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks
```

🐛 Describe the bug
Hi,
I am trying to execute the following `llm.py` from https://docs.ray.io/en/latest/serve/tutorials/vllm-example.html

I execute it like the following:

```text
serve run llm:build_app model="/u01/data/analytics/models/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4/" tensor-parallel-size=2
```
Unfortunately, it fails when trying to detect `CUDA_VISIBLE_DEVICES`, although I have set `export CUDA_VISIBLE_DEVICES=0,1,2`.
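(For reference, a quick way to double-check in the same shell that the exported value reaches a plain Python process at all; just a trivial sketch:)

```python
# Sketch: confirm the exported variable is visible to Python in this shell.
import os
print("CUDA_VISIBLE_DEVICES =", repr(os.environ.get("CUDA_VISIBLE_DEVICES")))
```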
Thank you very much for any help!