vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Usage]: /tmp/ray PermissionDenied #3513

Open rileyhun opened 5 months ago

rileyhun commented 5 months ago

Your current environment

Collecting environment information...
PyTorch version: N/A
Is debug build: N/A
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: N/A

OS: Amazon Linux 2 (x86_64)
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.26

Python version: 3.11.8 (main, Mar 17 2024, 20:06:37) [GCC 7.3.1 20180712 (Red Hat 7.3.1-17)] (64-bit runtime)
Python platform: Linux-4.14.336-256.559.amzn2.x86_64-x86_64-with-glibc2.26
Is CUDA available: N/A
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: N/A

CPU:
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              48
On-line CPU(s) list: 0-47
Thread(s) per core:  2
Core(s) per socket:  24
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               85
Model name:          Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
Stepping:            7
CPU MHz:             1544.802
BogoMIPS:            5000.00
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            1024K
L3 cache:            36608K
NUMA node0 CPU(s):   0-47
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke

Versions of relevant libraries:
[pip3] numpy==1.26.4
[conda] No relevant packages
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: N/A
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
Could not collect

How would you like to use vllm

Can vLLM switch the Ray root temporary directory (currently /tmp/ray) to another path? For some reason, I keep getting PermissionDenied on the /tmp folder when I switch to a different user in the Dockerfile. It works fine as root, though.

Relevant error:

Traceback (most recent call last):
  File "/opt/program/api_server.py", line 125, in <module>
    engine = AsyncLLMEngine.from_engine_args(engine_args)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 622, in from_engine_args
    placement_group = initialize_cluster(parallel_config,
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/vllm/engine/ray_utils.py", line 101, in initialize_cluster
    ray.init(address=ray_address, ignore_reinit_error=True)
  File "/usr/local/lib/python3.11/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/ray/_private/worker.py", line 1618, in init
    _global_node = ray._private.node.Node(
                   ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/ray/_private/node.py", line 193, in __init__
    self._init_temp()
  File "/usr/local/lib/python3.11/site-packages/ray/_private/node.py", line 430, in _init_temp
    try_to_create_directory(self._temp_dir)
  File "/usr/local/lib/python3.11/site-packages/ray/_private/utils.py", line 936, in try_to_create_directory
    os.makedirs(directory_path, exist_ok=True)
  File "<frozen os>", line 225, in makedirs
PermissionError: [Errno 13] Permission denied: '/tmp/ray'
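
Editor's note, not part of the original report: a minimal workaround sketch. Because the traceback shows vLLM calling ray.init(address=ray_address, ignore_reinit_error=True), initializing Ray yourself with a writable temp root before building the engine should let vLLM attach to that existing session instead of creating /tmp/ray. The model name and directory below are placeholder assumptions; _temp_dir is Ray's (underscore-prefixed) option for relocating its root temporary directory.

import ray
from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.engine.async_llm_engine import AsyncLLMEngine

# Start Ray first, pointing its root temp directory at a path the non-root
# user can write to (placeholder path; any writable location works).
ray.init(_temp_dir="/home/appuser/ray_tmp")

# vLLM's later ray.init(..., ignore_reinit_error=True) should reuse this
# already-initialized session rather than trying to create /tmp/ray.
engine_args = AsyncEngineArgs(model="facebook/opt-125m", tensor_parallel_size=2)
engine = AsyncLLMEngine.from_engine_args(engine_args)

Equivalently, running ray start --temp-dir=/some/writable/path before launching the server and letting vLLM connect to that cluster should have the same effect.
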
rkooo567 commented 5 months ago

I think this simply means your Docker user doesn't have access to the /tmp folder?
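
One way to check that from inside the container as the non-root user (illustrative sketch, not from the thread) is to run the same os.makedirs call the traceback shows Ray making:

import os

print("/tmp writable:", os.access("/tmp", os.W_OK))
try:
    # Same call as Ray's try_to_create_directory in the traceback above.
    os.makedirs("/tmp/ray", exist_ok=True)
    print("ok: /tmp/ray can be created")
except PermissionError as exc:
    print("confirmed:", exc)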

rkooo567 commented 5 months ago

What kind of Docker image are you using?