vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Bug]: offline test, Process hangs without exiting when using cuda graph #4263

Closed DefTruth closed 4 months ago

DefTruth commented 4 months ago

Your current environment

Collecting environment information...
PyTorch version: 2.2.1+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.3 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: version 3.29.2
Libc version: glibc-2.35

Python version: 3.10.14 (main, Mar 21 2024, 16:24:04) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-4.18.0-240.el8.x86_64-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: 12.3.107
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: 
GPU 0: NVIDIA L20
GPU 1: NVIDIA L20
GPU 2: NVIDIA L20
GPU 3: NVIDIA L20
GPU 4: NVIDIA L20
GPU 5: NVIDIA L20
GPU 6: NVIDIA L20
GPU 7: NVIDIA L20

Nvidia driver version: 550.54.15
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.9.7
/usr/lib/x86_64-linux-gnu/libcudnn.so.9.0.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv.so.9.0.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.9.7
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.9.7
/usr/lib/x86_64-linux-gnu/libcudnn_cnn.so.9.0.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.9.7
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.9.7
/usr/lib/x86_64-linux-gnu/libcudnn_engines_precompiled.so.9.0.0
/usr/lib/x86_64-linux-gnu/libcudnn_engines_runtime_compiled.so.9.0.0
/usr/lib/x86_64-linux-gnu/libcudnn_graph.so.9.0.0
/usr/lib/x86_64-linux-gnu/libcudnn_heuristic.so.9.0.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops.so.9.0.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.9.7
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.9.7
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Address sizes:                   52 bits physical, 57 bits virtual
Byte Order:                      Little Endian
CPU(s):                          128
On-line CPU(s) list:             0-127
Vendor ID:                       GenuineIntel
BIOS Vendor ID:                  Intel(R) Corporation
Model name:                      Intel(R) Xeon(R) Gold 6430
BIOS Model name:                 Intel(R) Xeon(R) Gold 6430
CPU family:                      6
Model:                           143
Thread(s) per core:              2
Core(s) per socket:              32
Socket(s):                       2
Stepping:                        8
Frequency boost:                 enabled
CPU max MHz:                     2101.0000
CPU min MHz:                     800.0000
BogoMIPS:                        4200.00
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cat_l2 cdp_l3 invpcid_single cdp_l2 ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb intel_pt avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local avx512_bf16 wbnoinvd dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req avx512vbmi umip pku ospke waitpkg avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg tme avx512_vpopcntdq la57 rdpid cldemote movdiri movdir64b md_clear pconfig flush_l1d arch_capabilities
Virtualization:                  VT-x
L1d cache:                       3 MiB (64 instances)
L1i cache:                       2 MiB (64 instances)
L2 cache:                        128 MiB (64 instances)
L3 cache:                        120 MiB (2 instances)
NUMA node(s):                    2
NUMA node0 CPU(s):               0-31,64-95
NUMA node1 CPU(s):               32-63,96-127
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1:        Vulnerable: __user pointer sanitization and usercopy barriers only; no swapgs barriers
Vulnerability Spectre v2:        Vulnerable, IBPB: disabled, STIBP: disabled
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected

Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] nvidia-nccl-cu12==2.19.3
[pip3] torch==2.2.1
[pip3] triton==2.2.0
[conda] numpy                     1.26.4                   pypi_0    pypi
[conda] nvidia-nccl-cu12          2.19.3                   pypi_0    pypi
[conda] torch                     2.2.1                    pypi_0    pypi
[conda] triton                    2.2.0                    pypi_0    pypi
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.4.1
vLLM Build Flags:
CUDA Archs: 5.2 6.0 6.1 7.0 7.2 7.5 8.0 8.6 8.7 9.0+PTX; ROCm: Disabled; Neuron: Disabled
GPU Topology:
GPU0    GPU1    GPU2    GPU3    GPU4    GPU5    GPU6    GPU7    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X  PIX PIX PIX SYS SYS SYS SYS 0-31,64-95  0       N/A
GPU1    PIX  X  PIX PIX SYS SYS SYS SYS 0-31,64-95  0       N/A
GPU2    PIX PIX  X  PIX SYS SYS SYS SYS 0-31,64-95  0       N/A
GPU3    PIX PIX PIX  X  SYS SYS SYS SYS 0-31,64-95  0       N/A
GPU4    SYS SYS SYS SYS  X  PIX PIX PIX 32-63,96-127    1       N/A
GPU5    SYS SYS SYS SYS PIX  X  PIX PIX 32-63,96-127    1       N/A
GPU6    SYS SYS SYS SYS PIX PIX  X  PIX 32-63,96-127    1       N/A
GPU7    SYS SYS SYS SYS PIX PIX PIX  X  32-63,96-127    1       N/A

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

🐛 Describe the bug

In an offline test, the process hangs without exiting when CUDA graph is used:

cost time(s) 2.5229225158691406
(RayWorkerWrapper pid=404354) INFO 04-22 17:42:14 model_runner.py:1057] Graph capturing finished in 9 secs. [repeated 2x across cluster]
(RayWorkerWrapper pid=404354) [W socket.cpp:697] [c10d] The client socket cannot be initialized to connect to [::ffff:10.189.108.254]:47111 (errno: 97 - Address family not supported by protocol). [repeated 2x across cluster]

Without CUDA graph, however, the process exits normally. Is something wrong with how vLLM uses CUDA graph?
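
For reference, CUDA graph capture in vLLM can be switched off with the `enforce_eager` engine argument, which makes it straightforward to run both sides of this comparison from one script. The following is only a minimal sketch: the model name, `tensor_parallel_size`, and the helper names `engine_kwargs` and `run_once` are placeholder assumptions, not the values used in this report.

```python
def engine_kwargs(use_cuda_graph: bool) -> dict:
    """Build keyword arguments for vllm.LLM for one side of the comparison."""
    return {
        "model": "facebook/opt-125m",         # placeholder model, not from this report
        "tensor_parallel_size": 2,            # assumed value; the logs only show Ray workers
        "enforce_eager": not use_cuda_graph,  # True disables CUDA graph capture
    }


def run_once(use_cuda_graph: bool) -> None:
    # Imported lazily so engine_kwargs() can be inspected without a GPU.
    from vllm import LLM, SamplingParams

    llm = LLM(**engine_kwargs(use_cuda_graph))
    outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=16))
    print(outputs[0].outputs[0].text)
```

Calling `run_once(True)` and `run_once(False)` in separate processes should show whether the hang at exit follows the CUDA graph setting.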

youkaichao commented 4 months ago

Can you try running with export VLLM_TRACE_FUNCTION=1? This should give you a hint about which function crashes or hangs.
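
To illustrate the kind of output this produces, here is a minimal sketch of a call/return tracer built on Python's `sys.settrace`. It is not vLLM's implementation; `make_tracer`, `inner`, and `outer` are hypothetical names, and the line layout only mimics the trace logs quoted in this thread.

```python
import datetime
import sys


def make_tracer(log):
    """Return a sys.settrace hook that appends call/return events to `log`."""
    def tracer(frame, event, arg):
        if event in ("call", "return"):
            code = frame.f_code
            verb = "Call to" if event == "call" else "Return from"
            log.append(
                f"{datetime.datetime.now()} {verb} {code.co_name} "
                f"in {code.co_filename}:{frame.f_lineno}"
            )
        return tracer  # keep tracing inside the newly entered frame
    return tracer


def inner():
    return 1


def outer():
    return inner() + 1


events = []
sys.settrace(make_tracer(events))  # start tracing Python-level calls
outer()
sys.settrace(None)                 # stop tracing
print("\n".join(events))
```

If a process hangs inside a Python function, the last `Call to` without a matching `Return from` points at it. A hang inside native code shows up only as the trace going silent.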

DefTruth commented 4 months ago

I will try it with the latest vLLM.

DefTruth commented 4 months ago

@youkaichao With VLLM_TRACE_FUNCTION=1 exported and CUDA graph enabled, the log shows that the last call is disable_client_hook:

2024-04-23 11:07:10.120712 Return from actor_id in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/worker.py:502
2024-04-23 11:07:10.120731 Return from record_task_log_end in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/worker.py:628
2024-04-23 11:07:10.120781 Call to get_serialization_context in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/worker.py:660
2024-04-23 11:07:10.120801 Call to current_job_id in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/worker.py:491
2024-04-23 11:07:10.120820 Return from current_job_id in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/worker.py:494
2024-04-23 11:07:10.120844 Return from get_serialization_context in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/worker.py:672
2024-04-23 11:07:10.120863 Call to serialize in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/serialization.py:482
2024-04-23 11:07:10.120886 Call to _serialize_to_msgpack in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/serialization.py:433
2024-04-23 11:07:10.120924 Call to packb in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/msgpack/__init__.py:30
2024-04-23 11:07:10.120950 Return from packb in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/msgpack/__init__.py:36
2024-04-23 11:07:10.120979 Call to packb in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/msgpack/__init__.py:30
2024-04-23 11:07:10.120999 Return from packb in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/msgpack/__init__.py:36
2024-04-23 11:07:10.121018 Return from _serialize_to_msgpack in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/serialization.py:478
2024-04-23 11:07:10.121037 Return from serialize in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/serialization.py:494
2024-04-23 11:07:10.121084 Call to _changeproctitle in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/worker.py:2528
2024-04-23 11:07:10.121104 Call to _mode in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/worker.py:3051
2024-04-23 11:07:10.121122 Return from _mode in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/worker.py:3059
2024-04-23 11:07:10.121163 Return from _changeproctitle in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/worker.py:2531
2024-04-23 11:07:10.121233 Call to disable_client_hook in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/client_mode_hook.py:67
2024-04-23 11:07:10.121253 Call to _set_client_hook_status in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/client_mode_hook.py:38
2024-04-23 11:07:10.121273 Return from _set_client_hook_status in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/client_mode_hook.py:40
2024-04-23 11:07:10.121291 Return from disable_client_hook in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/client_mode_hook.py:69

and the GPU memory is not released:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA L20                     On  |   00000000:0E:00.0 Off |                    0 |
| N/A   38C    P0             76W /  350W |   37354MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA L20                     On  |   00000000:0F:00.0 Off |                    0 |
| N/A   34C    P8             34W /  350W |      81MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA L20                     On  |   00000000:10:00.0 Off |                    0 |
| N/A   35C    P8             34W /  350W |       9MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA L20                     On  |   00000000:12:00.0 Off |                    0 |
| N/A   31C    P8             36W /  350W |      21MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

DefTruth commented 4 months ago

@youkaichao Without CUDA graph, the trace log is:

2024-04-23 10:57:19.506014 Call to serialize in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/serialization.py:482
2024-04-23 10:57:19.506037 Call to _serialize_to_msgpack in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/serialization.py:433
2024-04-23 10:57:19.506076 Call to packb in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/msgpack/__init__.py:30
2024-04-23 10:57:19.506103 Return from packb in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/msgpack/__init__.py:36
2024-04-23 10:57:19.506132 Call to packb in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/msgpack/__init__.py:30
2024-04-23 10:57:19.506162 Return from packb in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/msgpack/__init__.py:36
2024-04-23 10:57:19.506182 Return from _serialize_to_msgpack in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/serialization.py:478
2024-04-23 10:57:19.506203 Return from serialize in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/serialization.py:494
2024-04-23 10:57:19.506251 Call to _changeproctitle in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/worker.py:2528
2024-04-23 10:57:19.506274 Call to _mode in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/worker.py:3051
2024-04-23 10:57:19.506296 Return from _mode in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/worker.py:3059
2024-04-23 10:57:19.506334 Return from _changeproctitle in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/worker.py:2531
2024-04-23 10:57:19.506406 Call to disable_client_hook in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/client_mode_hook.py:67
2024-04-23 10:57:19.506428 Call to _set_client_hook_status in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/client_mode_hook.py:38
2024-04-23 10:57:19.506448 Return from _set_client_hook_status in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/client_mode_hook.py:40
2024-04-23 10:57:19.506465 Return from disable_client_hook in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/client_mode_hook.py:69
2024-04-23 10:57:20.126963 Call to sigterm_handler in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/worker.py:872
2024-04-23 10:57:20.127050 Return from sigterm_handler in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/worker.py:873

Here sigterm_handler is called after disable_client_hook, but in the run with CUDA graph it is not.
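
Comparing two traces by eye is error-prone. A small helper along these lines could list the calls that never logged a matching return. It assumes the `<timestamp> Call to/Return from <func> in <path>:<line>` layout seen in the logs above; the name `unreturned_calls` and the shortened sample paths are hypothetical.

```python
import re

# One trace event: timestamp (date + time), kind, function name, file path, line number.
EVENT = re.compile(r"^(\S+ \S+) (Call to|Return from) (\S+) in (\S+):(\d+)")


def unreturned_calls(lines):
    """Return (function, path) pairs that were called but never returned, outermost first."""
    stack = []
    for line in lines:
        m = EVENT.match(line)
        if not m:
            continue  # skip truncated or unrelated lines
        _, kind, func, path, _ = m.groups()
        if kind == "Call to":
            stack.append((func, path))
        elif stack and stack[-1] == (func, path):
            stack.pop()  # matching return for the innermost open call
    return stack


sample = [
    "2024-04-23 11:07:10.121084 Call to _changeproctitle in /x/ray/_private/worker.py:2528",
    "2024-04-23 11:07:10.121104 Call to _mode in /x/ray/_private/worker.py:3051",
    "2024-04-23 11:07:10.121122 Return from _mode in /x/ray/_private/worker.py:3059",
]
print(unreturned_calls(sample))
```

An empty result means every traced Python call returned. In that case the process is more likely blocked outside the traced Python frames.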

DefTruth commented 4 months ago

After trying #4278, the log is:

on3.10/site-packages/ray/_private/worker.py:2530
2024-04-23 11:32:40.659115 Return from _mode in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/worker.py:3059 to _changeproctitle in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/worker.py:2530
2024-04-23 11:32:40.659164 Return from _changeproctitle in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/worker.py:2531 to __exit__ in /root/anaconda3/envs/vllm/lib/python3.10/contextlib.py:142
2024-04-23 11:32:40.659244 Call to disable_client_hook in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/client_mode_hook.py:67 from __exit__ in /root/anaconda3/envs/vllm/lib/python3.10/contextlib.py:142
2024-04-23 11:32:40.659267 Call to _set_client_hook_status in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/client_mode_hook.py:38 from disable_client_hook in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/client_mode_hook.py:69
2024-04-23 11:32:40.659291 Return from _set_client_hook_status in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/client_mode_hook.py:40 to disable_client_hook in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/client_mode_hook.py:69
2024-04-23 11:32:40.659309 Return from disable_client_hook in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/client_mode_hook.py:69 to __exit__ in /root/anaconda3/envs/vllm/lib/python3.10/contextlib.py:142

It seems to hang here:

__exit__ in /root/anaconda3/envs/vllm/lib/python3.10/contextlib.py:142

youkaichao commented 4 months ago

Please provide more logs, at least from the point where the code is related to vLLM. All of the trace here is related to Ray.

DefTruth commented 4 months ago

2024-04-23 11:38:48.121370 Return from is_initialized in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:950 to _get_default_group in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:976
2024-04-23 11:38:48.121388 Call to WORLD in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:583 from _get_default_group in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:981
2024-04-23 11:38:48.121405 Call to default_pg in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:453 from WORLD in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:585
2024-04-23 11:38:48.121423 Return from default_pg in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:461 to WORLD in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:585
2024-04-23 11:38:48.121440 Return from WORLD in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:585 to _get_default_group in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:981
2024-04-23 11:38:48.121456 Return from _get_default_group in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:981 to get_rank in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:1532
2024-04-23 11:38:48.121476 Return from get_rank in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:1534 to gather in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2880
2024-04-23 11:38:48.121496 Call to _validate_output_list_for_rank in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2834 from gather in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2881
2024-04-23 11:38:48.121515 Return from _validate_output_list_for_rank in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2840 to gather in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2881
2024-04-23 11:38:48.121550 Call to WORLD in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:583 from gather in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2888
2024-04-23 11:38:48.121568 Call to default_pg in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:453 from WORLD in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:585
2024-04-23 11:38:48.121585 Return from default_pg in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:461 to WORLD in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:585
2024-04-23 11:38:48.121601 Return from WORLD in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:585 to gather in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2888
2024-04-23 11:38:48.121628 Call to get_group_rank in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:762 from gather in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2892
2024-04-23 11:38:48.121645 Call to WORLD in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:583 from get_group_rank in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:777
2024-04-23 11:38:48.121663 Call to default_pg in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:453 from WORLD in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:585
2024-04-23 11:38:48.121681 Return from default_pg in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:461 to WORLD in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:585
2024-04-23 11:38:48.121698 Return from WORLD in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:585 to get_group_rank in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:777
2024-04-23 11:38:48.121715 Call to pg_group_ranks in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:490 from get_group_rank in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:779
2024-04-23 11:38:48.121733 Return from pg_group_ranks in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:498 to get_group_rank in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:779
2024-04-23 11:38:48.121751 Call to pg_group_ranks in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:490 from get_group_rank in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:781
2024-04-23 11:38:48.121768 Return from pg_group_ranks in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:498 to get_group_rank in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:781
2024-04-23 11:38:48.121787 Return from get_group_rank in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:785 to gather in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2892
2024-04-23 11:38:48.130120 Return from gather in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2899 to wrapper in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:72
2024-04-23 11:38:48.130181 Return from wrapper in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:72 to tensor_model_parallel_gather in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/distributed/communication_op.py:95
2024-04-23 11:38:48.130208 Call to get_tensor_model_parallel_rank in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/distributed/parallel_state.py:216 from tensor_model_parallel_gather in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/distributed/communication_op.py:99
2024-04-23 11:38:48.130229 Call to get_tensor_model_parallel_group in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/distributed/parallel_state.py:190 from get_tensor_model_parallel_rank in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/distributed/parallel_state.py:218
2024-04-23 11:38:48.130251 Return from get_tensor_model_parallel_group in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/distributed/parallel_state.py:194 to get_tensor_model_parallel_rank in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/distributed/parallel_state.py:218
2024-04-23 11:38:48.130282 Call to get_rank in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:1512 from get_tensor_model_parallel_rank in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/distributed/parallel_state.py:218
2024-04-23 11:38:48.130302 Call to _rank_not_in_group in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:747 from get_rank in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:1529
2024-04-23 11:38:48.130324 Return from _rank_not_in_group in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:751 to get_rank in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:1529
2024-04-23 11:38:48.130346 Call to _get_default_group in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:974 from get_rank in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:1532
2024-04-23 11:38:48.130366 Call to is_initialized in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:948 from _get_default_group in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:976
2024-04-23 11:38:48.130386 Call to WORLD in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:583 from is_initialized in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:950
2024-04-23 11:38:48.130405 Call to default_pg in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:453 from WORLD in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:585
2024-04-23 11:38:48.130424 Return from default_pg in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:461 to WORLD in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:585
2024-04-23 11:38:48.130445 Return from WORLD in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:585 to is_initialized in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:950
2024-04-23 11:38:48.130463 Return from is_initialized in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:950 to _get_default_group in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:976
2024-04-23 11:38:48.130482 Call to WORLD in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:583 from _get_default_group in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:981
2024-04-23 11:38:48.130500 Call to default_pg in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:453 from WORLD in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:585
2024-04-23 11:38:48.130519 Return from default_pg in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:461 to WORLD in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:585
2024-04-23 11:38:48.130536 Return from WORLD in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:585 to _get_default_group in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:981
2024-04-23 11:38:48.130554 Return from _get_default_group in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:981 to get_rank in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:1532
2024-04-23 11:38:48.130573 Call to WORLD in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:583 from get_rank in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:1533
2024-04-23 11:38:48.130598 Call to default_pg in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:453 from WORLD in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:585
2024-04-23 11:38:48.130617 Return from default_pg in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:461 to WORLD in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:585
2024-04-23 11:38:48.130635 Return from WORLD in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:585 to get_rank in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:1533
2024-04-23 11:38:48.130659 Call to get_group_rank in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:762 from get_rank in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:1536
2024-04-23 11:38:48.130679 Call to WORLD in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:583 from get_group_rank in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:777
2024-04-23 11:38:48.130699 Call to default_pg in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:453 from WORLD in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:585
2024-04-23 11:38:48.130718 Return from default_pg in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:461 to WORLD in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:585
2024-04-23 11:38:48.130737 Return from WORLD in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:585 to get_group_rank in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:777
2024-04-23 11:38:48.130757 Call to pg_group_ranks in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:490 from get_group_rank in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:779
2024-04-23 11:38:48.130780 Return from pg_group_ranks in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:498 to get_group_rank in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:779
2024-04-23 11:38:48.130800 Call to pg_group_ranks in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:490 from get_group_rank in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:781
2024-04-23 11:38:48.130817 Return from pg_group_ranks in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:498 to get_group_rank in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:781
2024-04-23 11:38:48.130837 Return from get_group_rank in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:785 to get_rank in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:1536
2024-04-23 11:38:48.130856 Return from get_rank in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:1536 to get_tensor_model_parallel_rank in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/distributed/parallel_state.py:218
2024-04-23 11:38:48.130873 Return from get_tensor_model_parallel_rank in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/distributed/parallel_state.py:218 to tensor_model_parallel_gather in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/distributed/communication_op.py:99
2024-04-23 11:38:48.130898 Return from tensor_model_parallel_gather in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/distributed/communication_op.py:103 to _get_logits in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/model_executor/layers/logits_processor.py:67
2024-04-23 11:38:48.130925 Return from _get_logits in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/model_executor/layers/logits_processor.py:71 to forward in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/model_executor/layers/logits_processor.py:51
2024-04-23 11:38:48.130947 Return from forward in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/model_executor/layers/logits_processor.py:59 to _call_impl in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py:1520
2024-04-23 11:38:48.130970 Return from _call_impl in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py:1520 to _wrapped_call_impl in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py:1511
2024-04-23 11:38:48.130990 Return from _wrapped_call_impl in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py:1511 to compute_logits in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/model_executor/models/llama.py:366
2024-04-23 11:38:48.131010 Return from compute_logits in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/model_executor/models/llama.py:368 to execute_model in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/worker/model_runner.py:851
2024-04-23 11:38:48.131035 Return from execute_model in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/worker/model_runner.py:855 to decorate_context in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/utils/_contextlib.py:115
2024-04-23 11:38:48.131078 Call to __exit__ in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/autograd/grad_mode.py:271 from decorate_context in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/utils/_contextlib.py:114
2024-04-23 11:38:48.131106 Return from __exit__ in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/autograd/grad_mode.py:272 to decorate_context in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/utils/_contextlib.py:114
2024-04-23 11:38:48.131128 Return from decorate_context in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/utils/_contextlib.py:114 to execute_model in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/worker/worker.py:249
2024-04-23 11:38:48.131157 Return from execute_model in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/worker/worker.py:254 to decorate_context in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/utils/_contextlib.py:115
2024-04-23 11:38:48.131180 Call to __exit__ in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/autograd/grad_mode.py:271 from decorate_context in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/utils/_contextlib.py:114
2024-04-23 11:38:48.131203 Return from __exit__ in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/autograd/grad_mode.py:272 to decorate_context in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/utils/_contextlib.py:114
2024-04-23 11:38:48.131227 Return from decorate_context in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/utils/_contextlib.py:114 to execute_method in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/worker/worker_base.py:145
2024-04-23 11:38:48.131249 Return from execute_method in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/worker/worker_base.py:145 to _resume_span in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/util/tracing/tracing_helper.py:467
2024-04-23 11:38:48.131272 Return from _resume_span in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/util/tracing/tracing_helper.py:467 to actor_method_executor in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/function_manager.py:724
2024-04-23 11:38:48.131301 Return from actor_method_executor in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/function_manager.py:724 to main_loop in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/worker.py:879
2024-04-23 11:38:48.131332 Call to record_task_log_end in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/worker.py:620 from main_loop in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/worker.py:879
2024-04-23 11:38:48.131354 Call to actor_id in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/worker.py:499 from record_task_log_end in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/worker.py:623
2024-04-23 11:38:48.131386 Return from actor_id in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/worker.py:502 to record_task_log_end in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/worker.py:623
2024-04-23 11:38:48.131406 Return from record_task_log_end in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/worker.py:628 to main_loop in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/worker.py:879
2024-04-23 11:38:48.131463 Call to get_serialization_context in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/worker.py:660 from main_loop in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/worker.py:879
2024-04-23 11:38:48.131484 Call to current_job_id in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/worker.py:491 from get_serialization_context in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/worker.py:670
2024-04-23 11:38:48.131505 Return from current_job_id in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/worker.py:494 to get_serialization_context in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/worker.py:670
2024-04-23 11:38:48.131530 Return from get_serialization_context in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/worker.py:672 to main_loop in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/worker.py:879
2024-04-23 11:38:48.131550 Call to serialize in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/serialization.py:482 from main_loop in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/worker.py:879
2024-04-23 11:38:48.131573 Call to _serialize_to_msgpack in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/serialization.py:433 from serialize in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/serialization.py:494
2024-04-23 11:38:48.131614 Call to packb in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/msgpack/__init__.py:30 from _serialize_to_msgpack in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/serialization.py:468
2024-04-23 11:38:48.131645 Return from packb in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/msgpack/__init__.py:36 to _serialize_to_msgpack in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/serialization.py:468
2024-04-23 11:38:48.131678 Call to packb in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/msgpack/__init__.py:30 from _serialize_to_msgpack in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/serialization.py:478
2024-04-23 11:38:48.131701 Return from packb in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/msgpack/__init__.py:36 to _serialize_to_msgpack in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/serialization.py:478
2024-04-23 11:38:48.131721 Return from _serialize_to_msgpack in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/serialization.py:478 to serialize in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/serialization.py:494
2024-04-23 11:38:48.131747 Return from serialize in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/serialization.py:494 to main_loop in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/worker.py:879
2024-04-23 11:38:48.131796 Call to _changeproctitle in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/worker.py:2528 from __exit__ in /root/anaconda3/envs/vllm/lib/python3.10/contextlib.py:142
2024-04-23 11:38:48.131819 Call to _mode in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/worker.py:3051 from _changeproctitle in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/worker.py:2530
2024-04-23 11:38:48.131838 Return from _mode in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/worker.py:3059 to _changeproctitle in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/worker.py:2530
2024-04-23 11:38:48.131882 Return from _changeproctitle in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/worker.py:2531 to __exit__ in /root/anaconda3/envs/vllm/lib/python3.10/contextlib.py:142
2024-04-23 11:38:48.131959 Call to disable_client_hook in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/client_mode_hook.py:67 from __exit__ in /root/anaconda3/envs/vllm/lib/python3.10/contextlib.py:142
2024-04-23 11:38:48.131985 Call to _set_client_hook_status in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/client_mode_hook.py:38 from disable_client_hook in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/client_mode_hook.py:69
2024-04-23 11:38:48.132008 Return from _set_client_hook_status in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/client_mode_hook.py:40 to disable_client_hook in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/client_mode_hook.py:69
2024-04-23 11:38:48.132026 Return from disable_client_hook in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/client_mode_hook.py:69 to __exit__ in /root/anaconda3/envs/vllm/lib/python3.10/contextlib.py:142
youkaichao commented 4 months ago

I think there is something wrong with your ray environment, but I'm not sure.

DefTruth commented 4 months ago

@youkaichao I did not see this hang with 0.4.0.post1 in my tests.

DefTruth commented 4 months ago

@youkaichao can you reproduce this? The process still cannot exit.

del llm
(RayWorkerWrapper pid=2301191) INFO 04-28 17:56:00 model_runner.py:953] Graph capturing finished in 9 secs. [repeated 6x across cluster]
(RayWorkerWrapper pid=2301191) [W socket.cpp:697] [c10d] The client socket cannot be initialized to connect to [::ffff:10.189.108.254]:8949 (errno: 97 - Address family not supported by protocol). [repeated 6x across cluster]
# hang ....
DefTruth commented 4 months ago

@youkaichao After I delete driver_worker.worker.model_runner manually, the process exits normally! It seems the CUDA graphs captured by driver_worker.worker.model_runner are not deleted automatically when the process exits.

    del llm.llm_engine.model_executor.driver_worker.worker.model_runner

Script to reproduce:

from transformers import AutoTokenizer
from vllm import LLM, SamplingParams


def main(args):
    model_name = args.model
    llm = LLM(model="Qwen/Qwen1.5-72B-Chat",  # any model is OK.
              tokenizer_mode='slow',
              trust_remote_code=True,
              tensor_parallel_size=8,
              max_model_len=8192,
              swap_space=4,
              gpu_memory_utilization=0.9,
              disable_custom_all_reduce=True,
              enable_prefix_caching=True,
              enforce_eager=False)  # False keeps CUDA graph capture enabled

    tokenizer = AutoTokenizer.from_pretrained(
        "Qwen/Qwen1.5-72B-Chat", use_fast=False, trust_remote_code=True)

    sampling_params = SamplingParams(best_of=1,
                                     frequency_penalty=0.0,
                                     temperature=0,
                                     max_tokens=512,
                                     presence_penalty=1.0,
                                     top_p=1.0,
                                     skip_special_tokens=True,
                                     include_stop_str_in_output=False)
    request_output = llm.generate("你是谁?",  # prompt means "Who are you?"
                                  sampling_params=sampling_params,
                                  use_tqdm=False)
    # Delete model_runner manually. Without this, the process cannot exit normally.
    del llm.llm_engine.model_executor.driver_worker.worker.model_runner

I would be very grateful if you could take a look at this issue. The __del__ method does not seem to be called when CUDA graph is enabled.
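
To show what the manual del relies on, here is a minimal sketch that does not use vLLM at all. GraphHolder, Worker and the events list are hypothetical stand-ins, not vLLM classes. It only illustrates the CPython behavior that deleting the last reference to an attribute runs its __del__ immediately, instead of leaving the cleanup to interpreter shutdown.

```python
events = []  # records the order in which things happen


class GraphHolder:
    """Hypothetical stand-in for an object that owns captured CUDA graphs."""

    def __del__(self):
        # In a real runner this is where graph resources would be released.
        events.append("graphs released")


class Worker:
    """Hypothetical stand-in for a worker that owns a model runner."""

    def __init__(self):
        self.model_runner = GraphHolder()


worker = Worker()
events.append("generation done")

# Dropping the only reference runs __del__ right away in CPython, so the
# cleanup happens at a well-defined point before the interpreter exits.
del worker.model_runner
events.append("after del")

print(events)
```

Whether the real hang comes from a missing __del__ call or from something else (NCCL, Ray process management) is not settled in this thread; the sketch only covers the reference-counting part.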

youkaichao commented 4 months ago

Will take a look in the next week. In addition, I think this might be related to how ray manages processes. cc @rkooo567 FYI.

DefTruth commented 4 months ago

@youkaichao many thanks~

rkooo567 commented 4 months ago

@youkaichao let me know if you need any assistance! The fact that del llm.llm_engine.model_executor.driver_worker.worker.model_runner fixes the issue makes me wonder if it is more of an NCCL-related issue (because the driver worker is not running in a Ray worker).

DefTruth commented 4 months ago

@youkaichao @rkooo567 This error no longer exists in the latest vllm. Can you tell me how it was fixed? I'm somewhat interested.

2024-05-09 14:23:39.711582 Call to connected in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/worker.py:473 from __del__ in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/actor.py:1348
2024-05-09 14:23:39.711599 Return from connected in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/_private/worker.py:476 to __del__ in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/actor.py:1348
2024-05-09 14:23:39.711616 Return from __del__ in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/ray/actor.py:1348 to  in :0
2024-05-09 14:23:39.711634 Call to __del__ in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/worker/model_runner.py:1019 from  in :0
2024-05-09 14:23:39.769073 Return from __del__ in /root/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/worker/model_runner.py:1027 to  in :0
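
For readers who want similar "Call to ... from ..." / "Return from ... to ..." lines in their own scripts, the sketch below produces that shape with Python's sys.settrace. It is an assumption about how such a log can be generated, not necessarily the tool used for the traces in this thread. The names trace_calls, trace_lines, inner and outer are made up for illustration.

```python
import datetime
import sys

trace_lines = []  # collected log lines


def trace_calls(frame, event, arg):
    # Only function entry and exit are logged; line events are ignored.
    if event in ("call", "return"):
        caller = frame.f_back
        now = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S.%f")
        here = (f"{frame.f_code.co_name} in "
                f"{frame.f_code.co_filename}:{frame.f_lineno}")
        if caller is None:
            # No Python caller, e.g. a finalizer run by the interpreter.
            there = " in :0"
        else:
            there = (f"{caller.f_code.co_name} in "
                     f"{caller.f_code.co_filename}:{caller.f_lineno}")
        if event == "call":
            trace_lines.append(f"{now} Call to {here} from {there}")
        else:
            trace_lines.append(f"{now} Return from {here} to {there}")
    # Returning the tracer keeps it active inside the new frame,
    # which is required to receive the matching "return" event.
    return trace_calls


def inner():
    return 1


def outer():
    return inner()


sys.settrace(trace_calls)
outer()
sys.settrace(None)

for line in trace_lines:
    print(line)
```

A trace like this is verbose and slows execution noticeably, so it is best enabled only while hunting for the last function entered before a hang.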
rkooo567 commented 4 months ago

Hmm I am not sure what's changed, but @youkaichao made several PRs to clean up tp > 1 cases. Maybe it was fixed by that...

youkaichao commented 4 months ago

Might be related to https://github.com/vllm-project/vllm/pull/4508#issuecomment-2087794774 .