vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Bug]: Segmentation fault encountered while running model #4734

Closed terrifyzhao closed 6 months ago

terrifyzhao commented 6 months ago

Your current environment

PyTorch version: 2.1.2+cu118
Is debug build: False
CUDA used to build PyTorch: 11.8
ROCM used to build PyTorch: N/A

OS: Tencent tlinux 2.2 (Final) (x86_64)
GCC version: (GCC) 7.3.0
Clang version: Could not collect
CMake version: version 3.29.2
Libc version: glibc-2.17

Python version: 3.9.0 | packaged by conda-forge | (default, Nov 26 2020, 07:57:39) [GCC 9.3.0] (64-bit runtime)
Python platform: Linux-4.14.105-1-tlinux3-0013-x86_64-with-glibc2.17
Is CUDA available: True
CUDA runtime version: 11.8.89
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: A100-SXM4-40GB
GPU 1: A100-SXM4-40GB
GPU 2: A100-SXM4-40GB
GPU 3: A100-SXM4-40GB
GPU 4: A100-SXM4-40GB
GPU 5: A100-SXM4-40GB
GPU 6: A100-SXM4-40GB
GPU 7: A100-SXM4-40GB

Nvidia driver version: 450.156.00
cuDNN version: Probably one of the following:
/usr/lib64/libcudnn.so.8.0.5
/usr/lib64/libcudnn_adv_infer.so.8.0.5
/usr/lib64/libcudnn_adv_train.so.8.0.5
/usr/lib64/libcudnn_cnn_infer.so.8.0.5
/usr/lib64/libcudnn_cnn_train.so.8.0.5
/usr/lib64/libcudnn_ops_infer.so.8.0.5
/usr/lib64/libcudnn_ops_train.so.8.0.5
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 192
On-line CPU(s) list: 0-191
Thread(s) per core: 2
Core(s) per socket: 48
Socket(s): 2
NUMA node(s): 2
Vendor ID: AuthenticAMD
CPU family: 23
Model: 49
Model name: AMD EPYC 7K62 48-Core Processor
Stepping: 0
CPU MHz: 3288.638
CPU max MHz: 2600.0000
CPU min MHz: 1500.0000
BogoMIPS: 5190.05
Virtualization: AMD-V
L1d cache: 32K
L1i cache: 32K
L2 cache: 512K
L3 cache: 16384K
NUMA node0 CPU(s): 0-47,96-143
NUMA node1 CPU(s): 48-95,144-191
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate sme ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif umip rdpid overflow_recov succor smca

Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] nvidia-nccl-cu11==2.19.3
[pip3] torch==2.1.2+cu118
[pip3] triton==2.1.0
[pip3] vllm_nccl_cu11==2.18.1.0.4.0
[conda] numpy 1.26.4 pypi_0 pypi
[conda] nvidia-nccl-cu11 2.19.3 pypi_0 pypi
[conda] torch 2.1.2+cu118 pypi_0 pypi
[conda] triton 2.1.0 pypi_0 pypi
[conda] vllm-nccl-cu11 2.18.1.0.4.0 pypi_0 pypi
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.4.0.post1
vLLM Build Flags: CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
GPU0 GPU1 GPU2 GPU3 GPU4 GPU5 GPU6 GPU7 mlx5_0 mlx5_1 mlx5_2 mlx5_3 mlx5_4 mlx5_5 mlx5_6 mlx5_7 mlx5_8 mlx5_9 mlx5_10 mlx5_11 mlx5_12 mlx5_13 mlx5_14 mlx5_15 mlx5_16 mlx5_17 mlx5_18 mlx5_19 mlx5_20 mlx5_21 mlx5_22 mlx5_23 mlx5_24 mlx5_25 CPU Affinity NUMA Affinity
GPU0 X NV12 NV12 NV12 NV12 NV12 NV12 NV12 NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE 0-47,96-143 0
GPU1 NV12 X NV12 NV12 NV12 NV12 NV12 NV12 NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE 0-47,96-143 0
GPU2 NV12 NV12 X NV12 NV12 NV12 NV12 NV12 PXB PXB PXB PXB PXB PXB PXB PXB PXB PXB PXB PXB PXB PXB PXB PXB PXB PXB PXB PXB PXB PXB PXB PXB PXB PXB 0-47,96-143 0
GPU3 NV12 NV12 NV12 X NV12 NV12 NV12 NV12 PXB PXB PXB PXB PXB PXB PXB PXB PXB PXB PXB PXB PXB PXB PXB PXB PXB PXB PXB PXB PXB PXB PXB PXB PXB PXB 0-47,96-143 0
GPU4 NV12 NV12 NV12 NV12 X NV12 NV12 NV12 SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS 48-95,144-191 1
GPU5 NV12 NV12 NV12 NV12 NV12 X NV12 NV12 SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS 48-95,144-191 1
GPU6 NV12 NV12 NV12 NV12 NV12 NV12 X NV12 SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS 48-95,144-191 1
GPU7 NV12 NV12 NV12 NV12 NV12 NV12 NV12 X SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS 48-95,144-191 1
mlx5_0 NODE NODE PXB PXB SYS SYS SYS SYS X PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX
mlx5_1 NODE NODE PXB PXB SYS SYS SYS SYS PIX X PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX
mlx5_2 NODE NODE PXB PXB SYS SYS SYS SYS PIX PIX X PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX
mlx5_3 NODE NODE PXB PXB SYS SYS SYS SYS PIX PIX PIX X PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX
mlx5_4 NODE NODE PXB PXB SYS SYS SYS SYS PIX PIX PIX PIX X PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX
mlx5_5 NODE NODE PXB PXB SYS SYS SYS SYS PIX PIX PIX PIX PIX X PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX
mlx5_6 NODE NODE PXB PXB SYS SYS SYS SYS PIX PIX PIX PIX PIX PIX X PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX
mlx5_7 NODE NODE PXB PXB SYS SYS SYS SYS PIX PIX PIX PIX PIX PIX PIX X PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX
mlx5_8 NODE NODE PXB PXB SYS SYS SYS SYS PIX PIX PIX PIX PIX PIX PIX PIX X PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX
mlx5_9 NODE NODE PXB PXB SYS SYS SYS SYS PIX PIX PIX PIX PIX PIX PIX PIX PIX X PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX
mlx5_10 NODE NODE PXB PXB SYS SYS SYS SYS PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX X PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX
mlx5_11 NODE NODE PXB PXB SYS SYS SYS SYS PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX X PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX
mlx5_12 NODE NODE PXB PXB SYS SYS SYS SYS PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX X PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX
mlx5_13 NODE NODE PXB PXB SYS SYS SYS SYS PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX X PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX
mlx5_14 NODE NODE PXB PXB SYS SYS SYS SYS PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX X PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX
mlx5_15 NODE NODE PXB PXB SYS SYS SYS SYS PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX X PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX
mlx5_16 NODE NODE PXB PXB SYS SYS SYS SYS PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX X PIX PIX PIX PIX PIX PIX PIX PIX PIX
mlx5_17 NODE NODE PXB PXB SYS SYS SYS SYS PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX X PIX PIX PIX PIX PIX PIX PIX PIX
mlx5_18 NODE NODE PXB PXB SYS SYS SYS SYS PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX X PIX PIX PIX PIX PIX PIX PIX
mlx5_19 NODE NODE PXB PXB SYS SYS SYS SYS PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX X PIX PIX PIX PIX PIX PIX
mlx5_20 NODE NODE PXB PXB SYS SYS SYS SYS PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX X PIX PIX PIX PIX PIX
mlx5_21 NODE NODE PXB PXB SYS SYS SYS SYS PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX X PIX PIX PIX PIX
mlx5_22 NODE NODE PXB PXB SYS SYS SYS SYS PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX X PIX PIX PIX
mlx5_23 NODE NODE PXB PXB SYS SYS SYS SYS PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX X PIX PIX
mlx5_24 NODE NODE PXB PXB SYS SYS SYS SYS PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX X PIX
mlx5_25 NODE NODE PXB PXB SYS SYS SYS SYS PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX X

Legend:

X    = Self
SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX  = Connection traversing at most a single PCIe bridge
NV#  = Connection traversing a bonded set of # NVLinks

🐛 Describe the bug

This is my code:

from vllm import LLM, SamplingParams

# ptm_path is the reporter's own helper that resolves a local model directory.
llm = LLM(ptm_path('qwen-1_5-0_5B-chat'))
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

while True:
    prompts = input('text:')
    outputs = llm.generate(prompts, sampling_params)
    # Print the outputs.
    for output in outputs:
        prompt = output.prompt
        generated_text = output.outputs[0].text
        print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

This is the log; I don't see any errors in it:

INFO 05-10 15:46:32 llm_engine.py:74] Initializing an LLM engine (v0.4.0.post1) with config: model='/apdcephfs_nj7/share_3273289/ptm/Qwen1.5-7B-Chat', tokenizer='/apdcephfs_nj7/share_3273289/ptm/Qwen1.5-7B-Chat', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=auto, tensor_parallel_size=1, disable_custom_all_reduce=True, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, seed=0)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
/root/miniforge3/envs/vllm/lib/python3.9/site-packages/vllm/executor/gpu_executor.py:51: UserWarning: Failed to get the IP address, using 0.0.0.0 by default.The value can be set by the environment variable HOST_IP.
  get_ip(), get_open_port())
INFO 05-10 15:46:33 selector.py:51] Cannot use FlashAttention because the package is not found. Please install it for better performance.
INFO 05-10 15:46:33 selector.py:25] Using XFormers backend.
fbaf814b-325b-4fcf-b381-dd6b08ef3f8c:154179:154179 [0] NCCL INFO Bootstrap : Using eth1:11.239.99.209<0>
fbaf814b-325b-4fcf-b381-dd6b08ef3f8c:154179:154179 [0] NCCL INFO NET/Plugin : Plugin load (libnccl-net.so) returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory
fbaf814b-325b-4fcf-b381-dd6b08ef3f8c:154179:154179 [0] NCCL INFO NET/Plugin : No plugin found, using internal implementation
fbaf814b-325b-4fcf-b381-dd6b08ef3f8c:154179:154179 [0] NCCL INFO cudaDriverVersion 11000
NCCL version 2.18.6+cuda11.8
fbaf814b-325b-4fcf-b381-dd6b08ef3f8c:154179:154317 [0] NCCL INFO NET/IB : No device found.
fbaf814b-325b-4fcf-b381-dd6b08ef3f8c:154179:154317 [0] NCCL INFO NET/Socket : Using [0]eth1:11.239.99.209<0>
fbaf814b-325b-4fcf-b381-dd6b08ef3f8c:154179:154317 [0] NCCL INFO Using network Socket
fbaf814b-325b-4fcf-b381-dd6b08ef3f8c:154179:154317 [0] NCCL INFO comm 0x55b203d858d0 rank 0 nranks 1 cudaDev 0 nvmlDev 0 busId e000 commId 0xbe6d6f72f2040b52 - Init START
fbaf814b-325b-4fcf-b381-dd6b08ef3f8c:154179:154317 [0] NCCL INFO Setting affinity for GPU 0 to ffff,ffffffff,00000000,0000ffff,ffffffff
fbaf814b-325b-4fcf-b381-dd6b08ef3f8c:154179:154317 [0] NCCL INFO Channel 00/32 :    0
fbaf814b-325b-4fcf-b381-dd6b08ef3f8c:154179:154317 [0] NCCL INFO Channel 01/32 :    0
fbaf814b-325b-4fcf-b381-dd6b08ef3f8c:154179:154317 [0] NCCL INFO Channel 02/32 :    0
fbaf814b-325b-4fcf-b381-dd6b08ef3f8c:154179:154317 [0] NCCL INFO Channel 03/32 :    0
fbaf814b-325b-4fcf-b381-dd6b08ef3f8c:154179:154317 [0] NCCL INFO Channel 04/32 :    0
fbaf814b-325b-4fcf-b381-dd6b08ef3f8c:154179:154317 [0] NCCL INFO Channel 05/32 :    0
fbaf814b-325b-4fcf-b381-dd6b08ef3f8c:154179:154317 [0] NCCL INFO Channel 06/32 :    0
fbaf814b-325b-4fcf-b381-dd6b08ef3f8c:154179:154317 [0] NCCL INFO Channel 07/32 :    0
fbaf814b-325b-4fcf-b381-dd6b08ef3f8c:154179:154317 [0] NCCL INFO Channel 08/32 :    0
fbaf814b-325b-4fcf-b381-dd6b08ef3f8c:154179:154317 [0] NCCL INFO Channel 09/32 :    0
fbaf814b-325b-4fcf-b381-dd6b08ef3f8c:154179:154317 [0] NCCL INFO Channel 10/32 :    0
fbaf814b-325b-4fcf-b381-dd6b08ef3f8c:154179:154317 [0] NCCL INFO Channel 11/32 :    0
fbaf814b-325b-4fcf-b381-dd6b08ef3f8c:154179:154317 [0] NCCL INFO Channel 12/32 :    0
fbaf814b-325b-4fcf-b381-dd6b08ef3f8c:154179:154317 [0] NCCL INFO Channel 13/32 :    0
fbaf814b-325b-4fcf-b381-dd6b08ef3f8c:154179:154317 [0] NCCL INFO Channel 14/32 :    0
fbaf814b-325b-4fcf-b381-dd6b08ef3f8c:154179:154317 [0] NCCL INFO Channel 15/32 :    0
fbaf814b-325b-4fcf-b381-dd6b08ef3f8c:154179:154317 [0] NCCL INFO Channel 16/32 :    0
fbaf814b-325b-4fcf-b381-dd6b08ef3f8c:154179:154317 [0] NCCL INFO Channel 17/32 :    0
fbaf814b-325b-4fcf-b381-dd6b08ef3f8c:154179:154317 [0] NCCL INFO Channel 18/32 :    0
fbaf814b-325b-4fcf-b381-dd6b08ef3f8c:154179:154317 [0] NCCL INFO Channel 19/32 :    0
fbaf814b-325b-4fcf-b381-dd6b08ef3f8c:154179:154317 [0] NCCL INFO Channel 20/32 :    0
fbaf814b-325b-4fcf-b381-dd6b08ef3f8c:154179:154317 [0] NCCL INFO Channel 21/32 :    0
fbaf814b-325b-4fcf-b381-dd6b08ef3f8c:154179:154317 [0] NCCL INFO Channel 22/32 :    0
fbaf814b-325b-4fcf-b381-dd6b08ef3f8c:154179:154317 [0] NCCL INFO Channel 23/32 :    0
fbaf814b-325b-4fcf-b381-dd6b08ef3f8c:154179:154317 [0] NCCL INFO Channel 24/32 :    0
fbaf814b-325b-4fcf-b381-dd6b08ef3f8c:154179:154317 [0] NCCL INFO Channel 25/32 :    0
fbaf814b-325b-4fcf-b381-dd6b08ef3f8c:154179:154317 [0] NCCL INFO Channel 26/32 :    0
fbaf814b-325b-4fcf-b381-dd6b08ef3f8c:154179:154317 [0] NCCL INFO Channel 27/32 :    0
fbaf814b-325b-4fcf-b381-dd6b08ef3f8c:154179:154317 [0] NCCL INFO Channel 28/32 :    0
fbaf814b-325b-4fcf-b381-dd6b08ef3f8c:154179:154317 [0] NCCL INFO Channel 29/32 :    0
fbaf814b-325b-4fcf-b381-dd6b08ef3f8c:154179:154317 [0] NCCL INFO Channel 30/32 :    0
fbaf814b-325b-4fcf-b381-dd6b08ef3f8c:154179:154317 [0] NCCL INFO Channel 31/32 :    0
fbaf814b-325b-4fcf-b381-dd6b08ef3f8c:154179:154317 [0] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1
fbaf814b-325b-4fcf-b381-dd6b08ef3f8c:154179:154317 [0] NCCL INFO P2P Chunksize set to 131072
fbaf814b-325b-4fcf-b381-dd6b08ef3f8c:154179:154317 [0] NCCL INFO Connected all rings
fbaf814b-325b-4fcf-b381-dd6b08ef3f8c:154179:154317 [0] NCCL INFO Connected all trees
fbaf814b-325b-4fcf-b381-dd6b08ef3f8c:154179:154317 [0] NCCL INFO 32 coll channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer
fbaf814b-325b-4fcf-b381-dd6b08ef3f8c:154179:154317 [0] NCCL INFO comm 0x55b203d858d0 rank 0 nranks 1 cudaDev 0 nvmlDev 0 busId e000 commId 0xbe6d6f72f2040b52 - Init COMPLETE
INFO 05-10 15:46:37 model_runner.py:104] Loading model weights took 14.3919 GB
INFO 05-10 15:46:40 gpu_executor.py:94] # GPU blocks: 2173, # CPU blocks: 512
INFO 05-10 15:46:41 model_runner.py:791] Capturing the model for CUDA graphs. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI.
INFO 05-10 15:46:41 model_runner.py:795] CUDA graphs can take additional 1~3 GiB memory per GPU. If you are running out of memory, consider decreasing `gpu_memory_utilization` or enforcing eager mode. You can also reduce the `max_num_seqs` as needed to decrease memory usage.
Segmentation fault
terrifyzhao commented 6 months ago

The problem has been resolved: updating CUDA to 12.1 fixed it.
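
For anyone landing here with the same crash: the NCCL log above reports cudaDriverVersion 11000 (CUDA 11.0, consistent with driver 450.156.00), while PyTorch here was built against CUDA 11.8, and that kind of driver/toolkit mismatch can surface as a segfault rather than a clean error. A minimal sketch for checking the two versions on your own machine (the driver's maximum supported CUDA version is also shown in the header of nvidia-smi):

# Sanity check (a sketch): compare the CUDA toolkit PyTorch was built with
# against what the driver stack actually supports on this machine.
import torch

print(torch.version.cuda)             # CUDA toolkit PyTorch was built with, e.g. '11.8'
print(torch.cuda.is_available())      # False often points at a driver/toolkit mismatch
print(torch.cuda.get_device_name(0))  # e.g. 'A100-SXM4-40GB' if the stack is healthy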