vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Installation]: FAILED: CMakeFiles/_C.dir/csrc/quantization/machete/generated/machete_mm_bf16u4_impl_part0.cu.o #8889

Closed: wangshuai09 closed this issue 1 week ago

wangshuai09 commented 1 month ago

Your current environment

The output of `python collect_env.py`
Collecting environment information...
PyTorch version: N/A
Is debug build: N/A
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.2 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: 14.0.0-1ubuntu1.1
CMake version: version 3.22.1
Libc version: glibc-2.35

Python version: 3.10.0 (default, Mar  3 2022, 09:58:08) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.15.0-73-generic-x86_64-with-glibc2.35
Is CUDA available: N/A
CUDA runtime version: 12.6.68
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: 
GPU 0: Tesla T4
GPU 1: Tesla T4

Nvidia driver version: 560.35.03
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.9.4.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv.so.9.4.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn.so.9.4.0
/usr/lib/x86_64-linux-gnu/libcudnn_engines_precompiled.so.9.4.0
/usr/lib/x86_64-linux-gnu/libcudnn_engines_runtime_compiled.so.9.4.0
/usr/lib/x86_64-linux-gnu/libcudnn_graph.so.9.4.0
/usr/lib/x86_64-linux-gnu/libcudnn_heuristic.so.9.4.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops.so.9.4.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: N/A

CPU:
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Address sizes:                   46 bits physical, 48 bits virtual
Byte Order:                      Little Endian
CPU(s):                          16
On-line CPU(s) list:             0-15
Vendor ID:                       GenuineIntel
Model name:                      Intel(R) Xeon(R) Gold 6151 CPU @ 3.00GHz
CPU family:                      6
Model:                           85
Thread(s) per core:              2
Core(s) per socket:              8
Socket(s):                       1
Stepping:                        4
BogoMIPS:                        6000.00
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 arat md_clear flush_l1d arch_capabilities
Hypervisor vendor:               KVM
Virtualization type:             full
L1d cache:                       256 KiB (8 instances)
L1i cache:                       256 KiB (8 instances)
L2 cache:                        8 MiB (8 instances)
L3 cache:                        24.8 MiB (1 instance)
NUMA node(s):                    1
NUMA node0 CPU(s):               0-15
Vulnerability Itlb multihit:     KVM: Mitigation: VMX unsupported
Vulnerability L1tf:              Mitigation; PTE Inversion
Vulnerability Mds:               Mitigation; Clear CPU buffers; SMT Host state unknown
Vulnerability Meltdown:          Vulnerable
Vulnerability Mmio stale data:   Mitigation; Clear CPU buffers; SMT Host state unknown
Vulnerability Retbleed:          Vulnerable
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Vulnerable, IBPB: disabled, STIBP: disabled, PBRSB-eIBRS: Not affected
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Mitigation; Clear CPU buffers; SMT Host state unknown

Versions of relevant libraries:
[pip3] No relevant packages
[conda] No relevant packages
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: N/A
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
GPU0    GPU1    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X  PHB 0-15    0       N/A
GPU1    PHB  X  0-15    0       N/A

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

How you are installing vllm

pip install -e .
  Building editable for vllm (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Building editable for vllm (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [366 lines of output]
      /tmp/pip-build-env-2wco638h/overlay/lib/python3.10/site-packages/torch/_subclasses/functional_tensor.py:258: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:84.)
        cpu = _conversion_method_template(device=torch.device("cpu"))
      running editable_wheel
      creating /tmp/pip-wheel-e84mg27r/.tmp-rt8x061a/vllm.egg-info
      writing /tmp/pip-wheel-e84mg27r/.tmp-rt8x061a/vllm.egg-info/PKG-INFO
      writing dependency_links to /tmp/pip-wheel-e84mg27r/.tmp-rt8x061a/vllm.egg-info/dependency_links.txt
      writing entry points to /tmp/pip-wheel-e84mg27r/.tmp-rt8x061a/vllm.egg-info/entry_points.txt
      writing requirements to /tmp/pip-wheel-e84mg27r/.tmp-rt8x061a/vllm.egg-info/requires.txt
      writing top-level names to /tmp/pip-wheel-e84mg27r/.tmp-rt8x061a/vllm.egg-info/top_level.txt
      writing manifest file '/tmp/pip-wheel-e84mg27r/.tmp-rt8x061a/vllm.egg-info/SOURCES.txt'
      reading manifest template 'MANIFEST.in'
      adding license file 'LICENSE'
      writing manifest file '/tmp/pip-wheel-e84mg27r/.tmp-rt8x061a/vllm.egg-info/SOURCES.txt'
      creating '/tmp/pip-wheel-e84mg27r/.tmp-rt8x061a/vllm-0.1.dev2794+g7193774.cu126.dist-info'
      creating /tmp/pip-wheel-e84mg27r/.tmp-rt8x061a/vllm-0.1.dev2794+g7193774.cu126.dist-info/WHEEL
      running build_py
      running build_ext
      -- The CXX compiler identification is GNU 11.4.0
      -- Detecting CXX compiler ABI info
      -- Detecting CXX compiler ABI info - done
      -- Check for working CXX compiler: /usr/bin/c++ - skipped
      -- Detecting CXX compile features
      -- Detecting CXX compile features - done
      -- Build type: Debug
      -- Target device: cuda
      -- Found Python: /usr/local/anaconda3/envs/vllm/bin/python (found version "3.10.0") found components: Interpreter Development.Module Development.SABIModule
      -- Found python matching: /usr/local/anaconda3/envs/vllm/bin/python.
      -- Found CUDA: /usr/local/cuda (found version "12.6")
      -- The CUDA compiler identification is NVIDIA 12.6.68
      -- Detecting CUDA compiler ABI info
      -- Detecting CUDA compiler ABI info - done
      -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
      -- Detecting CUDA compile features
      -- Detecting CUDA compile features - done
      -- Found CUDAToolkit: /usr/local/cuda/include (found version "12.6.68")
      -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
      -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
      -- Found Threads: TRUE
      -- Caffe2: CUDA detected: 12.6
      -- Caffe2: CUDA nvcc is: /usr/local/cuda/bin/nvcc
      -- Caffe2: CUDA toolkit directory: /usr/local/cuda
      -- Caffe2: Header version is: 12.6
      -- /usr/local/cuda/lib64/libnvrtc.so shorthash is 136e7fe9
      -- USE_CUDNN is set to 0. Compiling without cuDNN support
      -- USE_CUSPARSELT is set to 0. Compiling without cuSPARSELt support
      -- Autodetected CUDA architecture(s):  7.5 7.5
      -- Added CUDA NVCC flags for: -gencode;arch=compute_75,code=sm_75
      CMake Warning at /tmp/pip-build-env-2wco638h/overlay/lib/python3.10/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message):
        static library kineto_LIBRARY-NOTFOUND not found.
      Call Stack (most recent call first):
        /tmp/pip-build-env-2wco638h/overlay/lib/python3.10/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:120 (append_torchlib_if_found)
        CMakeLists.txt:84 (find_package)

      -- Found Torch: /tmp/pip-build-env-2wco638h/overlay/lib/python3.10/site-packages/torch/lib/libtorch.so
      -- Enabling core extension.
      -- CUDA supported arches: 7.0;7.5;8.0;8.6;8.9;9.0
      -- CUDA target arches: 75-real
      -- CMake Version: 3.30.3
      -- CUTLASS 3.5.1
      -- CUDART: /usr/local/cuda/lib64/libcudart.so
      -- CUDA Driver: /usr/local/cuda/lib64/stubs/libcuda.so
      -- NVRTC: /usr/local/cuda/lib64/libnvrtc.so
      -- Default Install Location: install
      -- Found Python3: /usr/local/anaconda3/envs/vllm/bin/python3.10 (found suitable version "3.10.0", minimum required is "3.5") found components: Interpreter
      -- Make cute::tuple be the new standard-layout tuple type
      -- CUDA Compilation Architectures: 70;72;75;80;86;87;89;90;90a
      -- Enable caching of reference results in conv unit tests
      -- Enable rigorous conv problem sizes in conv unit tests
      -- Using NVCC flags: --expt-relaxed-constexpr;-DCUTE_USE_PACKED_TUPLE=1;-DCUTLASS_TEST_LEVEL=0;-DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1;-DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1;-DCUTLASS_DEBUG_TRACE_LEVEL=0;-Xcompiler=-Wconversion;-Xcompiler=-fno-strict-aliasing;-lineinfo
      -- Configuring cublas ...
      -- cuBLAS Disabled.
      -- Configuring cuBLAS ... done.
      -- Machete generation completed successfully.
      -- Machete generated sources: /home/ws/vllm/csrc/quantization/machete/generated/machete_mm_bf16u4.cu;/home/ws/vllm/csrc/quantization/machete/generated/machete_mm_bf16u4_impl_part0.cu;/home/ws/vllm/csrc/quantization/machete/generated/machete_mm_bf16u4_impl_part1.cu;/home/ws/vllm/csrc/quantization/machete/generated/machete_mm_bf16u4b8.cu;/home/ws/vllm/csrc/quantization/machete/generated/machete_mm_bf16u4b8_impl_part0.cu;/home/ws/vllm/csrc/quantization/machete/generated/machete_mm_bf16u4b8_impl_part1.cu;/home/ws/vllm/csrc/quantization/machete/generated/machete_mm_bf16u8.cu;/home/ws/vllm/csrc/quantization/machete/generated/machete_mm_bf16u8_impl_part0.cu;/home/ws/vllm/csrc/quantization/machete/generated/machete_mm_bf16u8_impl_part1.cu;/home/ws/vllm/csrc/quantization/machete/generated/machete_mm_bf16u8b128.cu;/home/ws/vllm/csrc/quantization/machete/generated/machete_mm_bf16u8b128_impl_part0.cu;/home/ws/vllm/csrc/quantization/machete/generated/machete_mm_bf16u8b128_impl_part1.cu;/home/ws/vllm/csrc/quantization/machete/generated/machete_mm_f16u4.cu;/home/ws/vllm/csrc/quantization/machete/generated/machete_mm_f16u4_impl_part0.cu;/home/ws/vllm/csrc/quantization/machete/generated/machete_mm_f16u4_impl_part1.cu;/home/ws/vllm/csrc/quantization/machete/generated/machete_mm_f16u4b8.cu;/home/ws/vllm/csrc/quantization/machete/generated/machete_mm_f16u4b8_impl_part0.cu;/home/ws/vllm/csrc/quantization/machete/generated/machete_mm_f16u4b8_impl_part1.cu;/home/ws/vllm/csrc/quantization/machete/generated/machete_mm_f16u8.cu;/home/ws/vllm/csrc/quantization/machete/generated/machete_mm_f16u8_impl_part0.cu;/home/ws/vllm/csrc/quantization/machete/generated/machete_mm_f16u8_impl_part1.cu;/home/ws/vllm/csrc/quantization/machete/generated/machete_mm_f16u8b128.cu;/home/ws/vllm/csrc/quantization/machete/generated/machete_mm_f16u8b128_impl_part0.cu;/home/ws/vllm/csrc/quantization/machete/generated/machete_mm_f16u8b128_impl_part1.cu;/home/ws/vllm/csrc/quantization/machete/generated/machete_prepack_bf16u4.cu;/home/ws/vllm/csrc/quantization/machete/generated/machete_prepack_bf16u4b8.cu;/home/ws/vllm/csrc/quantization/machete/generated/machete_prepack_bf16u8.cu;/home/ws/vllm/csrc/quantization/machete/generated/machete_prepack_bf16u8b128.cu;/home/ws/vllm/csrc/quantization/machete/generated/machete_prepack_f16u4.cu;/home/ws/vllm/csrc/quantization/machete/generated/machete_prepack_f16u4b8.cu;/home/ws/vllm/csrc/quantization/machete/generated/machete_prepack_f16u8.cu;/home/ws/vllm/csrc/quantization/machete/generated/machete_prepack_f16u8b128.cu
      -- Enabling C extension.
      -- Enabling moe extension.
      -- Build type: Debug
      -- Target device: cuda
      -- Building vllm-flash-attn inside vLLM. Skipping flag detection and relying on parent build.
      -- vllm-flash-attn is available at /tmp/tmp871jv6sq.build-temp/_deps/vllm-flash-attn-src
      -- Configuring done (153.7s)
      -- Generating done (0.1s)
      -- Build files have been written to: /tmp/tmp871jv6sq.build-temp
      [1/137] Building CXX object CMakeFiles/_core_C.dir/csrc/core/torch_bindings.cpp.o
      [2/137] Linking CXX shared module _core_C.abi3.so
      [3/137] Building CUDA object _deps/vllm-flash-attn-build/CMakeFiles/vllm_flash_attn_c.dir/csrc/flash_attn/src/flash_fwd_hdim192_bf16_causal_sm80.cu.o
      [4/137] Building CUDA object _deps/vllm-flash-attn-build/CMakeFiles/vllm_flash_attn_c.dir/csrc/flash_attn/src/flash_fwd_hdim128_fp16_causal_sm80.cu.o
      [5/137] Building CUDA object _deps/vllm-flash-attn-build/CMakeFiles/vllm_flash_attn_c.dir/csrc/flash_attn/src/flash_fwd_hdim128_bf16_causal_sm80.cu.o
      [6/137] Building CUDA object _deps/vllm-flash-attn-build/CMakeFiles/vllm_flash_attn_c.dir/csrc/flash_attn/src/flash_fwd_hdim160_fp16_causal_sm80.cu.o
      [7/137] Building CUDA object _deps/vllm-flash-attn-build/CMakeFiles/vllm_flash_attn_c.dir/csrc/flash_attn/src/flash_fwd_hdim160_fp16_sm80.cu.o
      [8/137] Building CUDA object _deps/vllm-flash-attn-build/CMakeFiles/vllm_flash_attn_c.dir/csrc/flash_attn/src/flash_fwd_hdim160_bf16_sm80.cu.o
      [9/137] Building CUDA object _deps/vllm-flash-attn-build/CMakeFiles/vllm_flash_attn_c.dir/csrc/flash_attn/src/flash_fwd_hdim128_bf16_sm80.cu.o
      [10/137] Building CUDA object _deps/vllm-flash-attn-build/CMakeFiles/vllm_flash_attn_c.dir/csrc/flash_attn/src/flash_fwd_hdim128_fp16_sm80.cu.o
      [11/137] Building CUDA object _deps/vllm-flash-attn-build/CMakeFiles/vllm_flash_attn_c.dir/csrc/flash_attn/src/flash_fwd_hdim160_bf16_causal_sm80.cu.o
      [12/137] Building CXX object _deps/vllm-flash-attn-build/CMakeFiles/vllm_flash_attn_c.dir/csrc/flash_attn/flash_api.cpp.o
      [13/137] Building CUDA object CMakeFiles/_C.dir/csrc/cuda_utils_kernels.cu.o
      [14/137] Building CXX object CMakeFiles/_moe_C.dir/csrc/moe/torch_bindings.cpp.o
      [15/137] Building CXX object CMakeFiles/_C.dir/csrc/torch_bindings.cpp.o
      [16/137] Building CUDA object CMakeFiles/_moe_C.dir/csrc/moe/marlin_kernels/marlin_moe_kernel_ku8b128.cu.o
      [17/137] Building CUDA object CMakeFiles/_moe_C.dir/csrc/moe/marlin_moe_ops.cu.o
      [18/137] Building CUDA object CMakeFiles/_moe_C.dir/csrc/moe/marlin_kernels/marlin_moe_kernel_ku4b8.cu.o
      [19/137] Building CUDA object CMakeFiles/_moe_C.dir/csrc/moe/topk_softmax_kernels.cu.o
      [20/137] Building CUDA object CMakeFiles/_C.dir/csrc/cache_kernels.cu.o
      [21/137] Building CUDA object CMakeFiles/_C.dir/csrc/pos_encoding_kernels.cu.o
      [22/137] Building CUDA object CMakeFiles/_C.dir/csrc/activation_kernels.cu.o
      [23/137] Building CUDA object CMakeFiles/_C.dir/csrc/prepare_inputs/advance_step.cu.o
      /home/ws/vllm/csrc/prepare_inputs/advance_step.cu: In function ‘void prepare_inputs::advance_step_flashinfer(int, int, int, at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&)’:
      /home/ws/vllm/csrc/prepare_inputs/advance_step.cu:214:8: warning: format ‘%d’ expects argument of type ‘int’, but argument 2 has type ‘int64_t’ {aka ‘long int’} [-Wformat=]
        214 |     printf("  block_tables.stride(0) = %d\n", block_tables.stride(0));
            |        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~  ~~~~~~~~~~~~~~~~~~~~~~
            |                                                              |
            |                                                              int64_t {aka long int}
      [24/137] Building CUDA object CMakeFiles/_C.dir/csrc/moe_align_block_size_kernels.cu.o
      [25/137] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/compressed_tensors/int8_quant_kernels.cu.o
      [26/137] Building CUDA object CMakeFiles/_C.dir/csrc/layernorm_kernels.cu.o
      [27/137] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/fp8/common.cu.o
      [28/137] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/gptq/q_gemm.cu.o
      [29/137] Building CUDA object CMakeFiles/_C.dir/csrc/mamba/mamba_ssm/selective_scan_fwd.cu.o
      [30/137] Building CUDA object CMakeFiles/_C.dir/csrc/mamba/causal_conv1d/causal_conv1d.cu.o
      [31/137] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/aqlm/gemm_kernels.cu.o
      [32/137] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/awq/gemm_kernels.cu.o
      [33/137] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/marlin/dense/marlin_cuda_kernel.cu.o
      [34/137] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/marlin/sparse/marlin_24_cuda_kernel.cu.o
      [35/137] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/marlin/qqq/marlin_qqq_gemm_kernel.cu.o
      [36/137] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/gptq_marlin/gptq_marlin_repack.cu.o
      [37/137] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/gptq_marlin/awq_marlin_repack.cu.o
      [38/137] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/fp8/fp8_marlin.cu.o
      [39/137] Building CUDA object CMakeFiles/_C.dir/csrc/custom_all_reduce.cu.o
      [40/137] Building CUDA object CMakeFiles/_C.dir/csrc/permute_cols.cu.o
      [41/137] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/gptq_marlin/gptq_marlin.cu.o
      [42/137] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/gguf/gguf_kernel.cu.o
      [43/137] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/cutlass_w8a8/scaled_mm_entry.cu.o
      [44/137] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/machete/generated/machete_mm_bf16u4b8.cu.o
      [45/137] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/machete/generated/machete_mm_bf16u4.cu.o
      [46/137] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/machete/generated/machete_mm_bf16u8.cu.o
      [47/137] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/machete/generated/machete_mm_bf16u8b128.cu.o
      [48/137] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/machete/generated/machete_mm_bf16u4_impl_part0.cu.o
      FAILED: CMakeFiles/_C.dir/csrc/quantization/machete/generated/machete_mm_bf16u4_impl_part0.cu.o
      /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DCUTLASS_ENABLE_DIRECT_CUDA_DRIVER_CALL=1 -DPy_LIMITED_API=3 -DTORCH_EXTENSION_NAME=_C -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -D_C_EXPORTS -I/home/ws/vllm/csrc -I/tmp/tmp871jv6sq.build-temp/_deps/cutlass-src/include -isystem /usr/local/anaconda3/envs/vllm/include/python3.10 -isystem /tmp/pip-build-env-2wco638h/overlay/lib/python3.10/site-packages/torch/include -isystem /tmp/pip-build-env-2wco638h/overlay/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -DONNX_NAMESPACE=onnx_c2 -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -g -std=c++17 "--generate-code=arch=compute_75,code=[sm_75]" -Xcompiler=-fPIC --expt-relaxed-constexpr -DENABLE_FP8 --threads=1 -D_GLIBCXX_USE_CXX11_ABI=0 -gencode arch=compute_90a,code=sm_90a -MD -MT CMakeFiles/_C.dir/csrc/quantization/machete/generated/machete_mm_bf16u4_impl_part0.cu.o -MF CMakeFiles/_C.dir/csrc/quantization/machete/generated/machete_mm_bf16u4_impl_part0.cu.o.d -x cu -c /home/ws/vllm/csrc/quantization/machete/generated/machete_mm_bf16u4_impl_part0.cu -o CMakeFiles/_C.dir/csrc/quantization/machete/generated/machete_mm_bf16u4_impl_part0.cu.o
      Killed
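
The log ends with `Killed` and no compiler error message, which usually means the nvcc process received SIGKILL. On Linux the most common sender is the kernel's OOM killer, triggered when too many parallel CUDA compile jobs exhaust host RAM. The issue does not include `dmesg` output, so this is an inference rather than a confirmed diagnosis. The Python sketch below uses a hypothetical helper, `describe_exit` (not part of vLLM), to show how a script can tell a signal kill apart from an ordinary compiler failure for a process it launches directly. It relies on the POSIX convention that `subprocess` reports death by signal N as a return code of -N.

```python
import signal
import subprocess
import sys


def describe_exit(returncode: int) -> str:
    """Turn a subprocess return code into a short diagnosis (POSIX semantics)."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # subprocess reports "terminated by signal N" as returncode == -N.
        try:
            sig = signal.Signals(-returncode)
        except ValueError:
            return f"killed by unknown signal {-returncode}"
        if sig == signal.SIGKILL:
            return "killed by SIGKILL (often the Linux OOM killer)"
        return f"killed by {sig.name}"
    return f"exited with status {returncode}"


if __name__ == "__main__":
    # Demo: a child Python process that sends SIGKILL to itself (POSIX only).
    child = subprocess.run(
        [sys.executable, "-c", "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"]
    )
    print(describe_exit(child.returncode))
```

Under `pip install`, the kill happens several layers down (pip, CMake, ninja, shell, nvcc), so pip itself only exits with status 1. Checking `dmesg` for an out-of-memory message is the more direct way to confirm the cause there.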


Imss27 commented 1 month ago

Could you please try `export MAX_JOBS=6` or set it to even smaller values? #8532
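
This suggestion works because `MAX_JOBS` limits how many compile jobs vLLM's source build runs in parallel, so fewer nvcc processes hold memory at the same time. The sketch below picks a value from available RAM and CPU count. `ASSUMED_GIB_PER_JOB` is a hypothetical per-job memory budget, not a figure from vLLM. The Machete/CUTLASS kernels in this log (built here with `Build type: Debug`) may need more. `read_mem_available_gib` and `suggest_max_jobs` are illustrative helpers, not part of vLLM.

```python
import os

# Hypothetical peak host RAM per concurrent nvcc job, in GiB.
# Heavy CUTLASS-based kernels may need more; raise this if builds still get killed.
ASSUMED_GIB_PER_JOB = 4.0


def read_mem_available_gib(path: str = "/proc/meminfo") -> float:
    """Return MemAvailable from /proc/meminfo in GiB (Linux only)."""
    with open(path) as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                kib = int(line.split()[1])  # the value is reported in kB
                return kib / (1024 * 1024)
    raise RuntimeError(f"MemAvailable not found in {path}")


def suggest_max_jobs(mem_available_gib: float, cpu_count: int,
                     gib_per_job: float = ASSUMED_GIB_PER_JOB) -> int:
    """Cap parallelism by both CPU count and the memory budget; never below 1."""
    by_memory = int(mem_available_gib // gib_per_job)
    return max(1, min(cpu_count, by_memory))


if __name__ == "__main__":
    try:
        mem_gib = read_mem_available_gib()
    except (OSError, RuntimeError):
        mem_gib = ASSUMED_GIB_PER_JOB  # memory unknown: fall back to a single job
    jobs = suggest_max_jobs(mem_gib, os.cpu_count() or 1)
    print(f"export MAX_JOBS={jobs}")
```

Run it on the build machine, then rebuild with the printed value, for example `MAX_JOBS=<n> pip install -e .`. Taking the minimum of the CPU-based and memory-based limits keeps a 16-core machine with modest RAM from launching 16 memory-hungry nvcc processes at once.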

wangshuai09 commented 1 month ago

> Could you please try `export MAX_JOBS=6` or set it to even smaller values? #8532

Thanks, this works for me.