[Installation]: vLLM build from source errors

Imss27 commented 1 day ago

Your current environment

Collecting environment information...
PyTorch version: N/A
Is debug build: N/A
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.3 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: version 3.26.2
Libc version: glibc-2.35

Python version: 3.10.14 (main, May  6 2024, 19:42:50) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.15.153.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
Is CUDA available: N/A
CUDA runtime version: 12.1.66
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 4060 Laptop GPU
Nvidia driver version: 551.88
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: N/A

CPU:
Architecture:                       x86_64
CPU op-mode(s):                     32-bit, 64-bit
Address sizes:                      46 bits physical, 48 bits virtual
Byte Order:                         Little Endian
CPU(s):                             32
On-line CPU(s) list:                0-31
Vendor ID:                          GenuineIntel
Model name:                         13th Gen Intel(R) Core(TM) i9-13950HX
CPU family:                         6
Model:                              183
Thread(s) per core:                 2
Core(s) per socket:                 16
Socket(s):                          1
Stepping:                           1
BogoMIPS:                           4838.40
Flags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology tsc_reliable nonstop_tsc cpuid pni pclmulqdq vmx ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves avx_vnni umip waitpkg gfni vaes vpclmulqdq rdpid movdiri movdir64b fsrm md_clear serialize flush_l1d arch_capabilities
Virtualization:                     VT-x
Hypervisor vendor:                  Microsoft
Virtualization type:                full
L1d cache:                          768 KiB (16 instances)
L1i cache:                          512 KiB (16 instances)
L2 cache:                           32 MiB (16 instances)
L3 cache:                           36 MiB (1 instance)
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit:        Not affected
Vulnerability L1tf:                 Not affected
Vulnerability Mds:                  Not affected
Vulnerability Meltdown:             Not affected
Vulnerability Mmio stale data:      Not affected
Vulnerability Retbleed:             Mitigation; Enhanced IBRS
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:           Mitigation; Enhanced IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
Vulnerability Srbds:                Not affected
Vulnerability Tsx async abort:      Not affected

Versions of relevant libraries:
[pip3] No relevant packages
[conda] No relevant packages
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: N/A
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
GPU0    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X                              N/A

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

How you are installing vllm

pip install -e .

Before submitting a new issue...

[X] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

Imss27 commented 1 day ago

Error Messages:

      -- The CXX compiler identification is GNU 11.4.0
      -- Detecting CXX compiler ABI info
      -- Detecting CXX compiler ABI info - done
      -- Check for working CXX compiler: /usr/bin/c++ - skipped
      -- Detecting CXX compile features
      -- Detecting CXX compile features - done
      -- Build type: RelWithDebInfo
      -- Target device: cuda
      -- Found Python: /home/imss27/anaconda3/envs/vllm/bin/python (found version "3.10.14") found components: Interpreter Development.Module Development.SABIModule
      -- Found python matching: /home/imss27/anaconda3/envs/vllm/bin/python.
      -- Found CUDA: /usr/local/cuda-12.1 (found version "12.1")
      -- The CUDA compiler identification is NVIDIA 12.1.66
      -- Detecting CUDA compiler ABI info
      -- Detecting CUDA compiler ABI info - done
      -- Check for working CUDA compiler: /usr/local/cuda-12.1/bin/nvcc - skipped
      -- Detecting CUDA compile features
      -- Detecting CUDA compile features - done
      -- Found CUDAToolkit: /usr/local/cuda-12.1/include (found version "12.1.66")
      -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
      -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
      -- Found Threads: TRUE
      -- Caffe2: CUDA detected: 12.1
      -- Caffe2: CUDA nvcc is: /usr/local/cuda-12.1/bin/nvcc
      -- Caffe2: CUDA toolkit directory: /usr/local/cuda-12.1
      -- Caffe2: Header version is: 12.1
      -- /usr/local/cuda-12.1/lib64/libnvrtc.so shorthash is d540eb83
      -- USE_CUDNN is set to 0. Compiling without cuDNN support
      -- USE_CUSPARSELT is set to 0. Compiling without cuSPARSELt support
      -- Autodetected CUDA architecture(s):  8.9
      -- Added CUDA NVCC flags for: -gencode;arch=compute_89,code=sm_89
      CMake Warning at /tmp/pip-build-env-wv88m5l1/overlay/lib/python3.10/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message):
        static library kineto_LIBRARY-NOTFOUND not found.
      Call Stack (most recent call first):
        /tmp/pip-build-env-wv88m5l1/overlay/lib/python3.10/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:120 (append_torchlib_if_found)
        CMakeLists.txt:70 (find_package)

      -- Found Torch: /tmp/pip-build-env-wv88m5l1/overlay/lib/python3.10/site-packages/torch/lib/libtorch.so
      -- Enabling core extension.
      -- CUDA supported arches: 7.0;7.5;8.0;8.6;8.9;9.0
      -- CUDA target arches: 89-real
      -- CMake Version: 3.30.3
      -- CUTLASS 3.5.1
      -- CUDART: /usr/local/cuda-12.1/lib64/libcudart.so
      -- CUDA Driver: /usr/local/cuda-12.1/lib64/stubs/libcuda.so
      -- NVRTC: /usr/local/cuda-12.1/lib64/libnvrtc.so
      -- Default Install Location: install
      -- Found Python3: /home/imss27/anaconda3/envs/vllm/bin/python3.10 (found suitable version "3.10.14", minimum required is "3.5") found components: Interpreter
      -- Make cute::tuple be the new standard-layout tuple type
      -- CUDA Compilation Architectures: 70;72;75;80;86;87;89;90;90a
      -- Enable caching of reference results in conv unit tests
      -- Enable rigorous conv problem sizes in conv unit tests
      -- Using NVCC flags: --expt-relaxed-constexpr;-DCUTE_USE_PACKED_TUPLE=1;-DCUTLASS_TEST_LEVEL=0;-DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1;-DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1;-DCUTLASS_DEBUG_TRACE_LEVEL=0;-Xcompiler=-Wconversion;-Xcompiler=-fno-strict-aliasing;-lineinfo
      fatal: not a git repository (or any of the parent directories): .git
      -- CUTLASS Revision: Unable to detect, Git returned code 128.
      -- Configuring cublas ...
      -- cuBLAS Disabled.
      -- Configuring cuBLAS ... done.
      -- Machete generation completed successfully.
      -- Machete generated sources: /home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_mm_bf16u4.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_mm_bf16u4_impl_part0.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_mm_bf16u4_impl_part1.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_mm_bf16u4b8.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_mm_bf16u4b8_impl_part0.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_mm_bf16u4b8_impl_part1.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_mm_bf16u8.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_mm_bf16u8_impl_part0.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_mm_bf16u8_impl_part1.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_mm_bf16u8b128.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_mm_bf16u8b128_impl_part0.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_mm_bf16u8b128_impl_part1.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_mm_f16u4.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_mm_f16u4_impl_part0.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_mm_f16u4_impl_part1.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_mm_f16u4b8.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_mm_f16u4b8_impl_part0.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_mm_f16u4b8_impl_part1.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_mm_f16u8.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_mm_f16u8_impl_part0.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_mm_f16u8_impl_part1.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_mm_f16u8b128.cu;/home/imss27/dev/vllm/csrc/quantization/machete/genera
ted/machete_mm_f16u8b128_impl_part0.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_mm_f16u8b128_impl_part1.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_prepack_bf16u4.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_prepack_bf16u4b8.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_prepack_bf16u8.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_prepack_bf16u8b128.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_prepack_f16u4.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_prepack_f16u4b8.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_prepack_f16u8.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_prepack_f16u8b128.cu
      -- Enabling C extension.
      -- Enabling moe extension.
      -- Configuring done (12.5s)
      CMake Warning at /tmp/pip-build-env-wv88m5l1/overlay/lib/python3.10/site-packages/cmake/data/share/cmake-3.30/Modules/FindPython/Support.cmake:4255 (add_library):
        Cannot generate a safe runtime search path for target _core_C because files
        in some directories may conflict with libraries in implicit directories:

          runtime library [libnvToolsExt.so.1] in /usr/lib/x86_64-linux-gnu may be hidden by files in:
            /usr/local/cuda-12.1/lib64

        Some of these libraries may not be found correctly.
      Call Stack (most recent call first):
        /tmp/pip-build-env-wv88m5l1/overlay/lib/python3.10/site-packages/cmake/data/share/cmake-3.30/Modules/FindPython.cmake:691 (__Python_add_library)
        cmake/utils.cmake:327 (Python_add_library)
        CMakeLists.txt:94 (define_gpu_extension_target)

      -- Generating done (0.1s)
      -- Build files have been written to: /tmp/tmpf11xgj9c.build-temp
      Using MAX_JOBS=4 as the number of jobs.
      [1/68] Building CXX object CMakeFiles/_core_C.dir/csrc/core/torch_bindings.cpp.o
      [2/68] Linking CXX shared module /tmp/tmpnrpupvkc.build-lib/vllm/_core_C.abi3.so
      [3/68] Building CXX object CMakeFiles/_moe_C.dir/csrc/moe/torch_bindings.cpp.o
      [4/68] Building CUDA object CMakeFiles/_moe_C.dir/csrc/moe/topk_softmax_kernels.cu.o
      [5/68] Building CUDA object CMakeFiles/_C.dir/csrc/cache_kernels.cu.o
      [6/68] Building CUDA object CMakeFiles/_C.dir/csrc/pos_encoding_kernels.cu.o
      [7/68] Building CUDA object CMakeFiles/_C.dir/csrc/activation_kernels.cu.o
      [8/68] Building CUDA object CMakeFiles/_C.dir/csrc/layernorm_kernels.cu.o
      [9/68] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/gptq/q_gemm.cu.o
      [10/68] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/compressed_tensors/int8_quant_kernels.cu.o
      [11/68] Building CUDA object CMakeFiles/_C.dir/csrc/cuda_utils_kernels.cu.o
      [12/68] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/fp8/common.cu.o
      /home/imss27/dev/vllm/csrc/quantization/fp8/common.cu:20:1: warning: ‘host’ attribute directive ignored [-Wattributes]
         20 | C10_HOST_DEVICE constexpr auto FP8_E4M3_MAX =
            | ^~~~~~~~~~~~
      [13/68] Building CUDA object CMakeFiles/_C.dir/csrc/moe_align_block_size_kernels.cu.o
      [14/68] Building CUDA object CMakeFiles/_C.dir/csrc/prepare_inputs/advance_step.cu.o
      /home/imss27/dev/vllm/csrc/prepare_inputs/advance_step.cu: In function ‘void prepare_inputs::advance_step_flashinfer(int, int, int, at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&)’:
      /home/imss27/dev/vllm/csrc/prepare_inputs/advance_step.cu:214:8: warning: format ‘%d’ expects argument of type ‘int’, but argument 2 has type ‘int64_t’ {aka ‘long int’} [-Wformat=]
        214 |     printf("  block_tables.stride(0) = %d\n", block_tables.stride(0));
            |        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~  ~~~~~~~~~~~~~~~~~~~~~~
            |                                                              |
            |                                                              int64_t {aka long int}
      [15/68] Building CXX object CMakeFiles/_C.dir/csrc/torch_bindings.cpp.o
      [16/68] Building CUDA object CMakeFiles/_C.dir/csrc/mamba/causal_conv1d/causal_conv1d.cu.o
      [17/68] Building CUDA object CMakeFiles/_C.dir/csrc/mamba/mamba_ssm/selective_scan_fwd.cu.o
      [18/68] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/aqlm/gemm_kernels.cu.o
      [19/68] Building CUDA object CMakeFiles/_C.dir/csrc/attention/attention_kernels.cu.o
      [20/68] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/awq/gemm_kernels.cu.o
      [21/68] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/marlin/dense/marlin_cuda_kernel.cu.o
      [22/68] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/marlin/qqq/marlin_qqq_gemm_kernel.cu.o
      [23/68] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/marlin/sparse/marlin_24_cuda_kernel.cu.o
      [24/68] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/gptq_marlin/awq_marlin_repack.cu.o
      [25/68] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/gptq_marlin/gptq_marlin_repack.cu.o
      [26/68] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/gguf/gguf_kernel.cu.o
      [27/68] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/fp8/fp8_marlin.cu.o
      [28/68] Building CUDA object CMakeFiles/_C.dir/csrc/custom_all_reduce.cu.o
      [29/68] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/cutlass_w8a8/scaled_mm_entry.cu.o
      [30/68] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/cutlass_w8a8/scaled_mm_c2x.cu.o
      FAILED: CMakeFiles/_C.dir/csrc/quantization/cutlass_w8a8/scaled_mm_c2x.cu.o
      /usr/local/cuda-12.1/bin/nvcc -forward-unknown-to-host-compiler -DCUTLASS_ENABLE_DIRECT_CUDA_DRIVER_CALL=1 -DPy_LIMITED_API=3 -DTORCH_EXTENSION_NAME=_C -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -D_C_EXPORTS -I/home/imss27/dev/vllm/csrc -I/tmp/tmpf11xgj9c.build-temp/_deps/cutlass-src/include -isystem /home/imss27/anaconda3/envs/vllm/include/python3.10 -isystem /tmp/pip-build-env-wv88m5l1/overlay/lib/python3.10/site-packages/torch/include -isystem /tmp/pip-build-env-wv88m5l1/overlay/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda-12.1/include -DONNX_NAMESPACE=onnx_c2 -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O2 -g -DNDEBUG -std=c++17 "--generate-code=arch=compute_89,code=[sm_89]" -Xcompiler=-fPIC --expt-relaxed-constexpr -DENABLE_FP8 --threads=1 -D_GLIBCXX_USE_CXX11_ABI=0 -MD -MT CMakeFiles/_C.dir/csrc/quantization/cutlass_w8a8/scaled_mm_c2x.cu.o -MF CMakeFiles/_C.dir/csrc/quantization/cutlass_w8a8/scaled_mm_c2x.cu.o.d -x cu -c /home/imss27/dev/vllm/csrc/quantization/cutlass_w8a8/scaled_mm_c2x.cu -o CMakeFiles/_C.dir/csrc/quantization/cutlass_w8a8/scaled_mm_c2x.cu.o
      Killed
      [31/68] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/cutlass_w8a8/scaled_mm_c3x.cu.o
      /tmp/tmpf11xgj9c.build-temp/_deps/cutlass-src/include/cutlass/device_kernel.h: In function ‘void cutlass::device_kernel(typename Operator::Params) [with Operator = _GLOBAL__N__c31ef43b_16_scaled_mm_c3x_cu_22d95651::cutlass_3x_gemm<signed char, cutlass::bfloat16_t, _GLOBAL__N__c31ef43b_16_scaled_mm_c3x_cu_22d95651::ScaledEpilogueBias, cute::tuple<cute::C<64>, cute::C<64>, cute::C<256> >, cute::tuple<cute::C<1>, cute::C<8>, cute::C<1> >, cutlass::gemm::KernelTmaWarpSpecialized, cutlass::epilogue::TmaWarpSpecialized>::GemmKernel]’:
      /tmp/tmpf11xgj9c.build-temp/_deps/cutlass-src/include/cutlass/device_kernel.h:104:1: note: the ABI for passing parameters with 64-byte alignment has changed in GCC 4.6
        104 | void device_kernel(CUTLASS_GRID_CONSTANT typename Operator::Params const params)
            | ^~~~~~~~~~~~~
      [32/68] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/gptq_marlin/gptq_marlin.cu.o
      [33/68] Building CUDA object CMakeFiles/_moe_C.dir/csrc/moe/marlin_moe_ops.cu.o
      ninja: build stopped: subcommand failed.
      Traceback (most recent call last):
        File "/tmp/pip-build-env-wv88m5l1/overlay/lib/python3.10/site-packages/setuptools/command/editable_wheel.py", line 138, in run
          self._create_wheel_file(bdist_wheel)
        File "/tmp/pip-build-env-wv88m5l1/overlay/lib/python3.10/site-packages/setuptools/command/editable_wheel.py", line 341, in _create_wheel_file
          files, mapping = self._run_build_commands(dist_name, unpacked, lib, tmp)
        File "/tmp/pip-build-env-wv88m5l1/overlay/lib/python3.10/site-packages/setuptools/command/editable_wheel.py", line 264, in _run_build_commands
          self._run_build_subcommands()
        File "/tmp/pip-build-env-wv88m5l1/overlay/lib/python3.10/site-packages/setuptools/command/editable_wheel.py", line 291, in _run_build_subcommands
          self.run_command(name)
        File "/tmp/pip-build-env-wv88m5l1/overlay/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 316, in run_command
          self.distribution.run_command(command)
        File "/tmp/pip-build-env-wv88m5l1/overlay/lib/python3.10/site-packages/setuptools/dist.py", line 950, in run_command
          super().run_command(command)
        File "/tmp/pip-build-env-wv88m5l1/overlay/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 973, in run_command
          cmd_obj.run()
        File "/tmp/pip-build-env-wv88m5l1/overlay/lib/python3.10/site-packages/setuptools/command/build_ext.py", line 98, in run
          _build_ext.run(self)
        File "/tmp/pip-build-env-wv88m5l1/overlay/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 359, in run
          self.build_extensions()
        File "<string>", line 241, in build_extensions
        File "/home/imss27/anaconda3/envs/vllm/lib/python3.10/subprocess.py", line 369, in check_call
          raise CalledProcessError(retcode, cmd)
      subprocess.CalledProcessError: Command '['cmake', '--build', '.', '-j=4', '--target=_core_C', '--target=_moe_C', '--target=_C']' returned non-zero exit status 1.
      /tmp/pip-build-env-wv88m5l1/overlay/lib/python3.10/site-packages/setuptools/_distutils/dist.py:973: _DebuggingTips: Problem in editable installation.

youkaichao commented 1 day ago

  fatal: not a git repository (or any of the parent directories): .git
  -- CUTLASS Revision: Unable to detect, Git returned code 128.

looks like some cutlass error?

cc @tlrmchlsmth

youkaichao commented 1 day ago

  -- CUTLASS 3.5.1

the user actually already has cutlass. maybe this caused some conflict?

Imss27 commented 1 day ago

Thanks for the quick insights. A follow-up question: could the error be caused by these lines in the CMakeLists.txt file?

set(VLLM_EXT_SRC
  "csrc/cache_kernels.cu"
  "csrc/attention/attention_kernels.cu"
  "csrc/pos_encoding_kernels.cu"
  "csrc/activation_kernels.cu"
  "csrc/layernorm_kernels.cu"
  "csrc/quantization/gptq/q_gemm.cu"
  "csrc/quantization/compressed_tensors/int8_quant_kernels.cu"
  "csrc/quantization/fp8/common.cu"
  "csrc/cuda_utils_kernels.cu"
  "csrc/moe_align_block_size_kernels.cu"
  "csrc/prepare_inputs/advance_step.cu"
  "csrc/torch_bindings.cpp")

if(VLLM_GPU_LANG STREQUAL "CUDA")
  include(FetchContent)
  SET(CUTLASS_ENABLE_HEADERS_ONLY ON CACHE BOOL "Enable only the header library")
  FetchContent_Declare(
        cutlass
        GIT_REPOSITORY https://github.com/nvidia/cutlass.git
        GIT_TAG v3.5.1
        GIT_PROGRESS TRUE

        # Speed up CUTLASS download by retrieving only the specified GIT_TAG instead of the history.
        # Important: If GIT_SHALLOW is enabled then GIT_TAG works only with branch names and tags.
        # So if the GIT_TAG above is updated to a commit hash, GIT_SHALLOW must be set to FALSE
        GIT_SHALLOW TRUE
  )
  FetchContent_MakeAvailable(cutlass)

tlrmchlsmth commented 1 day ago

I haven’t seen this before, but it looks like something might be going wrong with FetchContent. @Imss27, is your machine connected to the internet? It needs to clone CUTLASS during the build process. Also try commenting out the GIT_SHALLOW line, just in case it is causing problems.

@youkaichao I don’t think cutlass already being there is the thing that’s causing this issue

Imss27 commented 1 day ago

@tlrmchlsmth Thank you. Yes, my computer is connected to the internet. I tried ping and curl as follows. Could this be a possible reason?

1. $ ping https://github.com/nvidia/cutlass.git
   Returns: ping: https://github.com/nvidia/cutlass.git: Name or service not known
2. $ curl https://github.com/nvidia/cutlass.git
   Returns:

<html>
<head><title>301 Moved Permanently</title></head>
<body>
<center><h1>301 Moved Permanently</h1></center>
<hr><center>nginx</center>
</body>
</html>

zifeitong commented 1 day ago

Can you try to rule out the compiler running out of memory? Either check dmesg for logs or set MAX_JOBS to a smaller number.
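A minimal sketch of the dmesg check suggested above. The helper name `find_oom_lines` and the exact search patterns are assumptions for illustration; kernel message wording can vary between versions, so treat a match as a hint rather than proof.

```shell
#!/bin/sh
# find_oom_lines (hypothetical helper): reads kernel log text on stdin and
# prints only the lines that usually accompany an out-of-memory kill.
find_oom_lines() {
  # "|| true" keeps the exit status at 0 when nothing matches.
  grep -i -E 'out of memory|oom-kill|killed process' || true
}

# Typical use right after a failed build (dmesg may require sudo):
#   sudo dmesg | find_oom_lines
```

If a line naming nvcc, cicc, or cc1plus shows up right after the build fails, the bare "Killed" in the build log was most likely the kernel reclaiming memory rather than a compiler error.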

tlrmchlsmth commented 1 day ago

It's a git repo (https://github.com/NVIDIA/cutlass). You should be able to run git clone https://github.com/NVIDIA/cutlass; I don't think it makes sense to ping or curl it.
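One way to test reachability with git itself, as a rough sketch: `git ls-remote` asks the remote for its refs without cloning anything. The helper name `check_remote_tag` is hypothetical; the tag `v3.5.1` is the one pinned in the CMakeLists.txt excerpt above.

```shell
#!/bin/sh
# check_remote_tag (hypothetical helper): succeeds if the repository at $1
# is reachable and advertises a tag named $2.
check_remote_tag() {
  # --exit-code makes git return a non-zero status when no matching ref is found.
  git ls-remote --exit-code --tags "$1" "$2" >/dev/null 2>&1
}

# Typical use:
#   check_remote_tag https://github.com/NVIDIA/cutlass v3.5.1 && echo "tag is reachable"
# A shallow clone of that tag, similar in spirit to GIT_SHALLOW TRUE:
#   git clone --depth 1 --branch v3.5.1 https://github.com/NVIDIA/cutlass /tmp/cutlass-check
```

If both commands succeed from the same shell that runs the build, network access to the repository is probably not the problem.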

tlrmchlsmth commented 1 day ago

Still not sure what's going on here though. Does deleting the following help? (We've run into trouble with it previously, hence the long comment)

        # Speed up CUTLASS download by retrieving only the specified GIT_TAG instead of the history.
        # Important: If GIT_SHALLOW is enabled then GIT_TAG works only with branch names and tags.
        # So if the GIT_TAG above is updated to a commit hash, GIT_SHALLOW must be set to FALSE
        GIT_SHALLOW TRUE

youkaichao commented 1 day ago

Python platform: Linux-5.15.153.1-microsoft-standard-WSL2-x86_64-with-glibc2.35

might be caused by WSL?

Imss27 commented 1 day ago

Thank you all for the great suggestions and insights! @youkaichao @tlrmchlsmth @zifeitong

Previously I tried setting MAX_JOBS to small values like 4 and 6, as mentioned in the vLLM installation docs, but that produced identical errors.

For future reference: if you hit similar issues on WSL, try a more conservative setting such as

$ export MAX_JOBS=1
$ pip install -e .

(Note that the build will take an extremely long time with this setting.)

This solved the build issue I encountered, even though the CUTLASS git message still appears:

fatal: not a git repository (or any of the parent directories): .git
-- CUTLASS Revision: Unable to detect, Git returned code 128.
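As a possible middle ground between MAX_JOBS=4 and MAX_JOBS=1, the job count could be derived from the memory that is actually free. This is only a sketch: the helper name `suggest_max_jobs` is hypothetical, and the 8 GiB-per-job budget in the usage lines is an assumed figure, not a measured requirement of the vLLM build.

```shell
#!/bin/sh
# suggest_max_jobs MEM_AVAILABLE_KIB CPU_COUNT GIB_PER_JOB (hypothetical helper)
# Prints the smaller of "jobs that fit in memory" and "CPU count", never below 1.
suggest_max_jobs() {
  mem_kib=$1
  cpus=$2
  gib_per_job=$3
  # 1 GiB = 1024 * 1024 KiB; integer division rounds down.
  jobs=$(( mem_kib / (gib_per_job * 1024 * 1024) ))
  if [ "$jobs" -gt "$cpus" ]; then jobs=$cpus; fi
  if [ "$jobs" -lt 1 ]; then jobs=1; fi
  echo "$jobs"
}

# Typical use on Linux or WSL (MemAvailable is reported in KiB):
#   mem_kib=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
#   export MAX_JOBS=$(suggest_max_jobs "$mem_kib" "$(nproc)" 8)
#   pip install -e .
```

If the build is still killed with the suggested value, raising the per-job budget or falling back to MAX_JOBS=1 remains the safe option.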
