vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Installation]: vLLM build from source errors #8532

Closed Imss27 closed 1 day ago

Imss27 commented 1 day ago

Your current environment

Collecting environment information...
PyTorch version: N/A
Is debug build: N/A
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.3 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: version 3.26.2
Libc version: glibc-2.35

Python version: 3.10.14 (main, May  6 2024, 19:42:50) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.15.153.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
Is CUDA available: N/A
CUDA runtime version: 12.1.66
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 4060 Laptop GPU
Nvidia driver version: 551.88
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: N/A

CPU:
Architecture:                       x86_64
CPU op-mode(s):                     32-bit, 64-bit
Address sizes:                      46 bits physical, 48 bits virtual
Byte Order:                         Little Endian
CPU(s):                             32
On-line CPU(s) list:                0-31
Vendor ID:                          GenuineIntel
Model name:                         13th Gen Intel(R) Core(TM) i9-13950HX
CPU family:                         6
Model:                              183
Thread(s) per core:                 2
Core(s) per socket:                 16
Socket(s):                          1
Stepping:                           1
BogoMIPS:                           4838.40
Flags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology tsc_reliable nonstop_tsc cpuid pni pclmulqdq vmx ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves avx_vnni umip waitpkg gfni vaes vpclmulqdq rdpid movdiri movdir64b fsrm md_clear serialize flush_l1d arch_capabilities
Virtualization:                     VT-x
Hypervisor vendor:                  Microsoft
Virtualization type:                full
L1d cache:                          768 KiB (16 instances)
L1i cache:                          512 KiB (16 instances)
L2 cache:                           32 MiB (16 instances)
L3 cache:                           36 MiB (1 instance)
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit:        Not affected
Vulnerability L1tf:                 Not affected
Vulnerability Mds:                  Not affected
Vulnerability Meltdown:             Not affected
Vulnerability Mmio stale data:      Not affected
Vulnerability Retbleed:             Mitigation; Enhanced IBRS
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:           Mitigation; Enhanced IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
Vulnerability Srbds:                Not affected
Vulnerability Tsx async abort:      Not affected

Versions of relevant libraries:
[pip3] No relevant packages
[conda] No relevant packages
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: N/A
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
        GPU0    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X                              N/A

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

How you are installing vllm

pip install -e .

Imss27 commented 1 day ago

Error Messages:

      -- The CXX compiler identification is GNU 11.4.0
      -- Detecting CXX compiler ABI info
      -- Detecting CXX compiler ABI info - done
      -- Check for working CXX compiler: /usr/bin/c++ - skipped
      -- Detecting CXX compile features
      -- Detecting CXX compile features - done
      -- Build type: RelWithDebInfo
      -- Target device: cuda
      -- Found Python: /home/imss27/anaconda3/envs/vllm/bin/python (found version "3.10.14") found components: Interpreter Development.Module Development.SABIModule
      -- Found python matching: /home/imss27/anaconda3/envs/vllm/bin/python.
      -- Found CUDA: /usr/local/cuda-12.1 (found version "12.1")
      -- The CUDA compiler identification is NVIDIA 12.1.66
      -- Detecting CUDA compiler ABI info
      -- Detecting CUDA compiler ABI info - done
      -- Check for working CUDA compiler: /usr/local/cuda-12.1/bin/nvcc - skipped
      -- Detecting CUDA compile features
      -- Detecting CUDA compile features - done
      -- Found CUDAToolkit: /usr/local/cuda-12.1/include (found version "12.1.66")
      -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
      -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
      -- Found Threads: TRUE
      -- Caffe2: CUDA detected: 12.1
      -- Caffe2: CUDA nvcc is: /usr/local/cuda-12.1/bin/nvcc
      -- Caffe2: CUDA toolkit directory: /usr/local/cuda-12.1
      -- Caffe2: Header version is: 12.1
      -- /usr/local/cuda-12.1/lib64/libnvrtc.so shorthash is d540eb83
      -- USE_CUDNN is set to 0. Compiling without cuDNN support
      -- USE_CUSPARSELT is set to 0. Compiling without cuSPARSELt support
      -- Autodetected CUDA architecture(s):  8.9
      -- Added CUDA NVCC flags for: -gencode;arch=compute_89,code=sm_89
      CMake Warning at /tmp/pip-build-env-wv88m5l1/overlay/lib/python3.10/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message):
        static library kineto_LIBRARY-NOTFOUND not found.
      Call Stack (most recent call first):
        /tmp/pip-build-env-wv88m5l1/overlay/lib/python3.10/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:120 (append_torchlib_if_found)
        CMakeLists.txt:70 (find_package)

      -- Found Torch: /tmp/pip-build-env-wv88m5l1/overlay/lib/python3.10/site-packages/torch/lib/libtorch.so
      -- Enabling core extension.
      -- CUDA supported arches: 7.0;7.5;8.0;8.6;8.9;9.0
      -- CUDA target arches: 89-real
      -- CMake Version: 3.30.3
      -- CUTLASS 3.5.1
      -- CUDART: /usr/local/cuda-12.1/lib64/libcudart.so
      -- CUDA Driver: /usr/local/cuda-12.1/lib64/stubs/libcuda.so
      -- NVRTC: /usr/local/cuda-12.1/lib64/libnvrtc.so
      -- Default Install Location: install
      -- Found Python3: /home/imss27/anaconda3/envs/vllm/bin/python3.10 (found suitable version "3.10.14", minimum required is "3.5") found components: Interpreter
      -- Make cute::tuple be the new standard-layout tuple type
      -- CUDA Compilation Architectures: 70;72;75;80;86;87;89;90;90a
      -- Enable caching of reference results in conv unit tests
      -- Enable rigorous conv problem sizes in conv unit tests
      -- Using NVCC flags: --expt-relaxed-constexpr;-DCUTE_USE_PACKED_TUPLE=1;-DCUTLASS_TEST_LEVEL=0;-DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1;-DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1;-DCUTLASS_DEBUG_TRACE_LEVEL=0;-Xcompiler=-Wconversion;-Xcompiler=-fno-strict-aliasing;-lineinfo
      fatal: not a git repository (or any of the parent directories): .git
      -- CUTLASS Revision: Unable to detect, Git returned code 128.
      -- Configuring cublas ...
      -- cuBLAS Disabled.
      -- Configuring cuBLAS ... done.
      -- Machete generation completed successfully.
      -- Machete generated sources: /home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_mm_bf16u4.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_mm_bf16u4_impl_part0.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_mm_bf16u4_impl_part1.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_mm_bf16u4b8.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_mm_bf16u4b8_impl_part0.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_mm_bf16u4b8_impl_part1.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_mm_bf16u8.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_mm_bf16u8_impl_part0.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_mm_bf16u8_impl_part1.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_mm_bf16u8b128.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_mm_bf16u8b128_impl_part0.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_mm_bf16u8b128_impl_part1.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_mm_f16u4.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_mm_f16u4_impl_part0.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_mm_f16u4_impl_part1.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_mm_f16u4b8.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_mm_f16u4b8_impl_part0.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_mm_f16u4b8_impl_part1.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_mm_f16u8.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_mm_f16u8_impl_part0.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_mm_f16u8_impl_part1.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_mm_f16u8b128.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_mm_f16u8b128_impl_part0.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_mm_f16u8b128_impl_part1.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_prepack_bf16u4.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_prepack_bf16u4b8.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_prepack_bf16u8.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_prepack_bf16u8b128.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_prepack_f16u4.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_prepack_f16u4b8.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_prepack_f16u8.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_prepack_f16u8b128.cu
      -- Enabling C extension.
      -- Enabling moe extension.
      -- Configuring done (12.5s)
      CMake Warning at /tmp/pip-build-env-wv88m5l1/overlay/lib/python3.10/site-packages/cmake/data/share/cmake-3.30/Modules/FindPython/Support.cmake:4255 (add_library):
        Cannot generate a safe runtime search path for target _core_C because files
        in some directories may conflict with libraries in implicit directories:

          runtime library [libnvToolsExt.so.1] in /usr/lib/x86_64-linux-gnu may be hidden by files in:
            /usr/local/cuda-12.1/lib64

        Some of these libraries may not be found correctly.
      Call Stack (most recent call first):
        /tmp/pip-build-env-wv88m5l1/overlay/lib/python3.10/site-packages/cmake/data/share/cmake-3.30/Modules/FindPython.cmake:691 (__Python_add_library)
        cmake/utils.cmake:327 (Python_add_library)
        CMakeLists.txt:94 (define_gpu_extension_target)

      -- Generating done (0.1s)
      -- Build files have been written to: /tmp/tmpf11xgj9c.build-temp
      Using MAX_JOBS=4 as the number of jobs.
      [1/68] Building CXX object CMakeFiles/_core_C.dir/csrc/core/torch_bindings.cpp.o
      [2/68] Linking CXX shared module /tmp/tmpnrpupvkc.build-lib/vllm/_core_C.abi3.so
      [3/68] Building CXX object CMakeFiles/_moe_C.dir/csrc/moe/torch_bindings.cpp.o
      [4/68] Building CUDA object CMakeFiles/_moe_C.dir/csrc/moe/topk_softmax_kernels.cu.o
      [5/68] Building CUDA object CMakeFiles/_C.dir/csrc/cache_kernels.cu.o
      [6/68] Building CUDA object CMakeFiles/_C.dir/csrc/pos_encoding_kernels.cu.o
      [7/68] Building CUDA object CMakeFiles/_C.dir/csrc/activation_kernels.cu.o
      [8/68] Building CUDA object CMakeFiles/_C.dir/csrc/layernorm_kernels.cu.o
      [9/68] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/gptq/q_gemm.cu.o
      [10/68] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/compressed_tensors/int8_quant_kernels.cu.o
      [11/68] Building CUDA object CMakeFiles/_C.dir/csrc/cuda_utils_kernels.cu.o
      [12/68] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/fp8/common.cu.o
      /home/imss27/dev/vllm/csrc/quantization/fp8/common.cu:20:1: warning: ‘host’ attribute directive ignored [-Wattributes]
         20 | C10_HOST_DEVICE constexpr auto FP8_E4M3_MAX =
            | ^~~~~~~~~~~~
      [13/68] Building CUDA object CMakeFiles/_C.dir/csrc/moe_align_block_size_kernels.cu.o
      [14/68] Building CUDA object CMakeFiles/_C.dir/csrc/prepare_inputs/advance_step.cu.o
      /home/imss27/dev/vllm/csrc/prepare_inputs/advance_step.cu: In function ‘void prepare_inputs::advance_step_flashinfer(int, int, int, at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&)’:
      /home/imss27/dev/vllm/csrc/prepare_inputs/advance_step.cu:214:8: warning: format ‘%d’ expects argument of type ‘int’, but argument 2 has type ‘int64_t’ {aka ‘long int’} [-Wformat=]
        214 |     printf("  block_tables.stride(0) = %d\n", block_tables.stride(0));
            |        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~  ~~~~~~~~~~~~~~~~~~~~~~
            |                                                              |
            |                                                              int64_t {aka long int}
      [15/68] Building CXX object CMakeFiles/_C.dir/csrc/torch_bindings.cpp.o
      [16/68] Building CUDA object CMakeFiles/_C.dir/csrc/mamba/causal_conv1d/causal_conv1d.cu.o
      [17/68] Building CUDA object CMakeFiles/_C.dir/csrc/mamba/mamba_ssm/selective_scan_fwd.cu.o
      [18/68] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/aqlm/gemm_kernels.cu.o
      [19/68] Building CUDA object CMakeFiles/_C.dir/csrc/attention/attention_kernels.cu.o
      [20/68] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/awq/gemm_kernels.cu.o
      [21/68] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/marlin/dense/marlin_cuda_kernel.cu.o
      [22/68] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/marlin/qqq/marlin_qqq_gemm_kernel.cu.o
      [23/68] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/marlin/sparse/marlin_24_cuda_kernel.cu.o
      [24/68] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/gptq_marlin/awq_marlin_repack.cu.o
      [25/68] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/gptq_marlin/gptq_marlin_repack.cu.o
      [26/68] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/gguf/gguf_kernel.cu.o
      [27/68] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/fp8/fp8_marlin.cu.o
      [28/68] Building CUDA object CMakeFiles/_C.dir/csrc/custom_all_reduce.cu.o
      [29/68] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/cutlass_w8a8/scaled_mm_entry.cu.o
      [30/68] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/cutlass_w8a8/scaled_mm_c2x.cu.o
      FAILED: CMakeFiles/_C.dir/csrc/quantization/cutlass_w8a8/scaled_mm_c2x.cu.o
      /usr/local/cuda-12.1/bin/nvcc -forward-unknown-to-host-compiler -DCUTLASS_ENABLE_DIRECT_CUDA_DRIVER_CALL=1 -DPy_LIMITED_API=3 -DTORCH_EXTENSION_NAME=_C -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -D_C_EXPORTS -I/home/imss27/dev/vllm/csrc -I/tmp/tmpf11xgj9c.build-temp/_deps/cutlass-src/include -isystem /home/imss27/anaconda3/envs/vllm/include/python3.10 -isystem /tmp/pip-build-env-wv88m5l1/overlay/lib/python3.10/site-packages/torch/include -isystem /tmp/pip-build-env-wv88m5l1/overlay/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda-12.1/include -DONNX_NAMESPACE=onnx_c2 -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O2 -g -DNDEBUG -std=c++17 "--generate-code=arch=compute_89,code=[sm_89]" -Xcompiler=-fPIC --expt-relaxed-constexpr -DENABLE_FP8 --threads=1 -D_GLIBCXX_USE_CXX11_ABI=0 -MD -MT CMakeFiles/_C.dir/csrc/quantization/cutlass_w8a8/scaled_mm_c2x.cu.o -MF CMakeFiles/_C.dir/csrc/quantization/cutlass_w8a8/scaled_mm_c2x.cu.o.d -x cu -c /home/imss27/dev/vllm/csrc/quantization/cutlass_w8a8/scaled_mm_c2x.cu -o CMakeFiles/_C.dir/csrc/quantization/cutlass_w8a8/scaled_mm_c2x.cu.o
      Killed
      [31/68] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/cutlass_w8a8/scaled_mm_c3x.cu.o
      /tmp/tmpf11xgj9c.build-temp/_deps/cutlass-src/include/cutlass/device_kernel.h: In function ‘void cutlass::device_kernel(typename Operator::Params) [with Operator = _GLOBAL__N__c31ef43b_16_scaled_mm_c3x_cu_22d95651::cutlass_3x_gemm<signed char, cutlass::bfloat16_t, _GLOBAL__N__c31ef43b_16_scaled_mm_c3x_cu_22d95651::ScaledEpilogueBias, cute::tuple<cute::C<64>, cute::C<64>, cute::C<256> >, cute::tuple<cute::C<1>, cute::C<8>, cute::C<1> >, cutlass::gemm::KernelTmaWarpSpecialized, cutlass::epilogue::TmaWarpSpecialized>::GemmKernel]’:
      /tmp/tmpf11xgj9c.build-temp/_deps/cutlass-src/include/cutlass/device_kernel.h:104:1: note: the ABI for passing parameters with 64-byte alignment has changed in GCC 4.6
        104 | void device_kernel(CUTLASS_GRID_CONSTANT typename Operator::Params const params)
            | ^~~~~~~~~~~~~
      [32/68] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/gptq_marlin/gptq_marlin.cu.o
      [33/68] Building CUDA object CMakeFiles/_moe_C.dir/csrc/moe/marlin_moe_ops.cu.o
      ninja: build stopped: subcommand failed.
      Traceback (most recent call last):
        File "/tmp/pip-build-env-wv88m5l1/overlay/lib/python3.10/site-packages/setuptools/command/editable_wheel.py", line 138, in run
          self._create_wheel_file(bdist_wheel)
        File "/tmp/pip-build-env-wv88m5l1/overlay/lib/python3.10/site-packages/setuptools/command/editable_wheel.py", line 341, in _create_wheel_file
          files, mapping = self._run_build_commands(dist_name, unpacked, lib, tmp)
        File "/tmp/pip-build-env-wv88m5l1/overlay/lib/python3.10/site-packages/setuptools/command/editable_wheel.py", line 264, in _run_build_commands
          self._run_build_subcommands()
        File "/tmp/pip-build-env-wv88m5l1/overlay/lib/python3.10/site-packages/setuptools/command/editable_wheel.py", line 291, in _run_build_subcommands
          self.run_command(name)
        File "/tmp/pip-build-env-wv88m5l1/overlay/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 316, in run_command
          self.distribution.run_command(command)
        File "/tmp/pip-build-env-wv88m5l1/overlay/lib/python3.10/site-packages/setuptools/dist.py", line 950, in run_command
          super().run_command(command)
        File "/tmp/pip-build-env-wv88m5l1/overlay/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 973, in run_command
          cmd_obj.run()
        File "/tmp/pip-build-env-wv88m5l1/overlay/lib/python3.10/site-packages/setuptools/command/build_ext.py", line 98, in run
          _build_ext.run(self)
        File "/tmp/pip-build-env-wv88m5l1/overlay/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 359, in run
          self.build_extensions()
        File "<string>", line 241, in build_extensions
        File "/home/imss27/anaconda3/envs/vllm/lib/python3.10/subprocess.py", line 369, in check_call
          raise CalledProcessError(retcode, cmd)
      subprocess.CalledProcessError: Command '['cmake', '--build', '.', '-j=4', '--target=_core_C', '--target=_moe_C', '--target=_C']' returned non-zero exit status 1.
      /tmp/pip-build-env-wv88m5l1/overlay/lib/python3.10/site-packages/setuptools/_distutils/dist.py:973: _DebuggingTips: Problem in editable installation.

youkaichao commented 1 day ago

  fatal: not a git repository (or any of the parent directories): .git
  -- CUTLASS Revision: Unable to detect, Git returned code 128.

looks like some cutlass error?

cc @tlrmchlsmth

youkaichao commented 1 day ago

  -- CUTLASS 3.5.1

the user actually already has cutlass. maybe this caused some conflict?

Imss27 commented 1 day ago

Thanks for the quick insights. A follow-up question: could it be due to these lines in the CMakeLists.txt file?

set(VLLM_EXT_SRC
  "csrc/cache_kernels.cu"
  "csrc/attention/attention_kernels.cu"
  "csrc/pos_encoding_kernels.cu"
  "csrc/activation_kernels.cu"
  "csrc/layernorm_kernels.cu"
  "csrc/quantization/gptq/q_gemm.cu"
  "csrc/quantization/compressed_tensors/int8_quant_kernels.cu"
  "csrc/quantization/fp8/common.cu"
  "csrc/cuda_utils_kernels.cu"
  "csrc/moe_align_block_size_kernels.cu"
  "csrc/prepare_inputs/advance_step.cu"
  "csrc/torch_bindings.cpp")

if(VLLM_GPU_LANG STREQUAL "CUDA")
  include(FetchContent)
  SET(CUTLASS_ENABLE_HEADERS_ONLY ON CACHE BOOL "Enable only the header library")
  FetchContent_Declare(
        cutlass
        GIT_REPOSITORY https://github.com/nvidia/cutlass.git
        GIT_TAG v3.5.1
        GIT_PROGRESS TRUE

        # Speed up CUTLASS download by retrieving only the specified GIT_TAG instead of the history.
        # Important: If GIT_SHALLOW is enabled then GIT_TAG works only with branch names and tags.
        # So if the GIT_TAG above is updated to a commit hash, GIT_SHALLOW must be set to FALSE
        GIT_SHALLOW TRUE
  )
  FetchContent_MakeAvailable(cutlass)

tlrmchlsmth commented 1 day ago

I haven’t seen this before, but it looks like something might be going wrong with FetchContent. @Imss27, is your machine connected to the internet? It needs to clone CUTLASS during the build process. Also try commenting out the GIT_SHALLOW line, just in case it is causing problems.

@youkaichao I don’t think cutlass already being there is the thing that’s causing this issue

Imss27 commented 1 day ago

@tlrmchlsmth Thank you. Yes, my computer is connected to the internet. I tried ping and curl as follows. Could this be a possible reason?

1. $ ping https://github.com/nvidia/cutlass.git
   Returns: ping: https://github.com/nvidia/cutlass.git: Name or service not known

2. $ curl https://github.com/nvidia/cutlass.git
   Returns:

<html>
<head><title>301 Moved Permanently</title></head>
<body>
<center><h1>301 Moved Permanently</h1></center>
<hr><center>nginx</center>
</body>
</html>

zifeitong commented 1 day ago

Can you try to rule out the compiler running out of memory? Either check dmesg for OOM-killer logs or set MAX_JOBS to a smaller number.
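
One way to do the dmesg check is to search the kernel log for OOM-killer entries. The following is a minimal sketch, not a vLLM tool: the function name `find_oom_kills` is made up for illustration, the regular expression assumes the "Out of memory: Killed process <pid> (<name>)" wording that recent Linux kernels print, and the sample log text is invented rather than taken from the machine in this issue.

```python
import re

# Assumed message format. Older kernels print "Kill process" instead of
# "Killed process", so both spellings are accepted.
OOM_LINE = re.compile(r"Out of memory: Kill(?:ed)? process (\d+) \(([^)]+)\)")


def find_oom_kills(log_text):
    """Return (pid, process_name) pairs for OOM-killer entries in log_text."""
    return [(int(pid), name) for pid, name in OOM_LINE.findall(log_text)]


# Invented sample for illustration only; not output from this issue.
sample = (
    "[ 1234.567890] Out of memory: Killed process 4321 (cicc) "
    "total-vm:9000000kB, anon-rss:7000000kB\n"
    "[ 1234.600000] usb 1-1: new high-speed USB device\n"
)
print(find_oom_kills(sample))  # prints [(4321, 'cicc')]
```

On a real system the text could come from running `dmesg` (which may require root), for example via `subprocess.run(["dmesg"], capture_output=True, text=True).stdout`. A compiler process showing up in the result around the time of the "Killed" line in the build log would point to memory pressure rather than a CUTLASS problem.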

tlrmchlsmth commented 1 day ago

It's a git repo (https://github.com/NVIDIA/cutlass). You should be able to git clone https://github.com/NVIDIA/cutlass; I don't think it makes sense to ping or curl it.
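
For context, ping expects a host name rather than a URL, and curl does not follow redirects unless -L is passed, so neither result says much about whether git can reach the repository. A lighter check than a full clone is `git ls-remote`. The sketch below is only illustrative: the helper names `ls_remote_command` and `tag_is_reachable` are hypothetical, and it assumes `git` is on the PATH.

```python
import subprocess


def ls_remote_command(url, tag):
    """Build a `git ls-remote` command that exits non-zero if the tag is missing."""
    return ["git", "ls-remote", "--exit-code", "--tags", url, f"refs/tags/{tag}"]


def tag_is_reachable(url, tag, timeout=60):
    """Return True if git can contact url and finds the given tag there."""
    try:
        result = subprocess.run(
            ls_remote_command(url, tag),
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # git is not installed, or the network call hung.
        return False
    return result.returncode == 0
```

For the tag used in the CMakeLists.txt excerpt above, the call would be `tag_is_reachable("https://github.com/NVIDIA/cutlass.git", "v3.5.1")`.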

tlrmchlsmth commented 1 day ago

Still not sure what's going on here though. Does deleting the following help? (We've run into trouble with it previously, hence the long comment)

        # Speed up CUTLASS download by retrieving only the specified GIT_TAG instead of the history.
        # Important: If GIT_SHALLOW is enabled then GIT_TAG works only with branch names and tags.
        # So if the GIT_TAG above is updated to a commit hash, GIT_SHALLOW must be set to FALSE
        GIT_SHALLOW TRUE

youkaichao commented 1 day ago

Python platform: Linux-5.15.153.1-microsoft-standard-WSL2-x86_64-with-glibc2.35

might be caused by WSL?

Imss27 commented 1 day ago

Thank you all for the great suggestions and insights! @youkaichao @tlrmchlsmth @zifeitong

Previously I tried setting MAX_JOBS to small values such as 4 and 6, as mentioned in the vLLM installation docs, but that produced identical errors.

For future reference: if you hit similar issues on WSL, try a more conservative setting, such as:

$ export MAX_JOBS=1
$ pip install -e .

(Note that this will take an extremely long time to build.)

This solved the build issue I encountered, even though the CUTLASS git message still appears:

fatal: not a git repository (or any of the parent directories): .git
-- CUTLASS Revision: Unable to detect, Git returned code 128.
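
As a rough aid for picking a MAX_JOBS value between 1 and the number of cores, the available memory can be divided by a per-job budget. This is a sketch under stated assumptions, not something vLLM provides: `suggest_max_jobs` is a hypothetical helper, the 4 GiB per compile job default is a guess rather than a measured or documented figure, and the sample meminfo text is invented.

```python
import os
import re


def available_gib(meminfo_text):
    """Parse MemAvailable (reported in kB) from /proc/meminfo-style text."""
    match = re.search(r"^MemAvailable:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemAvailable not found")
    return int(match.group(1)) / (1024 * 1024)


def suggest_max_jobs(meminfo_text, gib_per_job=4.0, cpu_count=None):
    """Suggest a job count limited by both memory and CPU count (at least 1).

    gib_per_job is an assumed budget per compile job, not a documented value.
    """
    cpus = cpu_count or os.cpu_count() or 1
    by_memory = int(available_gib(meminfo_text) // gib_per_job)
    return max(1, min(cpus, by_memory))


# Invented sample for illustration only: 8 GiB available, 32 CPUs.
sample = "MemTotal:       16000000 kB\nMemAvailable:    8388608 kB\n"
print(suggest_max_jobs(sample, cpu_count=32))  # prints 2
```

On Linux or WSL the real input would be the contents of /proc/meminfo, for example `suggest_max_jobs(open("/proc/meminfo").read())`. If the build is still killed at the suggested value, lowering it further, down to MAX_JOBS=1 as above, remains the safe fallback.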