Closed Imss27 closed 1 day ago
Error Messages:
-- The CXX compiler identification is GNU 11.4.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Build type: RelWithDebInfo
-- Target device: cuda
-- Found Python: /home/imss27/anaconda3/envs/vllm/bin/python (found version "3.10.14") found components: Interpreter Development.Module Development.SABIModule
-- Found python matching: /home/imss27/anaconda3/envs/vllm/bin/python.
-- Found CUDA: /usr/local/cuda-12.1 (found version "12.1")
-- The CUDA compiler identification is NVIDIA 12.1.66
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda-12.1/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Found CUDAToolkit: /usr/local/cuda-12.1/include (found version "12.1.66")
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Caffe2: CUDA detected: 12.1
-- Caffe2: CUDA nvcc is: /usr/local/cuda-12.1/bin/nvcc
-- Caffe2: CUDA toolkit directory: /usr/local/cuda-12.1
-- Caffe2: Header version is: 12.1
-- /usr/local/cuda-12.1/lib64/libnvrtc.so shorthash is d540eb83
-- USE_CUDNN is set to 0. Compiling without cuDNN support
-- USE_CUSPARSELT is set to 0. Compiling without cuSPARSELt support
-- Autodetected CUDA architecture(s): 8.9
-- Added CUDA NVCC flags for: -gencode;arch=compute_89,code=sm_89
CMake Warning at /tmp/pip-build-env-wv88m5l1/overlay/lib/python3.10/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message):
static library kineto_LIBRARY-NOTFOUND not found.
Call Stack (most recent call first):
/tmp/pip-build-env-wv88m5l1/overlay/lib/python3.10/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:120 (append_torchlib_if_found)
CMakeLists.txt:70 (find_package)
-- Found Torch: /tmp/pip-build-env-wv88m5l1/overlay/lib/python3.10/site-packages/torch/lib/libtorch.so
-- Enabling core extension.
-- CUDA supported arches: 7.0;7.5;8.0;8.6;8.9;9.0
-- CUDA target arches: 89-real
-- CMake Version: 3.30.3
-- CUTLASS 3.5.1
-- CUDART: /usr/local/cuda-12.1/lib64/libcudart.so
-- CUDA Driver: /usr/local/cuda-12.1/lib64/stubs/libcuda.so
-- NVRTC: /usr/local/cuda-12.1/lib64/libnvrtc.so
-- Default Install Location: install
-- Found Python3: /home/imss27/anaconda3/envs/vllm/bin/python3.10 (found suitable version "3.10.14", minimum required is "3.5") found components: Interpreter
-- Make cute::tuple be the new standard-layout tuple type
-- CUDA Compilation Architectures: 70;72;75;80;86;87;89;90;90a
-- Enable caching of reference results in conv unit tests
-- Enable rigorous conv problem sizes in conv unit tests
-- Using NVCC flags: --expt-relaxed-constexpr;-DCUTE_USE_PACKED_TUPLE=1;-DCUTLASS_TEST_LEVEL=0;-DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1;-DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1;-DCUTLASS_DEBUG_TRACE_LEVEL=0;-Xcompiler=-Wconversion;-Xcompiler=-fno-strict-aliasing;-lineinfo
fatal: not a git repository (or any of the parent directories): .git
-- CUTLASS Revision: Unable to detect, Git returned code 128.
-- Configuring cublas ...
-- cuBLAS Disabled.
-- Configuring cuBLAS ... done.
-- Machete generation completed successfully.
-- Machete generated sources: /home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_mm_bf16u4.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_mm_bf16u4_impl_part0.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_mm_bf16u4_impl_part1.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_mm_bf16u4b8.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_mm_bf16u4b8_impl_part0.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_mm_bf16u4b8_impl_part1.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_mm_bf16u8.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_mm_bf16u8_impl_part0.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_mm_bf16u8_impl_part1.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_mm_bf16u8b128.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_mm_bf16u8b128_impl_part0.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_mm_bf16u8b128_impl_part1.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_mm_f16u4.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_mm_f16u4_impl_part0.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_mm_f16u4_impl_part1.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_mm_f16u4b8.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_mm_f16u4b8_impl_part0.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_mm_f16u4b8_impl_part1.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_mm_f16u8.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_mm_f16u8_impl_part0.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_mm_f16u8_impl_part1.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_mm_f16u8b128.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/ma
chete_mm_f16u8b128_impl_part0.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_mm_f16u8b128_impl_part1.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_prepack_bf16u4.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_prepack_bf16u4b8.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_prepack_bf16u8.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_prepack_bf16u8b128.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_prepack_f16u4.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_prepack_f16u4b8.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_prepack_f16u8.cu;/home/imss27/dev/vllm/csrc/quantization/machete/generated/machete_prepack_f16u8b128.cu
-- Enabling C extension.
-- Enabling moe extension.
-- Configuring done (12.5s)
CMake Warning at /tmp/pip-build-env-wv88m5l1/overlay/lib/python3.10/site-packages/cmake/data/share/cmake-3.30/Modules/FindPython/Support.cmake:4255 (add_library):
Cannot generate a safe runtime search path for target _core_C because files
in some directories may conflict with libraries in implicit directories:
runtime library [libnvToolsExt.so.1] in /usr/lib/x86_64-linux-gnu may be hidden by files in:
/usr/local/cuda-12.1/lib64
Some of these libraries may not be found correctly.
Call Stack (most recent call first):
/tmp/pip-build-env-wv88m5l1/overlay/lib/python3.10/site-packages/cmake/data/share/cmake-3.30/Modules/FindPython.cmake:691 (__Python_add_library)
cmake/utils.cmake:327 (Python_add_library)
CMakeLists.txt:94 (define_gpu_extension_target)
-- Generating done (0.1s)
-- Build files have been written to: /tmp/tmpf11xgj9c.build-temp
Using MAX_JOBS=4 as the number of jobs.
[1/68] Building CXX object CMakeFiles/_core_C.dir/csrc/core/torch_bindings.cpp.o
[2/68] Linking CXX shared module /tmp/tmpnrpupvkc.build-lib/vllm/_core_C.abi3.so
[3/68] Building CXX object CMakeFiles/_moe_C.dir/csrc/moe/torch_bindings.cpp.o
[4/68] Building CUDA object CMakeFiles/_moe_C.dir/csrc/moe/topk_softmax_kernels.cu.o
[5/68] Building CUDA object CMakeFiles/_C.dir/csrc/cache_kernels.cu.o
[6/68] Building CUDA object CMakeFiles/_C.dir/csrc/pos_encoding_kernels.cu.o
[7/68] Building CUDA object CMakeFiles/_C.dir/csrc/activation_kernels.cu.o
[8/68] Building CUDA object CMakeFiles/_C.dir/csrc/layernorm_kernels.cu.o
[9/68] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/gptq/q_gemm.cu.o
[10/68] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/compressed_tensors/int8_quant_kernels.cu.o
[11/68] Building CUDA object CMakeFiles/_C.dir/csrc/cuda_utils_kernels.cu.o
[12/68] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/fp8/common.cu.o
/home/imss27/dev/vllm/csrc/quantization/fp8/common.cu:20:1: warning: ‘host’ attribute directive ignored [-Wattributes]
20 | C10_HOST_DEVICE constexpr auto FP8_E4M3_MAX =
| ^~~~~~~~~~~~
[13/68] Building CUDA object CMakeFiles/_C.dir/csrc/moe_align_block_size_kernels.cu.o
[14/68] Building CUDA object CMakeFiles/_C.dir/csrc/prepare_inputs/advance_step.cu.o
/home/imss27/dev/vllm/csrc/prepare_inputs/advance_step.cu: In function ‘void prepare_inputs::advance_step_flashinfer(int, int, int, at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&)’:
/home/imss27/dev/vllm/csrc/prepare_inputs/advance_step.cu:214:8: warning: format ‘%d’ expects argument of type ‘int’, but argument 2 has type ‘int64_t’ {aka ‘long int’} [-Wformat=]
214 | printf(" block_tables.stride(0) = %d\n", block_tables.stride(0));
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~
| |
| int64_t {aka long int}
[15/68] Building CXX object CMakeFiles/_C.dir/csrc/torch_bindings.cpp.o
[16/68] Building CUDA object CMakeFiles/_C.dir/csrc/mamba/causal_conv1d/causal_conv1d.cu.o
[17/68] Building CUDA object CMakeFiles/_C.dir/csrc/mamba/mamba_ssm/selective_scan_fwd.cu.o
[18/68] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/aqlm/gemm_kernels.cu.o
[19/68] Building CUDA object CMakeFiles/_C.dir/csrc/attention/attention_kernels.cu.o
[20/68] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/awq/gemm_kernels.cu.o
[21/68] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/marlin/dense/marlin_cuda_kernel.cu.o
[22/68] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/marlin/qqq/marlin_qqq_gemm_kernel.cu.o
[23/68] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/marlin/sparse/marlin_24_cuda_kernel.cu.o
[24/68] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/gptq_marlin/awq_marlin_repack.cu.o
[25/68] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/gptq_marlin/gptq_marlin_repack.cu.o
[26/68] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/gguf/gguf_kernel.cu.o
[27/68] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/fp8/fp8_marlin.cu.o
[28/68] Building CUDA object CMakeFiles/_C.dir/csrc/custom_all_reduce.cu.o
[29/68] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/cutlass_w8a8/scaled_mm_entry.cu.o
[30/68] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/cutlass_w8a8/scaled_mm_c2x.cu.o
FAILED: CMakeFiles/_C.dir/csrc/quantization/cutlass_w8a8/scaled_mm_c2x.cu.o
/usr/local/cuda-12.1/bin/nvcc -forward-unknown-to-host-compiler -DCUTLASS_ENABLE_DIRECT_CUDA_DRIVER_CALL=1 -DPy_LIMITED_API=3 -DTORCH_EXTENSION_NAME=_C -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -D_C_EXPORTS -I/home/imss27/dev/vllm/csrc -I/tmp/tmpf11xgj9c.build-temp/_deps/cutlass-src/include -isystem /home/imss27/anaconda3/envs/vllm/include/python3.10 -isystem /tmp/pip-build-env-wv88m5l1/overlay/lib/python3.10/site-packages/torch/include -isystem /tmp/pip-build-env-wv88m5l1/overlay/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda-12.1/include -DONNX_NAMESPACE=onnx_c2 -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O2 -g -DNDEBUG -std=c++17 "--generate-code=arch=compute_89,code=[sm_89]" -Xcompiler=-fPIC --expt-relaxed-constexpr -DENABLE_FP8 --threads=1 -D_GLIBCXX_USE_CXX11_ABI=0 -MD -MT CMakeFiles/_C.dir/csrc/quantization/cutlass_w8a8/scaled_mm_c2x.cu.o -MF CMakeFiles/_C.dir/csrc/quantization/cutlass_w8a8/scaled_mm_c2x.cu.o.d -x cu -c /home/imss27/dev/vllm/csrc/quantization/cutlass_w8a8/scaled_mm_c2x.cu -o CMakeFiles/_C.dir/csrc/quantization/cutlass_w8a8/scaled_mm_c2x.cu.o
Killed
[31/68] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/cutlass_w8a8/scaled_mm_c3x.cu.o
/tmp/tmpf11xgj9c.build-temp/_deps/cutlass-src/include/cutlass/device_kernel.h: In function ‘void cutlass::device_kernel(typename Operator::Params) [with Operator = _GLOBAL__N__c31ef43b_16_scaled_mm_c3x_cu_22d95651::cutlass_3x_gemm<signed char, cutlass::bfloat16_t, _GLOBAL__N__c31ef43b_16_scaled_mm_c3x_cu_22d95651::ScaledEpilogueBias, cute::tuple<cute::C<64>, cute::C<64>, cute::C<256> >, cute::tuple<cute::C<1>, cute::C<8>, cute::C<1> >, cutlass::gemm::KernelTmaWarpSpecialized, cutlass::epilogue::TmaWarpSpecialized>::GemmKernel]’:
/tmp/tmpf11xgj9c.build-temp/_deps/cutlass-src/include/cutlass/device_kernel.h:104:1: note: the ABI for passing parameters with 64-byte alignment has changed in GCC 4.6
104 | void device_kernel(CUTLASS_GRID_CONSTANT typename Operator::Params const params)
| ^~~~~~~~~~~~~
[32/68] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/gptq_marlin/gptq_marlin.cu.o
[33/68] Building CUDA object CMakeFiles/_moe_C.dir/csrc/moe/marlin_moe_ops.cu.o
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "/tmp/pip-build-env-wv88m5l1/overlay/lib/python3.10/site-packages/setuptools/command/editable_wheel.py", line 138, in run
self._create_wheel_file(bdist_wheel)
File "/tmp/pip-build-env-wv88m5l1/overlay/lib/python3.10/site-packages/setuptools/command/editable_wheel.py", line 341, in _create_wheel_file
files, mapping = self._run_build_commands(dist_name, unpacked, lib, tmp)
File "/tmp/pip-build-env-wv88m5l1/overlay/lib/python3.10/site-packages/setuptools/command/editable_wheel.py", line 264, in _run_build_commands
self._run_build_subcommands()
File "/tmp/pip-build-env-wv88m5l1/overlay/lib/python3.10/site-packages/setuptools/command/editable_wheel.py", line 291, in _run_build_subcommands
self.run_command(name)
File "/tmp/pip-build-env-wv88m5l1/overlay/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 316, in run_command
self.distribution.run_command(command)
File "/tmp/pip-build-env-wv88m5l1/overlay/lib/python3.10/site-packages/setuptools/dist.py", line 950, in run_command
super().run_command(command)
File "/tmp/pip-build-env-wv88m5l1/overlay/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 973, in run_command
cmd_obj.run()
File "/tmp/pip-build-env-wv88m5l1/overlay/lib/python3.10/site-packages/setuptools/command/build_ext.py", line 98, in run
_build_ext.run(self)
File "/tmp/pip-build-env-wv88m5l1/overlay/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 359, in run
self.build_extensions()
File "<string>", line 241, in build_extensions
File "/home/imss27/anaconda3/envs/vllm/lib/python3.10/subprocess.py", line 369, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['cmake', '--build', '.', '-j=4', '--target=_core_C', '--target=_moe_C', '--target=_C']' returned non-zero exit status 1.
/tmp/pip-build-env-wv88m5l1/overlay/lib/python3.10/site-packages/setuptools/_distutils/dist.py:973: _DebuggingTips: Problem in editable installation.
fatal: not a git repository (or any of the parent directories): .git
-- CUTLASS Revision: Unable to detect, Git returned code 128.
looks like some cutlass error?
cc @tlrmchlsmth
-- CUTLASS 3.5.1
the user actually already has cutlass. maybe this caused some conflict?
Thanks for the quick insights. A follow-up question: could it be due to these lines in the CMakeLists.txt file?
set(VLLM_EXT_SRC
"csrc/cache_kernels.cu"
"csrc/attention/attention_kernels.cu"
"csrc/pos_encoding_kernels.cu"
"csrc/activation_kernels.cu"
"csrc/layernorm_kernels.cu"
"csrc/quantization/gptq/q_gemm.cu"
"csrc/quantization/compressed_tensors/int8_quant_kernels.cu"
"csrc/quantization/fp8/common.cu"
"csrc/cuda_utils_kernels.cu"
"csrc/moe_align_block_size_kernels.cu"
"csrc/prepare_inputs/advance_step.cu"
"csrc/torch_bindings.cpp")
if(VLLM_GPU_LANG STREQUAL "CUDA")
include(FetchContent)
SET(CUTLASS_ENABLE_HEADERS_ONLY ON CACHE BOOL "Enable only the header library")
FetchContent_Declare(
cutlass
GIT_REPOSITORY https://github.com/nvidia/cutlass.git
GIT_TAG v3.5.1
GIT_PROGRESS TRUE
# Speed up CUTLASS download by retrieving only the specified GIT_TAG instead of the history.
# Important: If GIT_SHALLOW is enabled then GIT_TAG works only with branch names and tags.
# So if the GIT_TAG above is updated to a commit hash, GIT_SHALLOW must be set to FALSE
GIT_SHALLOW TRUE
)
FetchContent_MakeAvailable(cutlass)
I haven’t seen this before, but it looks like something might be going wrong with FetchContent. @Imss27 Is your machine connected to the internet? It needs to clone cutlass during the build process. Also try commenting out the GIT_SHALLOW line, just in case it is causing problems.
@youkaichao I don’t think cutlass already being there is the thing that’s causing this issue
@tlrmchlsmth Thank you. Yes, my computer is connected to the internet. I tried ping and curl as follows. Could this be a possible reason?
1.
$ ping https://github.com/nvidia/cutlass.git
Returns
ping: https://github.com/nvidia/cutlass.git: Name or service not known
2.
$ curl https://github.com/nvidia/cutlass.git
Returns
<html>
<head><title>301 Moved Permanently</title></head>
<body>
<center><h1>301 Moved Permanently</h1></center>
<hr><center>nginx</center>
</body>
</html>
Can you try to rule out the compiler running out of memory? Either check dmesg for logs or set MAX_JOBS to a smaller number.
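As a rough sketch of the dmesg check suggested above: the snippet below scans kernel log text for OOM-killer reports. The regular expression assumes the "Out of memory: Killed process <pid> (<name>)" wording used by recent Linux kernels; other kernel versions may phrase it differently. The function name `find_oom_kills` is hypothetical, not part of vLLM or any library.

```python
import re

# Assumed wording of the kernel's OOM-killer report (varies by kernel version), e.g.
# "Out of memory: Killed process 4321 (cicc) total-vm:..."
OOM_RE = re.compile(r"Out of memory: Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(kernel_log: str) -> list[tuple[int, str]]:
    """Return (pid, process name) for every OOM kill found in the log text."""
    return [(int(pid), name) for pid, name in OOM_RE.findall(kernel_log)]
```

To use it, pass in the output of `dmesg` (which may require root), for example via `subprocess.run(["dmesg"], capture_output=True, text=True).stdout`. A non-empty result naming a compiler process would point towards memory exhaustion rather than a CUTLASS problem.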
It's a git repo (https://github.com/NVIDIA/cutlass). You should be able to run
git clone https://github.com/NVIDIA/cutlass
I don't think it makes sense to ping or curl it.
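If a full clone feels heavy just to test connectivity, `git ls-remote` can ask the remote whether the tag exists without downloading anything. Below is a minimal sketch of that check. The helper names `tag_in_ls_remote` and `remote_has_tag` are hypothetical; the URL and the v3.5.1 tag come from the CMakeLists.txt excerpt above.

```python
import subprocess

CUTLASS_URL = "https://github.com/NVIDIA/cutlass.git"


def tag_in_ls_remote(output: str, tag: str) -> bool:
    """Return True if `git ls-remote` output lists refs/tags/<tag>."""
    wanted = f"refs/tags/{tag}"
    for line in output.splitlines():
        parts = line.split()
        # Each line is "<sha>\t<ref>"; annotated tags also show up as "<ref>^{}".
        if len(parts) == 2 and parts[1] in (wanted, wanted + "^{}"):
            return True
    return False


def remote_has_tag(url: str = CUTLASS_URL, tag: str = "v3.5.1",
                   timeout: float = 30.0) -> bool:
    """Query the remote for the tag; False on any git or network failure."""
    try:
        result = subprocess.run(
            ["git", "ls-remote", "--tags", url, tag],
            capture_output=True, text=True, timeout=timeout, check=True,
        )
    except (OSError, subprocess.SubprocessError):
        return False
    return tag_in_ls_remote(result.stdout, tag)
```

Calling `remote_has_tag()` from the same shell environment that runs the build should return `True` if git can reach GitHub and see the tag that FetchContent is asked to fetch.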
Still not sure what's going on here though. Does deleting the following help? (We've run into trouble with it previously, hence the long comment)
# Speed up CUTLASS download by retrieving only the specified GIT_TAG instead of the history.
# Important: If GIT_SHALLOW is enabled then GIT_TAG works only with branch names and tags.
# So if the GIT_TAG above is updated to a commit hash, GIT_SHALLOW must be set to FALSE
GIT_SHALLOW TRUE
Python platform: Linux-5.15.153.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
Might this be caused by WSL?
Thank you all for the great suggestions and insights! @youkaichao @tlrmchlsmth @zifeitong
Previously I tried setting MAX_JOBS to small values like 4 and 6, as mentioned in the vLLM installation docs, but it produced identical errors.
For future reference: if you run into similar issues on WSL, try a conservative setting such as
$ export MAX_JOBS=1
$ pip install -e .
(Note this will take an extremely long time to build)
This solved the build issue I encountered, even though the CUTLASS git error still appears:
fatal: not a git repository (or any of the parent directories): .git
-- CUTLASS Revision: Unable to detect, Git returned code 128.
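For anyone who wants something between MAX_JOBS=1 and a guess, here is a rough sketch that derives a job count from the memory currently available. The 4 GiB-per-job budget is purely an assumption for illustration, not a measured figure for vLLM's CUDA kernels, and both function names are hypothetical.

```python
def mem_available_kib(meminfo_text: str) -> int:
    """Extract the MemAvailable value (in KiB) from /proc/meminfo text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1])
    raise ValueError("MemAvailable not found in meminfo text")


def suggest_max_jobs(available_kib: int, cpu_count: int,
                     gib_per_job: float = 4.0) -> int:
    """Cap parallel jobs by both CPU count and an assumed memory budget per job."""
    by_memory = int(available_kib / (gib_per_job * 1024 * 1024))
    # Never suggest fewer than one job, and never more than the CPU count.
    return max(1, min(cpu_count, by_memory))
```

On Linux or WSL, the inputs would come from `open("/proc/meminfo").read()` and `os.cpu_count()`; the result can then be exported as MAX_JOBS before running `pip install -e .`. If the build still gets killed, lower the value or raise the per-job budget.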