timefliesfang opened this issue 3 months ago
Not sure why that's happening. It shouldn't be trying to compile for sm_52. Do you by any chance have an older NVIDIA GPU in the same system also?
I have a similar error. I am building off of the current TensorRT-LLM Docker image (Ubuntu) with CUDA_ARCHS="89-real". I am using a 4070 Ti and am currently trying to debug; __hfma2 keeps showing up as not found.
EDIT: also nvcc version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Wed_Nov_22_10:17:15_PST_2023
Cuda compilation tools, release 12.3, V12.3.107
Build cuda_12.3.r12.3/compiler.33567101_0
gxx version
g++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
2nd edit: dropped the current TensorRT build and went back to nvcr.io/nvidia/cuda:12.1.1-cudnn8-devel-ubuntu22.04
When it builds against that TensorRT-LLM Docker image it does something strange; changing the build order seems to work, but I'm running tests now.
I ran into the same error: identifier "__hfma2" is undefined when trying to install exllamav2 (pip install .). @timefliesfang I'm not sure how you removed the lower arch versions, but I was able to install it by setting TORCH_CUDA_ARCH_LIST.
Specifically, I installed with the command:
TORCH_CUDA_ARCH_LIST="8.0 8.6 9.0" pip install -e .
I'm installing inside NVIDIA's NGC Docker image 23.12.
@turboderp I'm guessing those lower archs got added to the generated compile command because the torch version I'm using was compiled for many different archs. You can check for your PyTorch version with torch.cuda.get_arch_list():
>>> torch.cuda.get_arch_list()
['sm_50', 'sm_60', 'sm_70', 'sm_75', 'sm_80', 'sm_86', 'sm_90']
Although I don't know anything about how ninja works, so I can't say for certain.
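For what it's worth, here's a small sketch (a hypothetical helper, not part of exllamav2 or torch) that filters an arch list like the one above down to the capabilities that actually have the half-precision intrinsics — __hfma2 and friends need compute capability 5.3 or newer — and formats the result as a TORCH_CUDA_ARCH_LIST value:

```python
# Hypothetical helper: given the output of torch.cuda.get_arch_list(),
# keep only archs new enough for half-precision intrinsics (__hfma2
# requires sm_53+) and format them for TORCH_CUDA_ARCH_LIST.

MIN_HALF_ARCH = (5, 3)  # first compute capability with fp16 intrinsics

def arch_list_for_half(sm_archs):
    """Convert e.g. ['sm_50', 'sm_86'] to '8.6'-style entries >= 5.3."""
    kept = []
    for arch in sm_archs:
        digits = arch.split("_")[1]                 # 'sm_86' -> '86'
        major, minor = int(digits[:-1]), int(digits[-1])
        if (major, minor) >= MIN_HALF_ARCH:
            kept.append(f"{major}.{minor}")
    return " ".join(kept)

archs = ['sm_50', 'sm_60', 'sm_70', 'sm_75', 'sm_80', 'sm_86', 'sm_90']
print(arch_list_for_half(archs))  # -> "6.0 7.0 7.5 8.0 8.6 9.0"
```

You could then install with TORCH_CUDA_ARCH_LIST="<that value>" pip install -e . as above, or narrow it further to just your GPU's capability.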
When I tried to run the test_inference.py example, I hit this error showing that it failed to find the half-precision CUDA APIs, even though my CUDA version is recent (12.3) and I am using a V100 GPU.
I checked the generated compile options of ninja, for example:
[15/20] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output h_gemm.cuda.o.d -DTORCH_EXTENSION_NAME=exllamav2_ext -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/group/ossdphi_algo_scratch_09/chaomfan/exllamav2/exllamav2/exllamav2_ext -isystem /group/ossdphi_algo_scratch_09/chaomfan/exllamav2/exllma_cuda/lib/python3.10/site-packages/torch/include -isystem /group/ossdphi_algo_scratch_09/chaomfan/exllamav2/exllma_cuda/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /group/ossdphi_algo_scratch_09/chaomfan/exllamav2/exllma_cuda/lib/python3.10/site-packages/torch/include/TH -isystem /group/ossdphi_algo_scratch_09/chaomfan/exllamav2/exllma_cuda/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_72,code=sm_72 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_87,code=sm_87 -gencode=arch=compute_90,code=compute_90 -gencode=arch=compute_90,code=sm_90 --compiler-options '-fPIC' -lineinfo -O3 -std=c++17 -c /group/ossdphi_algo_scratch_09/chaomfan/exllamav2/exllamav2/exllamav2_ext/cuda/h_gemm.cu -o h_gemm.cuda.o
FAILED: h_gemm.cuda.o
There are some low-compute-capability gencode options involved (e.g. -gencode=arch=compute_52,code=sm_52); when I removed these, this single file compiled successfully.
I wonder how to solve this issue. Thank you.
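Not a proper fix, but the manual edit described above (stripping the too-old gencode flags from the nvcc command line in build.ninja) can be sketched as a small hypothetical helper; it assumes sm_53 is the minimum for __hfma2 and is not part of exllamav2:

```python
import re

# Hypothetical helper: drop -gencode flags for compute capabilities below
# 5.3 (the minimum for half-precision intrinsics like __hfma2) from an
# nvcc command line copied out of build.ninja.

MIN_CC = 53  # sm_53 is the first arch with fp16 intrinsics

def strip_old_gencodes(cmd):
    def keep(match):
        cc = int(match.group(1))
        return match.group(0) if cc >= MIN_CC else ""
    # matches e.g. -gencode=arch=compute_52,code=sm_52 (or code=compute_90)
    pattern = r"-gencode=arch=compute_(\d+),code=(?:sm|compute)_\d+\s*"
    return re.sub(pattern, keep, cmd)

cmd = "nvcc -gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_80,code=sm_80 x.cu"
print(strip_old_gencodes(cmd))
# -> "nvcc -gencode=arch=compute_80,code=sm_80 x.cu"
```

The cleaner route is still to keep the low archs out of the build in the first place, e.g. via TORCH_CUDA_ARCH_LIST as mentioned earlier in the thread, rather than patching the generated ninja commands.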