mit-han-lab / llm-awq

[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

Tesla T4 Feature '.m16n8k16' requires .target sm_80 or higher #93

Open · piotrecode opened this issue 1 year ago

piotrecode commented 1 year ago

Can I run AWQ on a Tesla T4? The kernel build fails with:

```
/app/repositories/llm-awq/awq/kernels/csrc/quantization/gemm_cuda_gen.cu(22): warning #177-D: function "__pack_half2" was declared but never referenced

ptxas /tmp/tmpxft_00000033_00000000-6_gemm_cuda_gen.ptx, line 928; error : Feature '.m16n8k16' requires .target sm_80 or higher
ptxas /tmp/tmpxft_00000033_00000000-6_gemm_cuda_gen.ptx, line 932; error : Feature '.m16n8k16' requires .target sm_80 or higher
ptxas /tmp/tmpxft_00000033_00000000-6_gemm_cuda_gen.ptx, line 936; error : Feature '.m16n8k16' requires .target sm_80 or higher
ptxas /tmp/tmpxft_00000033_00000000-6_gemm_cuda_gen.ptx, line 940; error : Feature '.m16n8k16' requires .target sm_80 or higher
ptxas /tmp/tmpxft_00000033_00000000-6_gemm_cuda_gen.ptx, line 944; error : Feature '.m16n8k16' requires .target sm_80 or higher
ptxas /tmp/tmpxft_00000033_00000000-6_gemm_cuda_gen.ptx, line 948; error : Feature '.m16n8k16' requires .target sm_80 or higher
ptxas /tmp/tmpxft_00000033_00000000-6_gemm_cuda_gen.ptx, line 952; error : Feature '.m16n8k16' requires .target sm_80 or higher
ptxas /tmp/tmpxft_00000033_00000000-6_gemm_cuda_gen.ptx, line 956; error : Feature '.m16n8k16' requires .target sm_80 or higher
ptxas /tmp/tmpxft_00000033_00000000-6_gemm_cuda_gen.ptx, line 1000; error : Feature '.m16n8k16' requires .target sm_80 or higher
ptxas /tmp/tmpxft_00000033_00000000-6_gemm_cuda_gen.ptx, line 1004; error : Feature '.m16n8k16' requires .target sm_80 or higher
ptxas /tmp/tmpxft_00000033_00000000-6_gemm_cuda_gen.ptx, line 1008; error : Feature '.m16n8k16' requires .target sm_80 or higher
ptxas /tmp/tmpxft_00000033_00000000-6_gemm_cuda_gen.ptx, line 1012; error : Feature '.m16n8k16' requires .target sm_80 or higher
ptxas /tmp/tmpxft_00000033_00000000-6_gemm_cuda_gen.ptx, line 1016; error : Feature '.m16n8k16' requires .target sm_80 or higher
ptxas /tmp/tmpxft_00000033_00000000-6_gemm_cuda_gen.ptx, line 1020; error : Feature '.m16n8k16' requires .target sm_80 or higher
ptxas /tmp/tmpxft_00000033_00000000-6_gemm_cuda_gen.ptx, line 1024; error : Feature '.m16n8k16' requires .target sm_80 or higher
ptxas /tmp/tmpxft_00000033_00000000-6_gemm_cuda_gen.ptx, line 1028; error : Feature '.m16n8k16' requires .target sm_80 or higher
ptxas /tmp/tmpxft_00000033_00000000-6_gemm_cuda_gen.ptx, line 1854; error : Feature '.m16n8k16' requires .target sm_80 or higher
ptxas /tmp/tmpxft_00000033_00000000-6_gemm_cuda_gen.ptx, line 1858; error : Feature '.m16n8k16' requires .target sm_80 or higher
ptxas /tmp/tmpxft_00000033_00000000-6_gemm_cuda_gen.ptx, line 1862; error : Feature '.m16n8k16' requires .target sm_80 or higher
ptxas /tmp/tmpxft_00000033_00000000-6_gemm_cuda_gen.ptx, line 1866; error : Feature '.m16n8k16' requires .target sm_80 or higher
ptxas /tmp/tmpxft_00000033_00000000-6_gemm_cuda_gen.ptx, line 1894; error : Feature '.m16n8k16' requires .target sm_80 or higher
ptxas /tmp/tmpxft_00000033_00000000-6_gemm_cuda_gen.ptx, line 1898; error : Feature '.m16n8k16' requires .target sm_80 or higher
ptxas /tmp/tmpxft_00000033_00000000-6_gemm_cuda_gen.ptx, line 1902; error : Feature '.m16n8k16' requires .target sm_80 or higher
ptxas /tmp/tmpxft_00000033_00000000-6_gemm_cuda_gen.ptx, line 1906; error : Feature '.m16n8k16' requires .target sm_80 or higher
ptxas fatal : Ptx assembly aborted due to errors
[4/7] /usr/local/cuda-11/bin/nvcc -I/usr/local/lib/python3.10/dist-packages/torch/include -I/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.10/dist-packages/torch/include/TH -I/usr/local/lib/python3.10/dist-packages/torch/include/THC -I/usr/local/cuda-11/include -I/usr/include/python3.10 -c -c /app/repositories/llm-awq/awq/kernels/csrc/position_embedding/pos_encoding_kernels.cu -o /app/repositories/llm-awq/awq/kernels/build/temp.linux-x86_64-3.10/csrc/position_embedding/pos_encoding_kernels.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -DENABLE_BF16 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --threads=8 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=awq_inference_engine -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_75,code=compute_75 -gencode=arch=compute_75,code=sm_75 -ccbin gcc
/usr/local/lib/python3.10/dist-packages/torch/include/c10/util/irange.h(54): warning #186-D: pointless comparison of unsigned integer with zero
          detected during:
            instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, >::operator==(const c10::detail::integer_iterator<I, one_sided, > &) const [with I=size_t, one_sided=false, =0]" (61): here
            instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, >::operator!=(const c10::detail::integer_iterator<I, one_sided, > &) const [with I=size_t, one_sided=false, =0]" /usr/local/lib/python3.10/dist-packages/torch/include/c10/core/TensorImpl.h(77): here
/usr/local/lib/python3.10/dist-packages/torch/include/c10/util/irange.h(54): warning #186-D: pointless comparison of unsigned integer with zero
          detected during:
            instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, >::operator==(const c10::detail::integer_iterator<I, one_sided, > &) const [with I=std::size_t, one_sided=true, =0]" (61): here
            instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, >::operator!=(const c10::detail::integer_iterator<I, one_sided, > &) const [with I=std::size_t, one_sided=true, =0]" /usr/local/lib/python3.10/dist-packages/torch/include/ATen/core/qualified_name.h(73): here
[5/7] /usr/local/cuda-11/bin/nvcc -I/usr/local/lib/python3.10/dist-packages/torch/include -I/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.10/dist-packages/torch/include/TH -I/usr/local/lib/python3.10/dist-packages/torch/include/THC -I/usr/local/cuda-11/include -I/usr/include/python3.10 -c -c /app/repositories/llm-awq/awq/kernels/csrc/layernorm/layernorm.cu -o /app/repositories/llm-awq/awq/kernels/build/temp.linux-x86_64-3.10/csrc/layernorm/layernorm.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -DENABLE_BF16 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --threads=8 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=awq_inference_engine -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_75,code=compute_75 -gencode=arch=compute_75,code=sm_75 -ccbin gcc
/usr/local/lib/python3.10/dist-packages/torch/include/c10/util/irange.h(54): warning #186-D: pointless comparison of unsigned integer with zero
          detected during:
            instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, >::operator==(const c10::detail::integer_iterator<I, one_sided, > &) const [with I=size_t, one_sided=false, =0]" (61): here
            instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, >::operator!=(const c10::detail::integer_iterator<I, one_sided, > &) const [with I=size_t, one_sided=false, =0]" /usr/local/lib/python3.10/dist-packages/torch/include/c10/core/TensorImpl.h(77): here
/usr/local/lib/python3.10/dist-packages/torch/include/c10/util/irange.h(54): warning #186-D: pointless comparison of unsigned integer with zero
          detected during:
            instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, >::operator==(const c10::detail::integer_iterator<I, one_sided, > &) const [with I=std::size_t, one_sided=true, =0]" (61): here
            instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, >::operator!=(const c10::detail::integer_iterator<I, one_sided, > &) const [with I=std::size_t, one_sided=true, =0]" /usr/local/lib/python3.10/dist-packages/torch/include/ATen/core/qualified_name.h(73): here
[6/7] /usr/local/cuda-11/bin/nvcc -I/usr/local/lib/python3.10/dist-packages/torch/include -I/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.10/dist-packages/torch/include/TH -I/usr/local/lib/python3.10/dist-packages/torch/include/THC -I/usr/local/cuda-11/include -I/usr/include/python3.10 -c -c /app/repositories/llm-awq/awq/kernels/csrc/quantization/gemv_cuda.cu -o /app/repositories/llm-awq/awq/kernels/build/temp.linux-x86_64-3.10/csrc/quantization/gemv_cuda.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -DENABLE_BF16 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --threads=8 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=awq_inference_engine -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_75,code=compute_75 -gencode=arch=compute_75,code=sm_75 -ccbin gcc
/app/repositories/llm-awq/awq/kernels/csrc/quantization/gemv_cuda.cu(224): warning #177-D: variable "blockDim_z" was declared but never referenced
/usr/local/lib/python3.10/dist-packages/torch/include/c10/util/irange.h(54): warning #186-D: pointless comparison of unsigned integer with zero
          detected during:
            instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, >::operator==(const c10::detail::integer_iterator<I, one_sided, > &) const [with I=size_t, one_sided=false, =0]" (61): here
            instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, >::operator!=(const c10::detail::integer_iterator<I, one_sided, > &) const [with I=size_t, one_sided=false, =0]" /usr/local/lib/python3.10/dist-packages/torch/include/c10/core/TensorImpl.h(77): here
/usr/local/lib/python3.10/dist-packages/torch/include/c10/util/irange.h(54): warning #186-D: pointless comparison of unsigned integer with zero
          detected during:
            instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, >::operator==(const c10::detail::integer_iterator<I, one_sided, > &) const [with I=std::size_t, one_sided=true, =0]" (61): here
            instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, >::operator!=(const c10::detail::integer_iterator<I, one_sided, > &) const [with I=std::size_t, one_sided=true, =0]" /usr/local/lib/python3.10/dist-packages/torch/include/ATen/core/qualified_name.h(73): here
```

piotrecode commented 1 year ago

Fixed

hongyeonyu commented 11 months ago

> Fixed

I am also facing the same issue. Could you please tell me how you fixed it?

fxmarty commented 11 months ago

> Feature '.m16n8k16' requires .target sm_80 or higher

IMO AWQ can't run on T4 GPUs. On an A100 you need to build with `TORCH_CUDA_ARCH_LIST="8.0" python setup.py install`.
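For context, `.m16n8k16` is the 16×8×16 tensor-core `mma` PTX shape that the AWQ GEMM kernel emits inline, and ptxas only accepts it when targeting sm_80 (Ampere) or newer; the T4 is sm_75 (Turing), which only supports the smaller m16n8k8 fp16 shape. A minimal stand-alone sketch (not the repo's kernel, just the instruction in isolation) that reproduces the exact ptxas error above:

```cuda
// repro.cu -- a hypothetical repro, not code from llm-awq.
// nvcc -arch=sm_80 repro.cu   -> assembles fine
// nvcc -arch=sm_75 repro.cu   -> ptxas error: Feature '.m16n8k16' requires .target sm_80 or higher
#include <cuda_runtime.h>

__global__ void mma_m16n8k16_demo() {
  // Each 32-bit register packs two fp16 values (.f16x2); zeros are placeholders.
  unsigned a0 = 0, a1 = 0, a2 = 0, a3 = 0, b0 = 0, b1 = 0, c0 = 0, c1 = 0;
  // One 16x8x16 fp16 tensor-core MMA (D = A*B + C), available from sm_80 onward.
  asm volatile(
      "mma.sync.aligned.m16n8k16.row.col.f16.f16.f16.f16 "
      "{%0,%1}, {%2,%3,%4,%5}, {%6,%7}, {%0,%1};\n"
      : "+r"(c0), "+r"(c1)
      : "r"(a0), "r"(a1), "r"(a2), "r"(a3), "r"(b0), "r"(b1));
}

int main() {
  mma_m16n8k16_demo<<<1, 32>>>();
  return cudaDeviceSynchronize();
}
```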

pribadihcr commented 7 months ago

Hi @fxmarty, here is an example workaround for older GPUs: https://github.com/vllm-project/vllm/pull/1252. How can it be adapted to this repo? Thanks.

starsy commented 6 months ago

> Feature '.m16n8k16' requires .target sm_80 or higher
>
> IMO AWQ can't run on T4 GPUs. On an A100 you need to build with `TORCH_CUDA_ARCH_LIST="8.0" python setup.py install`.

This trick works on the H100 as well.

ao-zz commented 5 months ago

@hongyeonyu @pribadihcr

```
cd awq/kernels
export TORCH_CUDA_ARCH_LIST="x.y"  # add this
python setup.py install
```

"x.y" should be 8.0 or higher, depending on your GPU (see here). For personal users this means an RTX 3060 or newer; for data-center users, an A2 or newer.

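If you are not sure which value your card needs, you can query it directly; the `major.minor` pair a device reports is exactly the `x.y` that `TORCH_CUDA_ARCH_LIST` expects. A minimal stand-alone sketch using the CUDA runtime API (the file name is just an example):

```cuda
// check_cc.cu -- build and run with: nvcc check_cc.cu -o check_cc && ./check_cc
#include <cstdio>
#include <cuda_runtime.h>

int main() {
  int count = 0;
  cudaGetDeviceCount(&count);
  for (int i = 0; i < count; ++i) {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, i);
    // Prints e.g. "7.5" for a T4 (too old for these kernels) or "8.0" for an A100.
    printf("GPU %d: %s -> compute capability %d.%d\n",
           i, prop.name, prop.major, prop.minor);
  }
  return 0;
}
```

(From Python, `torch.cuda.get_device_capability()` returns the same pair.)
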
Ali-Flt commented 3 months ago

This should be in the README