mit-han-lab / llm-awq

[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
MIT License

Feature 'ldmatrix' requires target sm_75 or higher when building awq_inference_engine on Tesla V100 #223

Closed (ShobhaRajanna closed 1 month ago)

ShobhaRajanna commented 1 month ago

I am trying to build the awq_inference_engine on a system with Tesla V100 GPUs, but the build fails with CUDA architecture compatibility errors. Specifically, compiling gemm_cuda_gen.cu fails with errors such as "Feature 'ldmatrix' requires .target sm_75 or higher".

Steps to reproduce:

1. Clone the repository.
2. Set up the environment (CUDA 11.7, Tesla V100 GPUs).
3. Run python setup.py install.
4. Compilation fails with the errors listed below.

ptxas /tmp/tmpxft_00022ba1_00000000-6_gemm_cuda_gen.ptx, line 596; error : Feature 'ldmatrix' requires .target sm_75 or higher
ptxas /tmp/tmpxft_00022ba1_00000000-6_gemm_cuda_gen.ptx, line 604; error : Modifier '.m8n8' requires .target sm_75 or higher
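For anyone triaging a similar log, here is a minimal sketch that extracts the required target from ptxas messages of the form shown above and compares it with a GPU's compute capability. The names parse_ptxas_errors and supports_target are hypothetical helpers written for this illustration, not part of llm-awq or the CUDA toolkit, and the regular expression assumes the exact message wording in this log. The V100 is a compute capability 7.0 device, which is why both checks fail for it.

```python
import re

# Matches ptxas messages worded like the two lines in the log above.
# Assumption: other ptxas versions may phrase these messages differently.
PTXAS_ERROR = re.compile(
    r"line (?P<line>\d+); error\s*: (?P<kind>Feature|Modifier) "
    r"'(?P<name>[^']+)' requires \.target sm_(?P<sm>\d+) or higher"
)


def parse_ptxas_errors(log):
    """Return (ptx_line, kind, name, required_sm) for each matching error."""
    return [
        (int(m["line"]), m["kind"], m["name"], int(m["sm"]))
        for m in PTXAS_ERROR.finditer(log)
    ]


def supports_target(capability, required_sm):
    """True if a (major, minor) compute capability meets sm_<required_sm>."""
    major, minor = capability
    return major * 10 + minor >= required_sm


log = (
    "ptxas /tmp/tmpxft_00022ba1_00000000-6_gemm_cuda_gen.ptx, line 596; "
    "error : Feature 'ldmatrix' requires .target sm_75 or higher\n"
    "ptxas /tmp/tmpxft_00022ba1_00000000-6_gemm_cuda_gen.ptx, line 604; "
    "error : Modifier '.m8n8' requires .target sm_75 or higher\n"
)

v100 = (7, 0)  # Tesla V100 compute capability
for ptx_line, kind, name, required_sm in parse_ptxas_errors(log):
    ok = supports_target(v100, required_sm)
    print(f"{kind} {name!r} (PTX line {ptx_line}) needs sm_{required_sm}: "
          f"{'supported' if ok else 'not supported'} on sm_70")
```

On a machine with PyTorch installed, the capability tuple can be read with torch.cuda.get_device_capability() instead of being hard-coded.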

Environment:

GPU: Tesla V100-SXM2
CUDA Version: 11.7
PyTorch Version: 2.x
Operating System: Ubuntu 18.04

I tried modifying setup.py to include the -gencode arch=compute_70,code=sm_70 flags and conditionally compiling the CUDA code, but the issue persists. Is there a recommended way to disable features requiring sm_75 for older GPUs like the V100?
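A likely reason the -gencode change alone does not help: the flag only selects which targets nvcc generates code for, while the ldmatrix instruction remains in the kernel, so ptxas still rejects it when assembling for sm_70. The sketch below shows one way a build script could separate targets that meet the sm_75 requirement from those that do not before generating flags. The helper names (partition_targets, gencode_flags) and the min_sm default are assumptions made for illustration; this is not how the repository's setup.py is written, and skipping sm_70 means the extension is simply not built for the V100 rather than made to work on it.

```python
def partition_targets(sm_versions, min_sm=75):
    """Split targets into those that meet min_sm and those that do not.

    min_sm=75 reflects the requirement reported by ptxas for 'ldmatrix'.
    """
    buildable = [sm for sm in sm_versions if sm >= min_sm]
    too_old = [sm for sm in sm_versions if sm < min_sm]
    return buildable, too_old


def gencode_flags(sm_versions):
    """Build nvcc -gencode arguments for the given sm versions."""
    flags = []
    for sm in sm_versions:
        flags += ["-gencode", f"arch=compute_{sm},code=sm_{sm}"]
    return flags


requested = [70, 75, 80]  # hypothetical target list for illustration
buildable, too_old = partition_targets(requested)

# Only targets that can assemble ldmatrix-based kernels receive flags.
nvcc_flags = gencode_flags(buildable)
print("nvcc flags:", nvcc_flags)
print("skipped targets:", [f"sm_{sm}" for sm in too_old])
```

Actually running such kernels on sm_70 would additionally require a code path in the CUDA source that avoids ldmatrix, for example guarded on __CUDA_ARCH__; whether the project provides one is not stated in this issue.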