quic / aimet

AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
https://quic.github.io/aimet-pages/index.html

CUDA ARCH 9.0 unsupported? #3157

Open CHNtentes opened 1 month ago

CHNtentes commented 1 month ago

(qnn) root@gpu_h20:/nvme/shanghai-2/ltg/aimet/build# cmake .. -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DENABLE_CUDA=ON -DENABLE_TORCH=ON -DENABLE_TENSORFLOW=ON -DENABLE_ONNX=ON
-- The C compiler identification is GNU 9.4.0
-- The CXX compiler identification is GNU 9.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Set SW_VERSION = 1.32.0 from /nvme/shanghai-2/ltg/aimet/packaging/version.txt
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
Compiling with CUDA enabled
-- Found CUDAToolkit: /usr/local/cuda/targets/x86_64-linux/include (found version "11.8.89")
-- Initial CMAKE_CUDA_ARCHITECTURES = 52;60;61;70;72
-- The CUDA compiler identification is NVIDIA 11.8.89
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Found CUDA toolkit version 11.8.89, using cu118
-- Found Python3: /root/miniconda3/envs/qnn/bin/python3.10 (found version "3.10.0") found components: Interpreter Development Development.Module Development.Embed
Found python: TRUE, at /root/miniconda3/envs/qnn/lib/libpython3.10.so
-- Found PkgConfig: /usr/bin/pkg-config (found version "0.29.1")
-- Checking for one of the modules 'lapacke'
-- Checking for one of the modules 'opencv'
-- Found ONNX version: 1.14.1
-- AIMET - ENABLE_TORCH: ON
-- Found Torch version: 2.1.2+cu118
-- Updated CMAKE_CUDA_ARCHITECTURES to 50;60;70;75;80;86;37;90
-- Removed unsupported archs (90), Now CMAKE_CUDA_ARCHITECTURES = 50;60;70;75;80;86;37
-- Updated TORCH_CUDA_ARCH_LIST to 5.0 6.0 7.0 7.5 8.0 8.6 3.7

CHNtentes commented 1 month ago

In CMakeLists.txt:

# We remove certain architectures that are not supported
set(UNSUPPORTED_CUDA_ARCHITECTURES_TORCH 90)
list(REMOVE_ITEM CMAKE_CUDA_ARCHITECTURES ${UNSUPPORTED_CUDA_ARCHITECTURES_TORCH})
message(STATUS " Removed unsupported archs (${UNSUPPORTED_CUDA_ARCHITECTURES_TORCH}), \
Now CMAKE_CUDA_ARCHITECTURES = ${CMAKE_CUDA_ARCHITECTURES} ")
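
For anyone following along, here is a minimal Python sketch of what this snippet appears to do to the architecture list, using the values printed in the cmake log above. The function names are hypothetical and are not part of AIMET's build scripts. The conversion to the dotted form is only my reading of the "Updated TORCH_CUDA_ARCH_LIST" log line, not the project's actual code.

```python
# Sketch only: mirrors the arch-list handling visible in the cmake log.
# Function names are hypothetical, not taken from AIMET.

def remove_unsupported(archs, unsupported):
    """Drop entries from a ';'-separated CMake-style arch list, keeping order."""
    return ";".join(a for a in archs.split(";") if a and a not in unsupported)


def to_torch_arch_list(archs):
    """Turn '50;60' style entries into the '5.0 6.0' style seen for TORCH_CUDA_ARCH_LIST."""
    return " ".join(f"{a[:-1]}.{a[-1]}" for a in archs.split(";") if a)


detected = "50;60;70;75;80;86;37;90"         # "Updated CMAKE_CUDA_ARCHITECTURES to ..." in the log
kept = remove_unsupported(detected, {"90"})  # same effect as list(REMOVE_ITEM ...) with 90
print(kept)
print(to_torch_arch_list(kept))
```

With 90 filtered out, neither resulting list contains the 9.0 architecture, which matches the last two lines of the log.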

CHNtentes commented 1 month ago

@quic-akhobare @quic-bharathr @quic-mangal

CHNtentes commented 1 month ago

:(

quic-akhobare commented 1 month ago

Hi @CHNtentes - we are working to add this back. Some of the executors failed with arch 90, which was causing build instability. A PR should be up soon.

CHNtentes commented 1 month ago

> Hi @CHNtentes - we are working to add this back. Some of the executors failed with arch 90, which was causing build instability. A PR should be up soon.

Thanks for your reply. Do I need to manually clone and build after the PR is up, or will you release a 1.33.1 version?

quic-jpolizzi commented 1 month ago

@CHNtentes We will go ahead and provide a release once we get this completed, thanks!

CHNtentes commented 1 month ago

Hi, I saw that you merged a PR for this issue. Could you share when you will release a fixed version? Our H20 server has been sitting unused for a long time.