sp-uhh / sgmse

Score-based Generative Models (Diffusion Models) for Speech Enhancement and Dereverberation
MIT License

unsupported gpu architecture 'compute_89' #51

Closed a897456 closed 1 month ago

a897456 commented 1 month ago

```
E:\ProgramData\anaconda3\envs\py39sgmse\python.exe E:\000\sgmse-main\train.py --base_dir E:/000/sgmse-main/Data_Librispeech
Set TORCH_CUDA_ARCH_LIST to: 8.9
Traceback (most recent call last):
  File "E:\ProgramData\anaconda3\envs\py39sgmse\lib\site-packages\torch\utils\cpp_extension.py", line 2107, in _run_ninja_build
    subprocess.run(
  File "E:\ProgramData\anaconda3\envs\py39sgmse\lib\subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "E:\000\sgmse-main\train.py", line 14, in <module>
    from sgmse.backbones.shared import BackboneRegistry
  File "E:\000\sgmse-main\sgmse\backbones\__init__.py", line 2, in <module>
    from .ncsnpp import NCSNpp
  File "E:\000\sgmse-main\sgmse\backbones\ncsnpp.py", line 18, in <module>
    from .ncsnpp_utils import layers, layerspp, normalization
  File "E:\000\sgmse-main\sgmse\backbones\ncsnpp_utils\layerspp.py", line 20, in <module>
    from . import up_or_down_sampling
  File "E:\000\sgmse-main\sgmse\backbones\ncsnpp_utils\up_or_down_sampling.py", line 10, in <module>
    from .op import upfirdn2d
  File "E:\000\sgmse-main\sgmse\backbones\ncsnpp_utils\op\__init__.py", line 1, in <module>
    from .upfirdn2d import upfirdn2d
  File "E:\000\sgmse-main\sgmse\backbones\ncsnpp_utils\op\upfirdn2d.py", line 12, in <module>
    upfirdn2d_op = load(
  File "E:\ProgramData\anaconda3\envs\py39sgmse\lib\site-packages\torch\utils\cpp_extension.py", line 1309, in load
    return _jit_compile(
  File "E:\ProgramData\anaconda3\envs\py39sgmse\lib\site-packages\torch\utils\cpp_extension.py", line 1719, in _jit_compile
    _write_ninja_file_and_build_library(
  File "E:\ProgramData\anaconda3\envs\py39sgmse\lib\site-packages\torch\utils\cpp_extension.py", line 1832, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "E:\ProgramData\anaconda3\envs\py39sgmse\lib\site-packages\torch\utils\cpp_extension.py", line 2123, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'upfirdn2d': [1/2] C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin\nvcc --generate-dependencies-with-compile --dependency-output upfirdn2d_kernel.cuda.o.d -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcompiler /EHsc -Xcompiler /wd4068 -Xcompiler /wd4067 -Xcompiler /wd4624 -Xcompiler /wd4190 -Xcompiler /wd4018 -Xcompiler /wd4275 -Xcompiler /wd4267 -Xcompiler /wd4244 -Xcompiler /wd4251 -Xcompiler /wd4819 -Xcompiler /MD -DTORCH_EXTENSION_NAME=upfirdn2d -DTORCH_API_INCLUDE_EXTENSION_H -IE:\ProgramData\anaconda3\envs\py39sgmse\lib\site-packages\torch\include -IE:\ProgramData\anaconda3\envs\py39sgmse\lib\site-packages\torch\include\torch\csrc\api\include -IE:\ProgramData\anaconda3\envs\py39sgmse\lib\site-packages\torch\include\TH -IE:\ProgramData\anaconda3\envs\py39sgmse\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include" -IE:\ProgramData\anaconda3\envs\py39sgmse\Include -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=sm_89 -std=c++17 -c E:\000\sgmse-main\sgmse\backbones\ncsnpp_utils\op\upfirdn2d_kernel.cu -o upfirdn2d_kernel.cuda.o
FAILED: upfirdn2d_kernel.cuda.o
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin\nvcc --generate-dependencies-with-compile --dependency-output upfirdn2d_kernel.cuda.o.d -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcompiler /EHsc -Xcompiler /wd4068 -Xcompiler /wd4067 -Xcompiler /wd4624 -Xcompiler /wd4190 -Xcompiler /wd4018 -Xcompiler /wd4275 -Xcompiler /wd4267 -Xcompiler /wd4244 -Xcompiler /wd4251 -Xcompiler /wd4819 -Xcompiler /MD -DTORCH_EXTENSION_NAME=upfirdn2d -DTORCH_API_INCLUDE_EXTENSION_H -IE:\ProgramData\anaconda3\envs\py39sgmse\lib\site-packages\torch\include -IE:\ProgramData\anaconda3\envs\py39sgmse\lib\site-packages\torch\include\torch\csrc\api\include -IE:\ProgramData\anaconda3\envs\py39sgmse\lib\site-packages\torch\include\TH -IE:\ProgramData\anaconda3\envs\py39sgmse\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include" -IE:\ProgramData\anaconda3\envs\py39sgmse\Include -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=sm_89 -std=c++17 -c E:\000\sgmse-main\sgmse\backbones\ncsnpp_utils\op\upfirdn2d_kernel.cu -o upfirdn2d_kernel.cuda.o
ninja: build stopped: subcommand failed.
```

julius-richter commented 1 month ago

Can you please try what ChatGPT suggests:

The error you're encountering seems to be related to building a CUDA extension, specifically the upfirdn2d kernel, using the Ninja build system. The error message indicates that the process failed when running the nvcc compiler. Here are a few steps you can take to try and resolve this issue:

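  1. CUDA Version Compatibility: Ensure that the CUDA toolkit you are using (v12.1) matches the CUDA version your PyTorch build was compiled with and supports the compute capability of your GPU. Compute capability 8.9 (Ada Lovelace) requires CUDA 11.8 or newer, so an "unsupported gpu architecture 'compute_89'" error most likely means the nvcc that actually ran is older than 11.8.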

  2. Check CUDA and NVIDIA Driver:

    • Make sure your NVIDIA driver is up to date.
    • Verify that your CUDA toolkit is properly installed and that the relevant environment variables are set correctly: CUDA_HOME (or CUDA_PATH, which the Windows installer sets) and PATH, plus LD_LIBRARY_PATH on Linux.
  3. Reinstall PyTorch with CUDA support: If there are compatibility issues, consider reinstalling PyTorch built for the matching CUDA version. You can find the right command for your setup on the PyTorch website (https://pytorch.org/get-started/locally/).

  4. Install Ninja: Ensure that the Ninja build system is installed and accessible in your PATH. You can install it via Conda:

```sh
conda install -c conda-forge ninja
```
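Steps 2 and 4 can be checked in one go with a small standard-library script. This is a sketch, not part of sgmse or PyTorch: the helper names build_env_report and find_problems are hypothetical, and the only PyTorch behaviour it assumes is that the toolkit is located via CUDA_HOME first and CUDA_PATH second.

```python
import os
import shutil


def build_env_report(env=None):
    """Collect the settings a PyTorch JIT CUDA build depends on.

    `env` defaults to os.environ; pass a plain dict to try other setups.
    """
    env = os.environ if env is None else env
    search_path = env.get("PATH")  # None falls back to the real PATH
    return {
        # PyTorch checks CUDA_HOME first, then CUDA_PATH (set by the Windows installer).
        "cuda_home": env.get("CUDA_HOME") or env.get("CUDA_PATH"),
        "nvcc_on_path": shutil.which("nvcc", path=search_path),
        "ninja_on_path": shutil.which("ninja", path=search_path),
        "arch_list": env.get("TORCH_CUDA_ARCH_LIST"),
    }


def find_problems(report):
    """Turn a report into human-readable warnings (empty list if nothing looks wrong)."""
    problems = []
    if not report["cuda_home"]:
        problems.append("neither CUDA_HOME nor CUDA_PATH is set")
    if not report["ninja_on_path"]:
        problems.append("ninja is not on PATH; try: conda install -c conda-forge ninja")
    nvcc, home = report["nvcc_on_path"], report["cuda_home"]
    if nvcc and home:
        nvcc_path = os.path.normcase(os.path.abspath(nvcc))
        home_dir = os.path.normcase(os.path.abspath(home)).rstrip(os.sep) + os.sep
        if not nvcc_path.startswith(home_dir):
            # Two toolkits side by side is a common way to pick up an older nvcc.
            problems.append(f"nvcc on PATH ({nvcc}) is not inside {home}")
    return problems


if __name__ == "__main__":
    report = build_env_report()
    for key, value in report.items():
        print(f"{key}: {value}")
    for problem in find_problems(report):
        print("WARNING:", problem)
```

Passing a dict instead of reading os.environ directly keeps the check testable and lets you try a candidate configuration before changing your real environment.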

  5. Compile Manually: Sometimes building the extension manually can provide more diagnostic information. Navigate to the directory containing the upfirdn2d_kernel.cu file and run the nvcc command manually to see if there are more detailed errors.

  6. Modify TORCH_CUDA_ARCH_LIST: TORCH_CUDA_ARCH_LIST is set to 8.9, which corresponds to NVIDIA Ada Lovelace GPUs such as the RTX 40 series. If you're using a different GPU, make sure TORCH_CUDA_ARCH_LIST matches its compute capability, which you can look up on NVIDIA's CUDA GPUs page (https://developer.nvidia.com/cuda-gpus).

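  7. Simplified Compilation Command: Strip the nvcc command down to isolate the issue. Because the original command also passes the PyTorch include directories (the -I...\torch\include flags), you will probably need to keep those for the kernel to compile. For example: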

```sh
nvcc -arch=sm_89 --expt-relaxed-constexpr -std=c++17 -c -o upfirdn2d_kernel.cuda.o upfirdn2d_kernel.cu
```
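Steps 5 and 7 can also be scripted so that nvcc's own error text is captured instead of being buried in the ninja log. The sketch below is not part of sgmse or PyTorch; the helper names are hypothetical, and the include-directory list is left empty for you to fill in from the failing command.

```python
import subprocess
import sys


def gencode_flag(capability):
    """Turn a (major, minor) compute capability into a -gencode flag like the one in the log."""
    major, minor = capability
    return f"-gencode=arch=compute_{major}{minor},code=sm_{major}{minor}"


def reduced_nvcc_command(source, capability, include_dirs=(), nvcc="nvcc",
                         output="upfirdn2d_kernel.cuda.o"):
    """Build a stripped-down version of the failing nvcc call."""
    cmd = [nvcc, gencode_flag(capability), "--expt-relaxed-constexpr", "-std=c++17"]
    # Keep the -I directories from the original command; the kernel is most
    # likely built against PyTorch headers.
    cmd += [f"-I{d}" for d in include_dirs]
    cmd += ["-c", source, "-o", output]
    return cmd


def run_and_capture(cmd):
    """Run a command and return (exit code, stdout + stderr)."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True)
    except FileNotFoundError as exc:  # e.g. nvcc is not on PATH
        return 127, str(exc)
    return proc.returncode, proc.stdout + proc.stderr


if __name__ == "__main__":
    # Copy the -I directories from the failing build log into this list.
    torch_include_dirs = []
    cmd = reduced_nvcc_command("upfirdn2d_kernel.cu", (8, 9), torch_include_dirs)
    print(" ".join(cmd))
    code, output = run_and_capture(cmd)
    print(f"exit code {code}", file=sys.stderr)
    print(output, file=sys.stderr)
```

If nvcc rejects the architecture, its message shows up directly in the captured output.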

  8. Permissions: Ensure you have the necessary permissions to read/write in the directories you are working in.
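For step 8, one quick test is to try creating a file where the build output goes. This is a sketch with a hypothetical helper, is_writable_dir; the only PyTorch-specific piece is the TORCH_EXTENSIONS_DIR environment variable, which PyTorch uses as the root for JIT-built extensions when it is set.

```python
import os
import tempfile


def is_writable_dir(path):
    """Return True if a file can be created (and removed) in `path`, creating it if needed."""
    try:
        os.makedirs(path, exist_ok=True)
        with tempfile.NamedTemporaryFile(dir=path):
            pass
        return True
    except OSError:
        return False


if __name__ == "__main__":
    # PyTorch writes JIT-built extensions under TORCH_EXTENSIONS_DIR when it is set,
    # otherwise under a per-user cache directory.
    target = os.environ.get("TORCH_EXTENSIONS_DIR")
    if target:
        print(target, "writable" if is_writable_dir(target) else "NOT writable")
    else:
        print("TORCH_EXTENSIONS_DIR is not set; PyTorch uses its default cache directory")
```

Actually writing a file is more reliable than os.access, which can give misleading answers on Windows.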

Here's a basic checklist to follow:

```sh
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
```

```python
import torch
print(torch.cuda.is_available())
```
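A slightly longer version of that check also reports the CUDA version PyTorch was built with and the GPU's compute capability, which are the two numbers the build has to agree on. The function name torch_cuda_summary is hypothetical; the torch calls are standard, and the script still runs (with a minimal report) when torch is not installed.

```python
def torch_cuda_summary():
    """Report the CUDA version PyTorch was built with and the GPU it can see."""
    try:
        import torch
    except ImportError:
        return {"torch_version": None}
    summary = {
        "torch_version": torch.__version__,
        "torch_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }
    if summary["cuda_available"]:
        summary["device"] = torch.cuda.get_device_name(0)
        summary["capability"] = torch.cuda.get_device_capability(0)  # (8, 9) on Ada GPUs
        # Architectures the PyTorch binary itself was compiled for, not the JIT extension.
        summary["arch_list"] = torch.cuda.get_arch_list()
    return summary


if __name__ == "__main__":
    for key, value in torch_cuda_summary().items():
        print(f"{key}: {value}")
```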

```bat
rem Windows Command Prompt; in a Linux/macOS shell use: export TORCH_CUDA_ARCH_LIST=8.9
set TORCH_CUDA_ARCH_LIST=8.9
```
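The same variable can be set from Python, as long as that happens before the extension is built. Judging by the traceback, the build is triggered when train.py imports sgmse.backbones, so the variable would have to be set before that import. A sketch with hypothetical helper names:

```python
import os


def arch_list_entry(capability):
    """Format a (major, minor) compute capability as TORCH_CUDA_ARCH_LIST expects, e.g. "8.9"."""
    major, minor = capability
    return f"{major}.{minor}"


def set_arch_list(capability, env=None):
    """Set TORCH_CUDA_ARCH_LIST unless a value is already present; return the value in effect."""
    env = os.environ if env is None else env
    return env.setdefault("TORCH_CUDA_ARCH_LIST", arch_list_entry(capability))
```

For example, set_arch_list(torch.cuda.get_device_capability(0)) placed before "from sgmse.backbones.shared import BackboneRegistry". Using setdefault leaves an explicitly chosen value untouched.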

```sh
conda install ninja
```

Here's an example of checking the CUDA versions and toolchain:

```sh
nvcc --version
```

Verify it matches the expected version and ensure your GPU driver is recent.
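To make the nvcc --version check concrete, the script below parses the release number and compares it with the minimum toolkit that can target the GPU's compute capability and, optionally, with the CUDA major version PyTorch was built with (pass torch.version.cuda). The table is partial, the helper names are hypothetical, and flagging a major-version mismatch is a conservative rule of thumb.

```python
import re
import subprocess

# Minimum CUDA toolkit release that can target each compute capability (partial table).
MIN_CUDA_FOR_ARCH = {
    (7, 0): (9, 0),   # Volta
    (7, 5): (10, 0),  # Turing
    (8, 0): (11, 0),  # Ampere (A100)
    (8, 6): (11, 1),  # Ampere (RTX 30 series)
    (8, 9): (11, 8),  # Ada Lovelace (RTX 40 series)
    (9, 0): (11, 8),  # Hopper
}


def parse_nvcc_release(version_text):
    """Extract (major, minor) from nvcc --version output, e.g. 'release 12.1, V12.1.105'."""
    match = re.search(r"release (\d+)\.(\d+)", version_text)
    if match is None:
        raise ValueError("could not find a CUDA release number in nvcc output")
    return int(match.group(1)), int(match.group(2))


def check_toolkit(version_text, capability, torch_cuda=None):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    release = parse_nvcc_release(version_text)
    needed = MIN_CUDA_FOR_ARCH.get(tuple(capability))
    if needed and release < needed:
        problems.append(
            f"nvcc {release[0]}.{release[1]} cannot target sm_{capability[0]}{capability[1]}; "
            f"need CUDA {needed[0]}.{needed[1]} or newer"
        )
    if torch_cuda and int(torch_cuda.split(".")[0]) != release[0]:
        problems.append(f"PyTorch was built with CUDA {torch_cuda} but nvcc is {release[0]}.{release[1]}")
    return problems


if __name__ == "__main__":
    try:
        out = subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout
    except FileNotFoundError:
        print("nvcc not found on PATH")
    else:
        for problem in check_toolkit(out, (8, 9)):
            print("WARNING:", problem)
```

With release 12.1 and capability (8, 9), as in the log, check_toolkit returns an empty list.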

Please apply these steps and let me know if the issue persists!