CUFFT_INTERNAL_ERROR on RTX4090

shine-xia commented 7 months ago

requirements.txt in MeloTTS pins `torch<2.0`, but the code below is only valid in torch 1.13.0 or higher, so my choices are torch 1.13.0/1.13.1. All torch 1.13.0/1.13.1 packages are built against CUDA 11.6/11.7.

torch.backends.cudnn.benchmark = True
torch.backends.cuda.sdp_kernel("flash")
torch.backends.cuda.enable_flash_sdp(True)
# torch.backends.cuda.enable_mem_efficient_sdp(
#     True
# )  # Not available if torch version is lower than 2.0
torch.backends.cuda.enable_math_sdp(True)
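
Rather than commenting lines out by hand, the version dependence could be made explicit. A minimal sketch, assuming the cut-offs are exactly as described in this comment (flash and math toggles from 1.13.0, memory-efficient toggle from 2.0); `parse_torch_version` and `sdp_toggles_for` are hypothetical helpers, not part of torch or MeloTTS:

```python
import re


def parse_torch_version(version: str) -> tuple:
    """Turn a version string such as '2.0.1+cu118' into (2, 0, 1)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised torch version: {version!r}")
    # A missing patch component (e.g. '1.13') is treated as 0.
    return tuple(int(part) if part else 0 for part in match.groups())


def sdp_toggles_for(version: str) -> list:
    """Names of the torch.backends.cuda toggles to call for this version.

    Assumption (taken from the comment above, not checked against the
    torch changelog): flash/math need >= 1.13.0, mem-efficient needs >= 2.0.
    """
    parsed = parse_torch_version(version)
    toggles = []
    if parsed >= (1, 13, 0):
        toggles += ["enable_flash_sdp", "enable_math_sdp"]
    if parsed >= (2, 0, 0):
        toggles.append("enable_mem_efficient_sdp")
    return toggles
```

In a training script this might be applied as `for name in sdp_toggles_for(torch.__version__): getattr(torch.backends.cuda, name)(True)`.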

I have an RTX 4090, and I can't train MeloTTS on it with torch 1.13.1 because of a CUDA bug that is fixed in CUDA 11.8: https://github.com/pytorch/pytorch/issues/88038

So I hope the MeloTTS developers can raise the supported torch version to 2.0 or higher.
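
The failing combination can be expressed as a quick pre-flight check. This is only a sketch: it assumes the bug affects GPUs with compute capability 8.9 (the RTX 4090) on torch builds against CUDA older than 11.8, which is one reading of the linked issue, and `cufft_bug_likely` is a made-up name:

```python
def cufft_bug_likely(cuda_version, capability):
    """Return True if this build/GPU pair matches the failing combination.

    cuda_version: CUDA version torch was built against, e.g. "11.7",
                  or None for a CPU-only build.
    capability:   (major, minor) compute capability, e.g. (8, 9).
    """
    if cuda_version is None:
        return False  # CPU-only build: cuFFT is never used
    parts = cuda_version.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    # Assumption: only capability 8.9 cards are affected, and CUDA 11.8
    # is the first release that contains the fix.
    return tuple(capability) == (8, 9) and (major, minor) < (11, 8)
```

With torch installed, the two inputs would come from `torch.version.cuda` and `torch.cuda.get_device_capability(0)`.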

shine-xia commented 7 months ago

But it turns out that training runs successfully on torch 2.0.1, with some warnings...

MissingTwins commented 7 months ago

1. Open `MeloTTS\requirements.txt`
    Change `torch<2.0` to `torch`
    Remove the duplicate `mecab-python3==1.0.5` line
    Change the one remaining `mecab-python3==1.0.5` to `mecab-python3`
    Save the changes
2. Switch to the melo virtual environment  
    run `cd MeloTTS`   
    run `pip install -e .`  
    run `python -m unidic download`
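
The edits in step 1 can also be scripted. A rough sketch, assuming the file contains the literal line `torch<2.0` and one or more `mecab-python3` lines as described above; `patch_requirements` is a hypothetical helper, not something shipped with MeloTTS:

```python
def patch_requirements(text: str) -> str:
    """Apply the step 1 edits to the contents of requirements.txt."""
    patched = []
    seen_mecab = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == "torch<2.0":
            patched.append("torch")  # lift the upper bound on torch
        elif stripped.startswith("mecab-python3"):
            if not seen_mecab:
                patched.append("mecab-python3")  # keep one entry, unpinned
                seen_mecab = True
            # any later mecab-python3 line is a duplicate and is dropped
        else:
            patched.append(line)
    return "\n".join(patched) + "\n"
```

To use it, read the file with `pathlib.Path("requirements.txt").read_text()`, pass the text through `patch_requirements`, and write the result back with `write_text`. Keeping a backup copy of the original file first is a sensible precaution.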

Should work with CUDA 12.2, but not CUDA 12.3

$ nvidia-smi 
Wed Apr 10 23:45:53 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.161.07             Driver Version: 535.161.07   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce GTX 1080 Ti     On  | 00000000:0A:00.0 Off |                  N/A |
|  0%   26C    P8              19W / 275W |      1MiB / 11264MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

dennis-wr commented 7 months ago

Here is my method (Ubuntu 22.04):

1) Uninstall CUDA completely.

$ sudo /usr/local/cuda-11.7/bin/cuda-uninstaller
$ sudo /usr/bin/nvidia-uninstall

$ sudo apt-get --purge remove "*cublas*" "cuda*" "nsight*"
$ sudo apt-get --purge remove "*nvidia*"
$ sudo apt-get autoremove
$ sudo apt-get autoclean
$ sudo rm -rf /usr/local/cuda*

$ sudo dpkg -r cuda
$ sudo dpkg -r $(dpkg -l | grep '^ii  cudnn' | awk '{print $2}')

$ sudo apt-get update

2) Install Nvidia Drivers.

$ ubuntu-drivers devices
$ sudo apt install nvidia-driver-525  # after checking the desired version with the command above

$ sudo ubuntu-drivers autoinstall  # or use this command instead

$ sudo reboot

3) Install CUDA Toolkit 11.8

$ wget https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda_11.8.0_520.61.05_linux.run
$ sudo sh cuda_11.8.0_520.61.05_linux.run  # uncheck "Driver" in the installer menu

$ vi ~/.bashrc
export PATH=/usr/local/cuda-11.8/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

$ source ~/.bashrc
$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
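
To confirm from a script that the toolkit on `PATH` is new enough, the release can be read out of the `nvcc -V` text. A small sketch, assuming the output always contains a `release X.Y` fragment like the one shown above; `nvcc_release` is a hypothetical helper:

```python
import re


def nvcc_release(output: str) -> tuple:
    """Extract (major, minor) from the text printed by `nvcc -V`."""
    match = re.search(r"release (\d+)\.(\d+)", output)
    if match is None:
        raise ValueError("no 'release X.Y' found in nvcc output")
    return (int(match.group(1)), int(match.group(2)))


# The relevant line from the output above.
sample = "Cuda compilation tools, release 11.8, V11.8.89"
toolkit_ok = nvcc_release(sample) >= (11, 8)
```

The real output could be captured with `subprocess.run(["nvcc", "-V"], capture_output=True, text=True).stdout`.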

4) Install cuDNN 8 (maybe optional?)

$ sudo dpkg -i cudnn-local-repo-ubuntu2204-8.9.7.29_1.0-1_amd64.deb
$ sudo cp /var/cudnn-local-repo-ubuntu2204-8.9.7.29/cudnn-local-8AE81B24-keyring.gpg /usr/share/keyrings/
$ sudo apt-get update
$ sudo apt-get install libcudnn8 libcudnn8-dev libcudnn8-samples

5) Install PyTorch for CUDA 11.8

$ pip install torch==2.0.0 torchvision==0.15.1 torchaudio==2.0.1 --index-url https://download.pytorch.org/whl/cu118

You may need to modify your code because of warnings or errors.
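
After the reinstall, a one-off FFT on the GPU is a cheap way to see whether cuFFT works before starting a long training run. This is a hedged sketch rather than an official test: it assumes the problem would surface as a `RuntimeError` from an FFT call, and `cufft_smoke_test` is an invented name:

```python
def cufft_smoke_test():
    """Run one small FFT on the GPU and report what happened.

    Returns "no-torch", "no-cuda", "ok", or "failed: <message>".
    """
    try:
        import torch
    except ImportError:
        return "no-torch"
    if not torch.cuda.is_available():
        return "no-cuda"
    try:
        signal = torch.randn(4096, device="cuda")
        torch.fft.rfft(signal)     # FFTs on CUDA tensors go through cuFFT
        torch.cuda.synchronize()   # surface asynchronous CUDA errors here
    except RuntimeError as error:
        return f"failed: {error}"
    return "ok"


print(cufft_smoke_test())
```

A result of `ok` only shows that a basic FFT runs; it does not guarantee that training is free of the warnings or errors mentioned above.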

johnPertoft commented 7 months ago

Same as https://github.com/myshell-ai/MeloTTS/issues/80 for visibility.

myshell-ai / MeloTTS

CUFFT_INTERNAL_ERROR on RTX4090 #96