jaywalnut310 / glow-tts

A Generative Flow for Text-to-Speech via Monotonic Alignment Search
MIT License

RuntimeError: CUDA error: invalid device function #64

Closed: tenebo closed this issue 1 year ago

tenebo commented 2 years ago

Hello, I am trying to run glow-tts in Colab (GPU: Tesla K80).

I changed the CUDA version with these commands:

# Change CUDA version to 10.0
!rm /usr/local/cuda
!ln -s /usr/local/cuda-10.0 /usr/local/cuda

# check if installed successfully
!nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130

and installed torch 1.2.0 with this command:

!pip install torch==1.2.0 torchvision==0.4.0 -f https://download.pytorch.org/whl/cu100/torch_stable.html
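
As a sanity check (my own sketch, not part of the repo), the installed build can be inspected from Python to confirm the wheel matches the symlinked toolkit and the GPU:

import torch

print(torch.__version__)                    # expect 1.2.0
print(torch.version.cuda)                   # expect '10.0', matching the toolkit above
print(torch.cuda.is_available())            # expect True on a GPU runtime
print(torch.cuda.get_device_name(0))        # e.g. 'Tesla K80'
print(torch.cuda.get_device_capability(0))  # the K80 reports (3, 7), i.e. sm_37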

However, I get the error:

INFO:glow-tts:{'train': {'use_cuda': True, 'log_interval': 20, 'seed': 1234, 'epochs': 10000, 'learning_rate': 1.0, 'betas': [0.9, 0.98], 'eps': 1e-09, 'warmup_steps': 4000, 'scheduler': 'noam', 'batch_size': 32, 'ddi': True, 'fp16_run': True}, 'data': {'load_mel_from_disk': False, 'training_files': 'filelists/ljs_audio_text_train_filelist.txt', 'validation_files': 'filelists/ljs_audio_text_val_filelist.txt', 'text_cleaners': ['korean_cleaners'], 'max_wav_value': 32768.0, 'sampling_rate': 22050, 'filter_length': 1024, 'hop_length': 256, 'win_length': 1024, 'n_mel_channels': 80, 'mel_fmin': 80.0, 'mel_fmax': 7600.0, 'add_noise': True, 'add_space': False, 'cmudict_path': 'data/cmu_dictionary'}, 'model': {'hidden_channels': 192, 'filter_channels': 768, 'filter_channels_dp': 256, 'kernel_size': 3, 'p_dropout': 0.1, 'n_blocks_dec': 12, 'n_layers_enc': 6, 'n_heads': 2, 'p_dropout_dec': 0.05, 'dilation_rate': 1, 'kernel_size_dec': 5, 'n_block_layers': 4, 'n_sqz': 2, 'prenet': True, 'mean_only': True, 'hidden_channels_enc': 192, 'hidden_channels_dec': 192, 'window_size': 4}, 'model_dir': '/content/drive/My Drive/Colab Notebooks/data/glow-tts'}
/usr/local/lib/python3.7/dist-packages/scipy/io/wavfile.py:273: WavFileWarning: Chunk (non-data) not understood, skipping it.
  WavFileWarning)
INFO:glow-tts:Saving model and optimizer state at iteration 0 to /content/drive/My Drive/Colab Notebooks/data/glow-tts/ddi_G.pth
INFO:glow-tts:{'train': {'use_cuda': True, 'log_interval': 20, 'seed': 1234, 'epochs': 10000, 'learning_rate': 1.0, 'betas': [0.9, 0.98], 'eps': 1e-09, 'warmup_steps': 4000, 'scheduler': 'noam', 'batch_size': 32, 'ddi': True, 'fp16_run': True}, 'data': {'load_mel_from_disk': False, 'training_files': 'filelists/ljs_audio_text_train_filelist.txt', 'validation_files': 'filelists/ljs_audio_text_val_filelist.txt', 'text_cleaners': ['korean_cleaners'], 'max_wav_value': 32768.0, 'sampling_rate': 22050, 'filter_length': 1024, 'hop_length': 256, 'win_length': 1024, 'n_mel_channels': 80, 'mel_fmin': 80.0, 'mel_fmax': 7600.0, 'add_noise': True, 'add_space': False, 'cmudict_path': 'data/cmu_dictionary'}, 'model': {'hidden_channels': 192, 'filter_channels': 768, 'filter_channels_dp': 256, 'kernel_size': 3, 'p_dropout': 0.1, 'n_blocks_dec': 12, 'n_layers_enc': 6, 'n_heads': 2, 'p_dropout_dec': 0.05, 'dilation_rate': 1, 'kernel_size_dec': 5, 'n_block_layers': 4, 'n_sqz': 2, 'prenet': True, 'mean_only': True, 'hidden_channels_enc': 192, 'hidden_channels_dec': 192, 'window_size': 4}, 'model_dir': '/content/drive/My Drive/Colab Notebooks/data/glow-tts'}
DEBUG:tensorflow:Falling back to TensorFlow client; we recommended you install the Cloud TPU client directly with pip install cloud-tpu-client.
DEBUG:h5py._conv:Creating converter from 7 to 5
DEBUG:h5py._conv:Creating converter from 5 to 7
DEBUG:h5py._conv:Creating converter from 7 to 5
DEBUG:h5py._conv:Creating converter from 5 to 7
Selected optimization level O1:  Insert automatic casts around Pytorch functions and Tensor methods.

Defaults for this optimization level are:
enabled                : True
opt_level              : O1
cast_model_type        : None
patch_torch_functions  : True
keep_batchnorm_fp32    : None
master_weights         : None
loss_scale             : dynamic
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled                : True
opt_level              : O1
cast_model_type        : None
patch_torch_functions  : True
keep_batchnorm_fp32    : None
master_weights         : None
loss_scale             : dynamic
INFO:glow-tts:Loaded checkpoint '/content/drive/My Drive/Colab Notebooks/data/glow-tts/ddi_G.pth' (iteration 0)
/usr/local/lib/python3.7/dist-packages/scipy/io/wavfile.py:273: WavFileWarning: Chunk (non-data) not understood, skipping it.
  WavFileWarning)
Traceback (most recent call last):
  File "train.py", line 190, in <module>
    main()
  File "train.py", line 34, in main
    mp.spawn(train_and_eval, nprocs=n_gpus, args=(n_gpus, hps,))
  File "/usr/local/lib/python3.7/dist-packages/torch/multiprocessing/spawn.py", line 171, in spawn
    while not spawn_context.join():
  File "/usr/local/lib/python3.7/dist-packages/torch/multiprocessing/spawn.py", line 118, in join
    raise Exception(msg)
Exception: 

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
    fn(i, *args)
  File "/content/glow-tts/train.py", line 88, in train_and_eval
    train(rank, epoch, hps, generator, optimizer_g, train_loader, logger, writer)
  File "/content/glow-tts/train.py", line 116, in train
    scaled_loss.backward()
  File "/usr/local/lib/python3.7/dist-packages/torch/tensor.py", line 118, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/usr/local/lib/python3.7/dist-packages/torch/autograd/__init__.py", line 93, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: CUDA error: invalid device function

I believe this RuntimeError: CUDA error: invalid device function is caused by a version mismatch. How should I fix this?
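
A minimal check that might narrow this down (again a sketch of mine, not from the repo): "invalid device function" usually means a CUDA kernel was compiled for a compute capability the GPU does not support, and the K80 is sm_37. If even the fp32 test below fails, the torch/apex binaries do not match the GPU at all; if it passes, the suspect is the fp16 path (the config has fp16_run: true, and the traceback goes through apex's scaled_loss.backward()), since half precision is poorly supported on Kepler GPUs, and setting fp16_run to false in the config would be worth trying.

import torch

# fp32: exercises both forward and backward CUDA kernels
x = torch.randn(4, 4, device='cuda', requires_grad=True)
loss = (x @ x).sum()
loss.backward()
print(loss.item(), x.grad.norm().item())

# fp16: on a K80 (sm_37) this path is the likely source of the error
# when fp16_run is enabled
xh = x.detach().half().requires_grad_()
(xh @ xh).sum().backward()
print(xh.grad.norm().item())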