RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED while running demo_talkingface

Aithu-Snehith commented 4 years ago

I encountered the error below while running the following cell:

!cd Audio/code/; python train_19news_1.py 31 0

The error occurs even when running on the provided sample video ('Data/31.mp4').

The full traceback is as follows:

python atcnet.py --pose 1 --relativeframe 0 --dataset news --newsname 19_news/31 --start 0 --model_dir ../model/atcnet_pose0_con3/31/ --continue_train 1 --lr 0.0001 --less_constrain 1 --smooth_loss 1 --smooth_loss2 1 --model_name ../model/atcnet_lstm_general.pth --sample_dir ../sample/atcnet_pose0_con3/31 --device_ids 0 --max_epochs 100
device 0
---------- Networks initialized -------------
[Network] Total number of parameters : 29.431 M
-----------------------------------------------
Traceback (most recent call last):
  File "atcnet.py", line 328, in <module>
    main(config)
  File "atcnet.py", line 305, in main
    t = trainer.Trainer(config)
  File "/content/Audio-driven-TalkingFace-HeadPose/Audio/code/atcnet.py", line 81, in __init__
    self.generator     = self.generator.cuda()
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 265, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 193, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/rnn.py", line 127, in _apply
    self.flatten_parameters()
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/rnn.py", line 123, in flatten_parameters
    self.batch_first, bool(self.bidirectional))
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

How can I overcome this?

anantyash9 commented 3 years ago

I am getting the same error:

python atcnet.py --pose 1 --relativeframe 0 --dataset news --newsname 19_news/31 --start 0 --model_dir ../model/atcnet_pose0_con3/31/ --continue_train 1 --lr 0.0001 --less_constrain 1 --smooth_loss 1 --smooth_loss2 1 --model_name ../model/atcnet_lstm_general.pth --sample_dir ../sample/atcnet_pose0_con3/31 --device_ids 0 --max_epochs 100
device 0
---------- Networks initialized -------------
[Network] Total number of parameters : 29.431 M
-----------------------------------------------
Traceback (most recent call last):
  File "atcnet.py", line 328, in <module>
    main(config)
  File "atcnet.py", line 305, in main
    t = trainer.Trainer(config)
  File "/content/Audio-driven-TalkingFace-HeadPose/Audio/code/atcnet.py", line 81, in __init__
    self.generator     = self.generator.cuda()
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 265, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 193, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/rnn.py", line 127, in _apply
    self.flatten_parameters()
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/rnn.py", line 123, in flatten_parameters
    self.batch_first, bool(self.bidirectional))
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

anantyash9 commented 3 years ago

Changing the torch version fixed this:

!pip uninstall torch
!pip install https://download.pytorch.org/whl/cu100/torch-1.0.1.post2-cp36-cp36m-linux_x86_64.whl
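
After reinstalling, it can help to confirm which build is actually being imported. The following is only a sketch, not part of the repository: `describe_torch_build` is a hypothetical helper name, and it assumes nothing beyond standard PyTorch attributes (`torch.version.cuda`, `torch.backends.cudnn.version()`). It reports the torch version, the CUDA version the wheel was built for, and the cuDNN version it sees.

```python
import importlib


def describe_torch_build():
    """Return a dict describing the installed torch build, or None if torch is missing.

    Hypothetical helper for illustration; not part of the repository.
    """
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return None

    info = {
        "torch": torch.__version__,
        # CUDA version the wheel was compiled against (None for CPU-only builds).
        "built_for_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
        # cuDNN version as an integer, or None if cuDNN is not usable.
        "cudnn": torch.backends.cudnn.version(),
    }
    if info["cuda_available"]:
        info["gpu"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    print(describe_torch_build())
```

If the reported torch version is not the one just installed, the notebook runtime may still be holding the old import and probably needs a restart.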

TejaswiniiB commented 2 years ago

Hi @anantyash9, I uninstalled torch and installed the version you suggested above, but memory usage still climbs to 100% and the process stops with the error CUDNN_STATUS_EXECUTION_FAILED. Do you know what can be done?

python atcnet.py --pose 1 --relativeframe 0 --dataset news --newsname 19_news/196 --start 0 --model_dir ../model/atcnet_pose0_con3/196/ --continue_train 1 --lr 0.0001 --less_constrain 1 --smooth_loss 1 --smooth_loss2 1 --model_name ../model/atcnet_lstm_general.pth --sample_dir ../sample/atcnet_pose0_con3/196 --device_ids 0 --max_epochs 100
device 0
---------- Networks initialized -------------
[Network] Total number of parameters : 29.431 M
-----------------------------------------------
initialize network with normal
load pretrained [../model/atcnet_lstm_general.pth]
torch.Size([590, 28, 12])
torch.Size([300, 70])
num_steps_per_epoch 17
Traceback (most recent call last):
  File "atcnet.py", line 328, in <module>
    main(config)
  File "atcnet.py", line 306, in main
    t.fit()
  File "/home/tejaswini/SpeechToVideoCloning_2/Audio-driven-TalkingFace-HeadPose/Audio/code/atcnet.py", line 140, in fit
    fake_coeff= self.generator(audio)
  File "/home/tejaswini/SpeechToVideoCloning_2/Audio-driven-TalkingFace-HeadPose/.venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/tejaswini/SpeechToVideoCloning_2/Audio-driven-TalkingFace-HeadPose/Audio/code/models.py", line 102, in forward
    current_feature = self.audio_eocder(current_audio)
  File "/home/tejaswini/SpeechToVideoCloning_2/Audio-driven-TalkingFace-HeadPose/.venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/tejaswini/SpeechToVideoCloning_2/Audio-driven-TalkingFace-HeadPose/.venv/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/home/tejaswini/SpeechToVideoCloning_2/Audio-driven-TalkingFace-HeadPose/.venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/tejaswini/SpeechToVideoCloning_2/Audio-driven-TalkingFace-HeadPose/.venv/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/home/tejaswini/SpeechToVideoCloning_2/Audio-driven-TalkingFace-HeadPose/.venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/tejaswini/SpeechToVideoCloning_2/Audio-driven-TalkingFace-HeadPose/.venv/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 320, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED
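
Since this traceback ends in a convolution rather than in the LSTM setup, one way to narrow it down is to run a single tiny convolution outside the training script, once with cuDNN enabled and once with it disabled. This is a diagnostic sketch under assumptions, not a fix: `conv_smoke_test` is a hypothetical helper, and the layer and input sizes are arbitrary and not taken from models.py.

```python
import importlib


def conv_smoke_test(use_cudnn=True):
    """Run one small Conv2d forward pass on the GPU and return a status string.

    Hypothetical helper for illustration; sizes are arbitrary.
    """
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return "torch not installed"
    if not torch.cuda.is_available():
        return "no CUDA device visible"

    previous = torch.backends.cudnn.enabled
    torch.backends.cudnn.enabled = use_cudnn
    try:
        conv = torch.nn.Conv2d(1, 4, kernel_size=3, padding=1).cuda()
        x = torch.randn(2, 1, 28, 12).cuda()
        y = conv(x)
        # Force the kernel to finish so any asynchronous error surfaces here.
        torch.cuda.synchronize()
        return "ok: output shape {}".format(tuple(y.shape))
    except RuntimeError as err:
        return "failed: {}".format(err)
    finally:
        # Restore the global flag so the test leaves no side effect.
        torch.backends.cudnn.enabled = previous


if __name__ == "__main__":
    print("cuDNN on :", conv_smoke_test(use_cudnn=True))
    print("cuDNN off:", conv_smoke_test(use_cudnn=False))
```

If the call succeeds only with cuDNN disabled, that would point toward a mismatch between the torch build, cuDNN, and the GPU or driver. If both calls fail, something else, possibly the memory exhaustion described above, is more likely the cause.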

yiranran / Audio-driven-TalkingFace-HeadPose, issue #35