Open Aithu-Snehith opened 4 years ago
I am getting the same error
python atcnet.py --pose 1 --relativeframe 0 --dataset news --newsname 19_news/31 --start 0 --model_dir ../model/atcnet_pose0_con3/31/ --continue_train 1 --lr 0.0001 --less_constrain 1 --smooth_loss 1 --smooth_loss2 1 --model_name ../model/atcnet_lstm_general.pth --sample_dir ../sample/atcnet_pose0_con3/31 --device_ids 0 --max_epochs 100
device 0
---------- Networks initialized -------------
[Network] Total number of parameters : 29.431 M
-----------------------------------------------
Traceback (most recent call last):
  File "atcnet.py", line 328, in <module>
    main(config)
  File "atcnet.py", line 305, in main
    t = trainer.Trainer(config)
  File "/content/Audio-driven-TalkingFace-HeadPose/Audio/code/atcnet.py", line 81, in __init__
    self.generator = self.generator.cuda()
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 265, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 193, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/rnn.py", line 127, in _apply
    self.flatten_parameters()
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/rnn.py", line 123, in flatten_parameters
    self.batch_first, bool(self.bidirectional))
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED
Changing the torch version fixed this:
!pip uninstall torch
!pip install https://download.pytorch.org/whl/cu100/torch-1.0.1.post2-cp36-cp36m-linux_x86_64.whl
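After swapping the wheel, it may be worth confirming which torch, CUDA, and cuDNN builds the runtime actually picks up. A minimal sketch; the helper name `report_torch_env` is made up for illustration, and the fields stay `None` if torch cannot be imported:

```python
def report_torch_env():
    """Collect the torch / CUDA / cuDNN versions visible to this interpreter."""
    info = {"torch": None, "cuda_build": None, "cudnn": None, "cuda_available": False}
    try:
        import torch
    except ImportError:
        # torch is not installed in this environment; leave the defaults.
        return info
    info["torch"] = torch.__version__
    info["cuda_build"] = torch.version.cuda  # None on CPU-only builds
    info["cuda_available"] = bool(torch.cuda.is_available())
    if info["cuda_available"]:
        info["cudnn"] = torch.backends.cudnn.version()
    return info


if __name__ == "__main__":
    for key, value in report_torch_env().items():
        print(key, "=", value)
```

If the reported torch version is not the one just installed, the runtime probably needs a restart before the new wheel is loaded.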
Hi @anantyash9, I uninstalled torch and installed the version you suggested above, but memory is still being consumed up to 100% and the process is stopping with the error CUDNN_STATUS_EXECUTION_FAILED. Do you know what can be done?
python atcnet.py --pose 1 --relativeframe 0 --dataset news --newsname 19_news/196 --start 0 --model_dir ../model/atcnet_pose0_con3/196/ --continue_train 1 --lr 0.0001 --less_constrain 1 --smooth_loss 1 --smooth_loss2 1 --model_name ../model/atcnet_lstm_general.pth --sample_dir ../sample/atcnet_pose0_con3/196 --device_ids 0 --max_epochs 100
device 0
---------- Networks initialized -------------
[Network] Total number of parameters : 29.431 M
-----------------------------------------------
initialize network with normal
load pretrained [../model/atcnet_lstm_general.pth]
torch.Size([590, 28, 12])
torch.Size([300, 70])
num_steps_per_epoch 17
Traceback (most recent call last):
  File "atcnet.py", line 328, in <module>
    main(config)
  File "atcnet.py", line 306, in main
    t.fit()
  File "/home/tejaswini/SpeechToVideoCloning_2/Audio-driven-TalkingFace-HeadPose/Audio/code/atcnet.py", line 140, in fit
    fake_coeff= self.generator(audio)
  File "/home/tejaswini/SpeechToVideoCloning_2/Audio-driven-TalkingFace-HeadPose/.venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/tejaswini/SpeechToVideoCloning_2/Audio-driven-TalkingFace-HeadPose/Audio/code/models.py", line 102, in forward
    current_feature = self.audio_eocder(current_audio)
  File "/home/tejaswini/SpeechToVideoCloning_2/Audio-driven-TalkingFace-HeadPose/.venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/tejaswini/SpeechToVideoCloning_2/Audio-driven-TalkingFace-HeadPose/.venv/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/home/tejaswini/SpeechToVideoCloning_2/Audio-driven-TalkingFace-HeadPose/.venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/tejaswini/SpeechToVideoCloning_2/Audio-driven-TalkingFace-HeadPose/.venv/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/home/tejaswini/SpeechToVideoCloning_2/Audio-driven-TalkingFace-HeadPose/.venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/tejaswini/SpeechToVideoCloning_2/Audio-driven-TalkingFace-HeadPose/.venv/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 320, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED
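One way to narrow this down might be to run a tiny LSTM on the GPU outside the training script, once with cuDNN enabled and once with it disabled. This is only a diagnostic sketch, not a confirmed fix: the function name `smoke_test_lstm` and the tensor sizes are arbitrary choices for illustration and have nothing to do with atcnet.py.

```python
def smoke_test_lstm(use_cudnn=True):
    """Run one forward pass of a small LSTM on the GPU and report the outcome."""
    try:
        import torch
    except ImportError:
        return "torch-missing"
    if not torch.cuda.is_available():
        return "no-cuda"

    previous = torch.backends.cudnn.enabled
    torch.backends.cudnn.enabled = use_cudnn  # toggle the cuDNN code path
    try:
        # Arbitrary small sizes; .cuda() is where the first traceback failed.
        lstm = torch.nn.LSTM(input_size=8, hidden_size=16, batch_first=True).cuda()
        lstm(torch.randn(2, 5, 8).cuda())
        torch.cuda.synchronize()  # surface asynchronous CUDA errors here
        return "ok"
    except RuntimeError as err:
        return "failed: " + str(err)
    finally:
        torch.backends.cudnn.enabled = previous  # restore the global setting


if __name__ == "__main__":
    print("cudnn on :", smoke_test_lstm(use_cudnn=True))
    print("cudnn off:", smoke_test_lstm(use_cudnn=False))
```

If the pass succeeds with cuDNN off but fails with it on, that would suggest a mismatch between the installed torch/cuDNN build and the GPU driver rather than a problem in the model code. If both fail, the GPU may simply be out of memory or in a bad state from an earlier run.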
I encountered the error below while running the following cell:
!cd Audio/code/; python train_19news_1.py 31 0
The error occurs even when running on the provided sample video ('Data/31.mp4').
The full traceback is as follows:
How can I overcome this?