primepake / wav2lip_288x288

Assertion error when training with hq_wav2lip_train.py #74

Closed · shahidmuneer closed this 7 months ago

shahidmuneer commented 10 months ago

I generated syncnet weights after adding a ReLU layer at the end of the syncnet model, and I am using those weights to train the wav2lip generator and discriminator networks. However, I am getting the following assertion error:

```
Evaluating for 300 steps
../aten/src/ATen/native/cuda/Loss.cu:92: operator(): block: [0,0,0], thread: [3,0,0] Assertion `input_val >= zero && input_val <= one` failed.
../aten/src/ATen/native/cuda/Loss.cu:92: operator(): block: [0,0,0], thread: [5,0,0] Assertion `input_val >= zero && input_val <= one` failed.
../aten/src/ATen/native/cuda/Loss.cu:92: operator(): block: [0,0,0], thread: [7,0,0] Assertion `input_val >= zero && input_val <= one` failed.
L1: 0.24381156265735626, Sync: 0.0, Percep: 0.7179633975028992 | Fake: 0.6689321398735046, Real: 0.7179633378982544: : 1it [00:07,  7.28s/it]
Traceback (most recent call last):
  File "hq_wav2lip_train.py", line 442, in <module>
    nepochs=hparams.nepochs)
  File "hq_wav2lip_train.py", line 286, in train
    average_sync_loss = eval_model(test_data_loader, global_step, device, model, disc)
  File "hq_wav2lip_train.py", line 326, in eval_model
    perceptual_loss = disc.perceptual_forward(g)
  File "/home/akool/shahid/wav2lip_288x288/models/wav2lipv2.py", line 196, in perceptual_forward
    false_feats = f(false_feats)
  File "/home/akool/anaconda3/envs/wav2lip2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/akool/anaconda3/envs/wav2lip2/lib/python3.7/site-packages/torch/nn/modules/container.py", line 204, in forward
    input = module(input)
  File "/home/akool/anaconda3/envs/wav2lip2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/akool/shahid/wav2lip_288x288/models/conv2.py", line 30, in forward
    out = self.conv_block(x)
  File "/home/akool/anaconda3/envs/wav2lip2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/akool/anaconda3/envs/wav2lip2/lib/python3.7/site-packages/torch/nn/modules/container.py", line 204, in forward
    input = module(input)
  File "/home/akool/anaconda3/envs/wav2lip2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/akool/anaconda3/envs/wav2lip2/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 463, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/akool/anaconda3/envs/wav2lip2/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 460, in _conv_forward
    self.padding, self.dilation, self.groups)
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED
You can try to repro this exception using the following code snippet. If that doesn't trigger the error, please include your original repro script when reporting this issue.
```

It looks like the error only happens in the evaluation stage; the rest of training runs fine.
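
For reference, the device-side assert at `Loss.cu:92` is the check `F.binary_cross_entropy` performs on CUDA when its input falls outside `[0, 1]`; the cuDNN `CUDNN_STATUS_EXECUTION_FAILED` afterwards is likely just fallout from the aborted kernel. Below is a minimal sketch of the failure mode and two common workarounds, assuming the perceptual loss here is plain BCE on the discriminator output (the tensors and values are illustrative, not taken from the repo):

```python
import torch
import torch.nn.functional as F

device = "cuda" if torch.cuda.is_available() else "cpu"
pred = torch.tensor([0.3, 1.7, -0.2], device=device)  # values outside [0, 1]
target = torch.ones_like(pred)

# This reproduces the Loss.cu assert on GPU, because BCE expects its
# input to already be a probability in [0, 1]:
# loss = F.binary_cross_entropy(pred, target)

# Workaround 1: squash the raw output into [0, 1] first.
loss_a = F.binary_cross_entropy(torch.sigmoid(pred), target)

# Workaround 2: pass the raw logits and let PyTorch fuse the sigmoid,
# which is also more numerically stable.
loss_b = F.binary_cross_entropy_with_logits(pred, target)
```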

Can I get any help?

ghost commented 7 months ago

You should replace the last layer with a ReLU in both the audio encoder and the face encoder of syncnet.
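
A minimal sketch of what that change could look like, assuming both syncnet encoders are `nn.Sequential` stacks of a conv + batchnorm + activation block like the one in `models/conv2.py` (the `Conv2d` wrapper and layer shapes below are simplified stand-ins, not the repo's exact code):

```python
import torch.nn as nn

class Conv2d(nn.Module):
    """Simplified stand-in for the conv + batchnorm + activation block in conv2.py."""
    def __init__(self, cin, cout, kernel_size, stride, padding):
        super().__init__()
        self.conv_block = nn.Sequential(
            nn.Conv2d(cin, cout, kernel_size, stride, padding),
            nn.BatchNorm2d(cout),
        )
        self.act = nn.ReLU()  # the suggested fix: end on ReLU, not sigmoid

    def forward(self, x):
        return self.act(self.conv_block(x))

# Both encoders would then terminate in ReLU-activated blocks, e.g.:
face_encoder_tail = nn.Sequential(
    Conv2d(512, 512, kernel_size=3, stride=1, padding=0),
    Conv2d(512, 512, kernel_size=1, stride=1, padding=0),
)
audio_encoder_tail = nn.Sequential(
    Conv2d(512, 512, kernel_size=3, stride=1, padding=0),
    Conv2d(512, 512, kernel_size=1, stride=1, padding=0),
)
```

The point, presumably, is symmetry: the face and audio embeddings that feed the cosine-similarity sync loss should come through the same final activation, so the generator training sees embeddings produced the same way the expert syncnet weights were trained.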