vskadandale / vocalist

Official repository for the paper VocaLiST: An Audio-Visual Synchronisation Model for Lips and Voices

4: Padding size should be less than the corresponding input dimension, but got: padding (400, 400) at dimension 2 of input [1, 191488, 2] #2

Closed c1a1o1 closed 2 years ago

c1a1o1 commented 2 years ago

```
D:\anaconda\python.exe D:/work/vgan/vocalist-main/t1est_lrs2.py
use_cuda: True
total trainable params 80106561
0it [00:00, ?it/s]Traceback (most recent call last):
  File "D:/work/vgan/vocalist-main/t1est_lrs2.py", line 290, in <module>
    eval_model(test_data_loader, device, model)
  File "D:/work/vgan/vocalist-main/t1est_lrs2.py", line 146, in eval_model
    for step, (vid, aud, lastframe) in prog_bar:
  File "D:\anaconda\lib\site-packages\tqdm\std.py", line 1107, in __iter__
    for obj in iterable:
  File "D:\anaconda\lib\site-packages\torch\utils\data\dataloader.py", line 530, in __next__
    data = self._next_data()
  File "D:\anaconda\lib\site-packages\torch\utils\data\dataloader.py", line 570, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "D:\anaconda\lib\site-packages\torch\utils\data\_utils\fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "D:\anaconda\lib\site-packages\torch\utils\data\_utils\fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "D:/work/vgan/vocalist-main/t1est_lrs2.py", line 109, in __getitem__
    window=torch.hann_window(hparams.win_size), return_complex=True)
  File "D:\anaconda\lib\site-packages\torch\functional.py", line 693, in stft
    input = F.pad(input.view(extended_shape), [pad, pad], pad_mode)
  File "D:\anaconda\lib\site-packages\torch\nn\functional.py", line 4369, in _pad
    return torch._C._nn.reflection_pad1d(input, pad)
RuntimeError: Argument #4: Padding size should be less than the corresponding input dimension, but got: padding (400, 400) at dimension 2 of input [1, 191488, 2]
0it [00:00, ?it/s]
```
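The error itself comes from `torch.stft`: it reflection-pads the last dimension of the waveform by `n_fft // 2` (here 400), but the waveform was loaded as channels-last stereo, so the last dimension has size 2. A minimal sketch of the failure mode and a downmix-to-mono fix; the `n_fft` value is inferred from the pad size, and the hop length is an assumption rather than something taken from the repo's hparams:

```python
import torch

# A stereo waveform loaded channels-last (e.g. via soundfile or
# scipy.io.wavfile) has shape (T, 2); batched it becomes (1, T, 2).
wav = torch.randn(191488, 2)

# torch.stft reflection-pads the *last* dimension by n_fft // 2 on each
# side. With channels-last stereo that dimension has size 2 < 400,
# which raises exactly the RuntimeError above.

# Downmixing to mono (or selecting one channel) yields a 1-D waveform
# that torch.stft can pad and transform:
wav_mono = wav.mean(dim=-1)  # shape (T,)

spec = torch.stft(
    wav_mono,
    n_fft=800,                      # assumed: pad of 400 per side implies n_fft = 800
    hop_length=200,                 # assumed hop size, not from the repo's hparams
    win_length=800,
    window=torch.hann_window(800),
    return_complex=True,
)
print(spec.shape)  # torch.Size([401, n_frames])
```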

vskadandale commented 2 years ago

You first need to preprocess LRS2 the way it is done for the expert lip-sync discriminator in the Wav2Lip paper. Please read the README of this GitHub repo carefully if you want to use the code.
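For reference, that preprocessing produces per-clip face frames and an audio track; a sketch of the audio-extraction part, assuming `ffmpeg` is installed and a 16 kHz mono target as in Wav2Lip-style pipelines (the paths and script are illustrative, not copied from the repo):

```python
import subprocess

# Illustrative only (not the repo's actual preprocessing script):
# extract a clip's audio track as 16 kHz mono WAV.
subprocess.run(
    ["ffmpeg", "-y", "-i", "clip.mp4",  # placeholder input path
     "-vn",           # drop the video stream
     "-ac", "1",      # downmix to mono (avoids the stereo error above)
     "-ar", "16000",  # resample to 16 kHz
     "audio.wav"],    # placeholder output path
    check=True,
)
```

A mono 16 kHz waveform loads as a 1-D tensor, so the reflection-padding error in the traceback above cannot occur.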