primepake / wav2lip_288x288

MIT License
570 stars 149 forks

Kernel size can't be greater than actual input size #7

Closed MingZJU closed 2 years ago

MingZJU commented 2 years ago

Thanks for sharing your work!

I tried your model and found some problems. Here are the details:

1. When loading wav2lip.pth, a "Missing key(s) in state_dict" error occurred. I made some changes in inference.py and the error went away. Can you please check if this is all right?

        def load_model(path):
            model = Wav2Lip()
            print("Load checkpoint from: {}".format(path))
            checkpoint = _load(path)
            s = checkpoint["state_dict"]
            # new_s = {}
            # for k, v in s.items():
            #     new_s[k.replace('module.', '')] = v
            model.load_state_dict(checkpoint, False)
            model = model.to(device)
            return model.eval()
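For reference, the commented-out loop above is the usual way to load a checkpoint saved from an `nn.DataParallel` model, whose parameter keys are all prefixed with `module.`. Also note that the snippet passes the whole `checkpoint` dict, rather than `s = checkpoint["state_dict"]`, to `load_state_dict`; with `strict=False` that may silently skip every weight. A minimal sketch of the prefix-stripping step (the helper name is mine, not the repo's):

```python
from collections import OrderedDict

def strip_module_prefix(state_dict):
    # Checkpoints saved from an nn.DataParallel-wrapped model prefix every
    # parameter key with 'module.'; strip it so the keys match a plain,
    # single-GPU model before calling model.load_state_dict(...).
    return OrderedDict(
        (k[len('module.'):] if k.startswith('module.') else k, v)
        for k, v in state_dict.items()
    )
```

Usage would then be `model.load_state_dict(strip_module_prefix(checkpoint["state_dict"]))`, with the default `strict=True` so any genuinely missing keys still raise an error instead of being ignored.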

2. New error:

        Using cuda for inference.
        Reading video frames...
        Number of frames available for inference: 128
        (80, 377)
        Length of mel chunks: 115
        Recovering from OOM error; New batch size: 8
        Load checkpoint from: checkpoints/wav2lip_gan.pth
        Model loaded
        ####################################| 15/15 [00:23<00:00, 1.59s/it]

    Traceback (most recent call last):
      File "inference.py", line 373, in <module>
        main()
      File "inference.py", line 354, in main
        pred = model(mel_batch, img_batch)
      File "/root/miniconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/workspace/wav2lip_288x288-ckpt-mismatch/models/wav2lipv2.py", line 117, in forward
        x = f(x)
      File "/root/miniconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/root/miniconda3/lib/python3.6/site-packages/torch/nn/modules/container.py", line 117, in forward
        input = module(input)
      File "/root/miniconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/workspace/wav2lip_288x288-ckpt-mismatch/models/conv2.py", line 16, in forward
        out = self.conv_block(x)
      File "/root/miniconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/root/miniconda3/lib/python3.6/site-packages/torch/nn/modules/container.py", line 117, in forward
        input = module(input)
      File "/root/miniconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/root/miniconda3/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 423, in forward
        return self._conv_forward(input, self.weight)
      File "/root/miniconda3/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 420, in _conv_forward
        self.padding, self.dilation, self.groups)
    RuntimeError: Calculated padded input size per channel: (2 x 2). Kernel size: (3 x 3). Kernel size can't be greater than actual input size
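The RuntimeError itself is just geometry: each stride-2 encoder block roughly halves the spatial size, so a 96x96 face crop shrinks below a 3x3 kernel several blocks before a 288x288 crop does. A quick sanity check with the standard Conv2d output-size formula (the kernel/stride/padding values here are illustrative, not necessarily this fork's exact architecture):

```python
def conv_out(size, kernel=3, stride=2, padding=1):
    # Conv2d output size for dilation 1: floor((n + 2p - k) / s) + 1
    return (size + 2 * padding - kernel) // stride + 1

for start in (96, 288):
    size, sizes = start, [start]
    while size >= 3:              # stop once the feature map is below kernel size
        size = conv_out(size)
        sizes.append(size)
    print(start, '->', sizes)
# 96  -> [96, 48, 24, 12, 6, 3, 2]
# 288 -> [288, 144, 72, 36, 18, 9, 5, 3, 2]
```

A 96-pixel crop already reaches 2x2 after six stride-2 blocks, so any later 3x3 convolution without enough padding fails exactly as in the traceback; a 288-pixel crop survives the deeper encoder this fork uses.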

Do you have any suggestions? Thanks in advance.

yihe1003 commented 2 years ago

For the second error: you forgot to change args.img_size from 96 to 288.
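In the stock Wav2Lip inference script the crop resolution is an argparse option defaulting to 96; assuming this fork's inference.py keeps the same argument name, the fix is just to change the default (or pass the flag). A sketch under that assumption:

```python
import argparse

parser = argparse.ArgumentParser()
# The original Wav2Lip inference.py defaults this to 96; the 288x288
# checkpoints expect face crops resized to 288 before batching.
parser.add_argument('--img_size', type=int, default=288)

args = parser.parse_args([])
print(args.img_size)  # 288
```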

ghost commented 2 years ago
  1. You need to train from scratch, because this is the 288x288 version and we changed a lot of things.
  2. @yihe1003 solved it.
MingZJU commented 2 years ago

Many thanks @yihe1003 @primepake! I will try training from scratch.