primepake / wav2lip_288x288

MIT License
524 stars · 135 forks

the input of lpips loss #130

Closed taozhiqi closed 4 months ago

taozhiqi commented 4 months ago

Hi, thanks for the work again. In the training code (the hq_wav2lip_sam_train.py file) I have some questions.

(1) The image labels are in the range [0, 1], but the last op of the generator network is nn.Tanh, whose output is in [-1, 1], so the ranges do not match. Why not use nn.Sigmoid?

    self.output_block = nn.Sequential(
        Conv2d(80, 32, kernel_size=3, stride=1, padding=1),
        nn.Conv2d(32, 3, kernel_size=1, stride=1, padding=0),
        nn.Tanh())
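
(For comparison, a minimal sketch of the same block with nn.Sigmoid as the last op, so the output range matches labels in [0, 1]; Conv2d is assumed to be the repo's own wrapper, as in the snippet above.)

    self.output_block = nn.Sequential(
        Conv2d(80, 32, kernel_size=3, stride=1, padding=1),   # repo's custom Conv2d wrapper
        nn.Conv2d(32, 3, kernel_size=1, stride=1, padding=0),
        nn.Sigmoid())  # outputs in [0, 1], same range as the image labels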

(2) The generator result is in [0, 1], but the input of LPIPS requires [-1, 1] (see https://github.com/richzhang/PerceptualSimilarity); the example code from that repo is:

    import lpips
    loss_fn_alex = lpips.LPIPS(net='alex')  # best forward scores
    loss_fn_vgg = lpips.LPIPS(net='vgg')    # closer to "traditional" perceptual loss, when used for optimization

    import torch
    img0 = torch.zeros(1, 3, 64, 64)  # image should be RGB, IMPORTANT: normalized to [-1,1]
    img1 = torch.zeros(1, 3, 64, 64)
    d = loss_fn_alex(img0, img1)
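
(A minimal sketch of bridging that range mismatch without touching the generator: rescale [0, 1] tensors to [-1, 1] just before the LPIPS call. The names g and gt below are placeholders for the generated and ground-truth batches.)

    import lpips
    import torch

    loss_fn_vgg = lpips.LPIPS(net='vgg')

    g = torch.rand(1, 3, 288, 288)   # generator output, assumed to be in [0, 1]
    gt = torch.rand(1, 3, 288, 288)  # ground-truth frames, also in [0, 1]

    # rescale [0, 1] -> [-1, 1] before computing the perceptual loss
    perceptual = loss_fn_vgg(g * 2 - 1, gt * 2 - 1).mean()

    # recent versions of the lpips package also accept normalize=True,
    # which does the same [0, 1] -> [-1, 1] rescaling internally:
    # perceptual = loss_fn_vgg(g, gt, normalize=True).mean()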

Should we modify the generator output to be in [-1, 1]?

(3) If I use the PatchGAN discriminator, it generates artifacts as shown below. Should the input of the discriminator be in [-1, 1] or [0, 1]?

[attached image: frame_4 (10)]

(4) The strangest issue I found is that the generator result seems to reconstruct the lower half of the reference image rather than the label, even though the loss is l1_loss(gt, g). Why is that?

Looking forward to your reply, thanks.

ghost commented 4 months ago

Thanks for your interest! This is my mistake: at the beginning I used the range [-1, 1]. I fixed it by changing the last activation to sigmoid and normalizing the input of the VGG (LPIPS) loss from [0, 1] to [-1, 1]:
https://github.com/primepake/wav2lip_288x288/commit/84be153a25572e8cc76dbab2b19c2bd8f34c7adb
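
(Putting the two changes together, a rough sketch of the loss path after that commit, assuming g and gt are batches in [0, 1] and loss_fn_vgg is an lpips.LPIPS(net='vgg') instance as in the snippet above; the forward call and its arguments are hypothetical.)

    import torch.nn.functional as F

    g = model(mel_chunk, ref_frames)   # hypothetical forward call; generator ends in nn.Sigmoid(), so g is in [0, 1]
    l1 = F.l1_loss(g, gt)              # reconstruction term, both tensors in [0, 1]

    # LPIPS/VGG expects [-1, 1], so rescale before the perceptual term
    perceptual = loss_fn_vgg(g * 2 - 1, gt * 2 - 1).mean()
    loss = l1 + perceptual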