primepake / wav2lip_288x288

MIT License
524 stars · 135 forks

the input of lpips loss #130

Closed taozhiqi closed 4 months ago

taozhiqi commented 4 months ago

Hi, thanks for the work again. In the training code (the hq_wav2lip_sam_train.py file) I have some questions.

(1) The image labels are in the range [0, 1], but the last op of the generator network is nn.Tanh, whose output is in [-1, 1], so the ranges do not match. Why not use nn.Sigmoid?

    self.output_block = nn.Sequential(
        Conv2d(80, 32, kernel_size=3, stride=1, padding=1),
        nn.Conv2d(32, 3, kernel_size=1, stride=1, padding=0),
        nn.Tanh())
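
(For comparison, a minimal sketch of the same block with nn.Sigmoid as the last op, so the output range matches labels in [0, 1]; Conv2d is assumed to be the repo's own wrapper, as in the snippet above.)

    self.output_block = nn.Sequential(
        Conv2d(80, 32, kernel_size=3, stride=1, padding=1),   # repo's custom Conv2d wrapper
        nn.Conv2d(32, 3, kernel_size=1, stride=1, padding=0),
        nn.Sigmoid())  # outputs in [0, 1], same range as the image labels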

(2) The generator result is in [0, 1], but the input of LPIPS requires [-1, 1] (see https://github.com/richzhang/PerceptualSimilarity); the example code from that repo is:

    import lpips
    loss_fn_alex = lpips.LPIPS(net='alex')  # best forward scores
    loss_fn_vgg = lpips.LPIPS(net='vgg')    # closer to "traditional" perceptual loss, when used for optimization

    import torch
    img0 = torch.zeros(1, 3, 64, 64)  # image should be RGB, IMPORTANT: normalized to [-1,1]
    img1 = torch.zeros(1, 3, 64, 64)
    d = loss_fn_alex(img0, img1)
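
(A minimal sketch of bridging that range mismatch without touching the generator: rescale [0, 1] tensors to [-1, 1] just before the LPIPS call. The names g and gt below are placeholders for the generated and ground-truth batches.)

    import lpips
    import torch

    loss_fn_vgg = lpips.LPIPS(net='vgg')

    g = torch.rand(1, 3, 288, 288)   # generator output, assumed to be in [0, 1]
    gt = torch.rand(1, 3, 288, 288)  # ground-truth frames, also in [0, 1]

    # rescale [0, 1] -> [-1, 1] before computing the perceptual loss
    perceptual = loss_fn_vgg(g * 2 - 1, gt * 2 - 1).mean()

    # recent versions of the lpips package also accept normalize=True,
    # which does the same [0, 1] -> [-1, 1] rescaling internally:
    # perceptual = loss_fn_vgg(g, gt, normalize=True).mean()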

Should we modify the generator output to be in [-1, 1]?

(3) If I use the PatchGAN discriminator, it generates artifacts as shown below. Should the input of the discriminator be in [-1, 1] or [0, 1]?

[attached image: frame_4 (10)]

(4) The strangest issue I found is that the generator result seems to reconstruct the lower half of the reference image rather than the label, even though the loss is l1_loss(gt, g). Why is that?

Looking forward to your reply, thanks.

ghost commented 4 months ago

Thanks for your interest! This is my mistake: at the beginning I used the range [-1, 1]. I fixed it by changing the last activation to sigmoid and normalizing the input of the VGG (LPIPS) loss from [0, 1] to [-1, 1]:
https://github.com/primepake/wav2lip_288x288/commit/84be153a25572e8cc76dbab2b19c2bd8f34c7adb
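
(Putting the two changes together, a rough sketch of the loss path after that commit, assuming g and gt are batches in [0, 1] and loss_fn_vgg is an lpips.LPIPS(net='vgg') instance as in the snippet above; the forward call and its arguments are hypothetical.)

    import torch.nn.functional as F

    g = model(mel_chunk, ref_frames)   # hypothetical forward call; generator ends in nn.Sigmoid(), so g is in [0, 1]
    l1 = F.l1_loss(g, gt)              # reconstruction term, both tensors in [0, 1]

    # LPIPS/VGG expects [-1, 1], so rescale before the perceptual term
    perceptual = loss_fn_vgg(g * 2 - 1, gt * 2 - 1).mean()
    loss = l1 + perceptual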