I trained wav2lip288x288 on nearly 40k high-quality talking-head videos. During training, the generated lower half of the face always comes out blurry. I tried adding a GAN loss and a perceptual loss, but neither helps. What should I do? Does the Wav2Lip model architecture itself limit image quality, or is there a loss that could improve this?
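For reference, this is the kind of perceptual loss I tried, a minimal sketch only: the feature extractor here is a small random conv stack standing in for the pretrained VGG16 (truncated around relu3_3, weights frozen) that I actually use, and the function/variable names are just for illustration.

```python
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """Stand-in feature network. In practice this is a pretrained VGG16
    truncated at an intermediate ReLU layer, with frozen weights."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        for p in self.parameters():
            p.requires_grad_(False)  # extractor is not trained

    def forward(self, x):
        return self.net(x)

def perceptual_loss(extractor, generated, target):
    # L1 distance between feature maps of generated and ground-truth crops
    return nn.functional.l1_loss(extractor(generated), extractor(target))

extractor = FeatureExtractor().eval()
fake = torch.rand(2, 3, 288, 288)  # generated lower-half face crops
real = torch.rand(2, 3, 288, 288)  # ground-truth crops
loss = perceptual_loss(extractor, fake, real)
print(loss.item())
```

Even with this term weighted into the total loss alongside the sync loss and GAN loss, the mouth region stays soft.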