zhanglonghao1992 / One-Shot_Free-View_Neural_Talking_Head_Synthesis

PyTorch implementation of the paper "One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing"

About 512size-20kp training problem #27

Open Berlin0610 opened 2 years ago

Berlin0610 commented 2 years ago

Dear Longhao,

Thank you for this great source code. While training at 512x512 on VoxCeleb2, I ran into the following problem. When the model is trained with 10 keypoints, the training visualization of the different channels looks correct. When I increase the number of keypoints to 15, or even 20, the per-channel visualization no longer shows a good learned mapping. Could you explain why this happens? In addition, when the model is trained at 256x256 on VoxCeleb2, the channel visualization is correct, which is very strange.

Looking forward to your kind reply at your convenience. Thank you.

Best regards, Berlin

[Attached image: "Picture 2"]

zhanglonghao1992 commented 2 years ago

@Berlin0610 It looks strange... Maybe you can try setting the scale_factor of kp_detector_params in the config to 0.25 or 0.125, and see whether the keypoint detection results become reasonable. To be honest, I haven't trained the model at 512x512 resolution. Right now, I mainly focus on improving the generator to get a lower training perceptual loss and better image quality.
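
For readers trying this suggestion, here is a minimal sketch of what changing scale_factor would mean for the keypoint detector's input size. It is an illustration, not code from the repository. Only kp_detector_params and scale_factor are named in this thread. The surrounding dictionary layout (model_params, common_params, num_kp), the starting value of 0.25, and both helper functions are assumptions, as is the premise that scale_factor simply multiplies the image side length before detection.

```python
from copy import deepcopy

# Hypothetical config fragment. Only `kp_detector_params` and `scale_factor`
# come from the thread; the other keys and the 0.25 starting value are assumed.
base_config = {
    "model_params": {
        "common_params": {"num_kp": 20},
        "kp_detector_params": {"scale_factor": 0.25},
    }
}


def with_scale_factor(config, scale_factor):
    """Return a copy of the config with a different detector scale factor."""
    new_config = deepcopy(config)
    new_config["model_params"]["kp_detector_params"]["scale_factor"] = scale_factor
    return new_config


def detector_input_side(image_side, config):
    """Side length seen by the detector, assuming plain multiplicative downscaling."""
    scale = config["model_params"]["kp_detector_params"]["scale_factor"]
    return int(image_side * scale)


# Compare the two values suggested above for 512x512 training.
for scale in (0.25, 0.125):
    cfg = with_scale_factor(base_config, scale)
    side = detector_input_side(512, cfg)
    print(f"scale_factor={scale}: detector input {side}x{side}")
```

Under these assumptions, 0.125 at 512x512 gives the detector the same 64x64 input that 0.25 gives at 256x256, which is one possible reason to try it given that the 256x256 run behaved correctly.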

sumandeepb commented 4 months ago

@Berlin0610 Could you share the 512-size config file you used?

I know it's been a long time, but if by any chance you still have it, it would be much appreciated.