Female model training cannot converge

royorel / Lifespan_Age_Transformation_Synthesis

Lifespan Age Transformation Synthesis code

Female model training cannot converge #33

Closed: awei669 closed this issue 2 years ago

awei669 commented 2 years ago

Hello, while trying to reproduce the model, I found that the male model starts to converge after about 80 epochs and gives better results by 400 epochs. The female model was trained with the same parameters, but showed no convergence trend up to epoch 380. Does the female model need any special modifications during training? Looking forward to your reply!

royorel commented 2 years ago

The training process can be unstable; try restarting training for the female model. If the problem persists, you can lower the learning rate a little.

awei669 commented 2 years ago

Thank you for your quick reply. I just reduced the learning rate to half of the original and am training on a V100 again. Is this reduction too aggressive? If so, how much should I lower it?

My training resources are limited, so please forgive me if these questions seem basic!

royorel commented 2 years ago

Halving the learning rate should be OK; you can even go as low as 1/10 of the original learning rate.
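To make the suggested range concrete, here is a minimal sketch that lists the candidate learning rates (original, half, and 1/10). The base value `BASE_LR` and the helper `scaled_learning_rates` are hypothetical names for illustration; the thread does not state the original learning rate, so substitute the value from your own training configuration.

```python
def scaled_learning_rates(base_lr, factors=(1.0, 0.5, 0.1)):
    """Return a {factor: learning_rate} mapping for each scaling factor."""
    if base_lr <= 0:
        raise ValueError("base_lr must be positive")
    return {factor: base_lr * factor for factor in factors}


# Hypothetical base value, for illustration only (not taken from the thread).
BASE_LR = 1e-3

candidates = scaled_learning_rates(BASE_LR)
for factor, lr in candidates.items():
    # Print each candidate so it can be copied into a training command.
    print(f"factor {factor:>4}: lr = {lr:.1e}")
```

Starting with the halved value and moving toward the 1/10 value only if training is still unstable keeps the change as small as possible.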

awei669 commented 2 years ago

Hello, after halving the learning rate, the female model starts to converge at about epoch 100, but the images displayed in visdom are not good. We also trained with the same learning rate on another machine. Unlike the first run, this time we did not download the pre-trained models. In this run there is a convergence trend at epoch 50, and the images shown in visdom in real time look very good.

Is this the same as running download_models.py, after which the pre-trained ResNet model is used? We ask because, according to our experimental results, the pre-trained model (ResNet backbone) can converge quickly and achieve good results during training. See the figure below.

Do you use several pre-trained ResNet networks during training?

royorel commented 2 years ago

We only use a single ResNet during training. The training process itself can be unstable sometimes, just like every other GAN. The convergence of the model on the right happened at epoch 50, when the learning rate was dropped. We didn't save the logs for the pre-trained model, so I can't tell you whether its loss curve looks like that or not. I would recommend letting the model on the right finish training and seeing whether the results are satisfactory.
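A scheduled learning-rate drop like the one mentioned at epoch 50 can be sketched as a simple step decay. This is only an illustration under stated assumptions: the function `step_decay_lr`, the base rate, the milestone epochs, and the decay factor are all hypothetical, and the repository's actual schedule may differ.

```python
def step_decay_lr(epoch, base_lr, decay_epochs, gamma):
    """Learning rate at `epoch`: base_lr multiplied by gamma once
    for every milestone in decay_epochs that has been reached."""
    drops = sum(1 for milestone in decay_epochs if epoch >= milestone)
    return base_lr * (gamma ** drops)


# Hypothetical values for illustration; only the epoch-50 milestone
# echoes the thread, and even that is an assumption about the schedule.
BASE_LR = 1e-3
DECAY_EPOCHS = (50, 100)
GAMMA = 0.5

schedule = {e: step_decay_lr(e, BASE_LR, DECAY_EPOCHS, GAMMA)
            for e in (0, 49, 50, 100)}
for epoch, lr in schedule.items():
    print(f"epoch {epoch:>3}: lr = {lr:.2e}")
```

With a schedule of this shape, a change in the loss curve right at a milestone epoch may reflect the drop itself rather than a difference between runs.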