Closed cs20162004 closed 3 years ago
The training is corrupted.
Hello. Thank you for your reply!
I have trained GLEAN on the FFHQ dataset with upscale factor = 4 for 250,000 iterations. However, the output still doesn't look natural. My output looks like the following:
I decreased the learning rate of the generator and discriminator by 10x, after which training became stable and the validation PSNR started increasing. But I didn't change the loss weights, because I am not sure how to choose them correctly. Do you have any suggestions based on this output? My config file, in case you need it: glean_ffhq_4x.txt
May I ask whether you are using the same downsampling method for training and test images?
Thank you for your quick reply!
I use torch.nn.functional.interpolate()
for downsampling; by default it uses 'nearest' mode.
EDIT: Yes the same downsampling method for training and validation images.
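For reference, a minimal sketch of the downsampling described above (the HR tensor here is random data, just to illustrate the call):

```python
import torch
import torch.nn.functional as F

# Hypothetical HR batch (N, C, H, W), values in [0, 1].
hr = torch.rand(1, 3, 256, 256)

# F.interpolate defaults to mode='nearest', which simply drops pixels
# when downscaling -- no low-pass filtering is applied, so the LR
# image is aliased.
lr_nearest = F.interpolate(hr, scale_factor=0.25)

print(lr_nearest.shape)  # torch.Size([1, 3, 64, 64])
```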
Since your learning rate is smaller, you may want to train it for longer, say 600k. You can observe the change of the loss curve and see whether it has converged or not.
Hello @ckkelvinchan . Thank you for your reply!
I am doing that now. I looked at the loss curve of my training and the log you provided for the FFHQ training dataset, and so far they look similar. I was wondering how you chose 300k iterations. Did you choose it based on your training loss curve? If so, which loss (pixel, perceptual, or GAN)? I ask because the validation PSNR doesn't seem to improve much. Thank you!
The loss weights and training schemes were not carefully tuned. I think there is a chance that the model has not converged. You can keep observing the PSNR value to see whether the performance eventually gets better.
As you may already be aware, the bicubic mode of torch.nn.functional.interpolate()
produces results very similar to nearest mode. After changing the LR dataset from the PyTorch bicubic method to the MATLAB bicubic method, the validation PSNR increased by ~3.0 from the first iteration. And after training for 60,000 iterations with a learning rate of 5e-5 for both generator and discriminator, I got SR images similar to those in the paper.
Thank you for your replies!
The bicubic mode in MATLAB applies anti-aliasing, so the LR image has better quality when downsampled with the MATLAB bicubic method. It is normal to get a lower PSNR when your LR images are produced by nearest downsampling.
Anyway, it is good that similar PSNR can be achieved :)
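To illustrate the difference discussed above: since PyTorch 1.11, `F.interpolate` accepts `antialias=True` for bicubic/bilinear modes, which applies a low-pass filter before downscaling and closely approximates MATLAB's anti-aliased `imresize`. A sketch on random data:

```python
import torch
import torch.nn.functional as F

hr = torch.rand(1, 3, 256, 256)

# Plain bicubic does no pre-filtering, so at large downscale factors
# it keeps much of the aliasing that nearest mode has.
lr_plain = F.interpolate(hr, scale_factor=0.25, mode='bicubic',
                         align_corners=False)

# antialias=True (PyTorch >= 1.11) low-pass filters first, giving a
# smoother LR image close to MATLAB's anti-aliased bicubic imresize.
lr_aa = F.interpolate(hr, scale_factor=0.25, mode='bicubic',
                      align_corners=False, antialias=True)

# The two LR images differ noticeably on high-frequency content.
print((lr_plain - lr_aa).abs().mean())
```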
Hello. Thank you for your work!
I am training GLEAN for a 256x256
input size and a 1024x1024
output size, using the same configuration (i.e. glean_ffhq_x16.py) that you provided. I only changed the input size, scaling factor, and data path. However, after a few iterations my loss becomes very large and then decreases again, like this:
My questions are: 1) Is this because of the
Adam
optimizer, i.e. it oscillates a bit before converging? Can decreasing the learning rate solve this issue? 2) loss_d_real
and loss_d_fake
are 0.0000 at some iterations; what could this mean? Thank you for your time!
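For context on question 2, a minimal sketch (not the repo's actual training code) of a vanilla GAN discriminator loss: when the discriminator becomes very confident, the printed `loss_d_real` / `loss_d_fake` values can round to 0.0000.

```python
import torch
import torch.nn.functional as F

# Hypothetical discriminator logits: very confident predictions.
real_logits = torch.tensor([12.0])   # D is sure the sample is real
fake_logits = torch.tensor([-12.0])  # D is sure the sample is fake

# Vanilla GAN discriminator loss (BCE with logits).
loss_d_real = F.binary_cross_entropy_with_logits(
    real_logits, torch.ones_like(real_logits))
loss_d_fake = F.binary_cross_entropy_with_logits(
    fake_logits, torch.zeros_like(fake_logits))

# Both losses are tiny (~log(1 + e^-12)), so a 4-decimal log prints 0.0000.
print(f'{loss_d_real.item():.4f} {loss_d_fake.item():.4f}')
```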