Can u finish the TODO: add checkpoints?

tingxingdong commented 4 years ago

https://github.com/krasserm/super-resolution/blob/master/train.py#L132

GAN training takes a long time and easily dies. Without checkpoints, everything has to be restarted from scratch. Thanks.

Fmstrat commented 3 years ago

This is probably going to be a must-have for https://github.com/krasserm/super-resolution/issues/82

krasserm commented 3 years ago

@Fmstrat you can start with saving model weights as described here. EDSR+SRGAN training should be stable enough, i.e. I cannot confirm that it "easily dies".

I have other high priorities at the moment and will implement it when I have more bandwidth. Hope that helps for the moment.
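
A minimal sketch of the "save model weights" starting point, assuming a custom training loop that exposes a step counter. `PeriodicCheckpointer` and `save_fn` are hypothetical names for illustration and are not part of this repository; `save_fn` stands in for whatever actually writes the weights (for example, a function that calls Keras `Model.save_weights` on the generator and the discriminator).

```python
import os


class PeriodicCheckpointer:
    """Calls a user-supplied save function every `every_n_steps` steps.

    `save_fn(path)` is a hypothetical callback that writes the weights;
    this class only decides when to save and which path to use.
    """

    def __init__(self, save_fn, directory, every_n_steps=1000):
        if every_n_steps < 1:
            raise ValueError("every_n_steps must be >= 1")
        self.save_fn = save_fn
        self.directory = directory
        self.every_n_steps = every_n_steps
        self.latest = None  # path of the most recent checkpoint, if any

    def step(self, step):
        """Call once per training step; returns the saved path or None."""
        if step % self.every_n_steps != 0:
            return None
        path = os.path.join(self.directory, f"step-{step:06d}")
        self.save_fn(path)
        self.latest = path
        return path
```

Inside the training loop this would be called as `checkpointer.step(step)` after each optimizer update, so a crashed run can resume from `checkpointer.latest` instead of starting over.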

dflateau commented 3 years ago

If we were to hack through it ourselves, would we be trying to minimize discriminator loss or perceptual loss as the criterion for saving a checkpoint?

As it stands now, when training is complete, are the weights that reside in the model just the result of the last step taken?

tvelk commented 1 year ago

@krasserm I am trying to piece together the SrganTrainer checkpoint criteria and would greatly appreciate your feedback.

Perceptual loss: This is the sum of content loss and adversarial loss.
- Content loss: Uses VGG features for perceptual similarity instead of pixel-wise losses. The result is always positive. The generator's goal is to minimize it.
- Adversarial loss: 10^-3 * SUM(-log(probability that the image is a natural HR image)). The result is always positive. The higher the probability that the images are HR, the smaller the adversarial loss. The generator's goal is to minimize it.
Discriminator loss: The result is always positive. The discriminator's goal is to minimize it.
- According to section 4.4. in this tutorial, it is fine when the discriminator overpowers the generator. During SRGAN training, I'm also seeing a low discriminator loss although it's not zero. Originally posted by @krasserm in https://github.com/krasserm/super-resolution/issues/45#issuecomment-596191198
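
The loss terms listed above can be sketched in plain Python to make the signs and directions concrete. This is an illustration only: the function names are hypothetical, the content loss is passed in as a number rather than computed from VGG features, and the discriminator loss is assumed to be the standard binary cross-entropy over real and generated images, which the description above does not spell out.

```python
import math

_EPS = 1e-7  # keeps log() away from 0 and 1


def _clip(p):
    return min(max(p, _EPS), 1.0 - _EPS)


def adversarial_loss(sr_probs):
    """Sum of -log(p), where p is the discriminator's probability that a
    generated (SR) image is a natural HR image."""
    return sum(-math.log(_clip(p)) for p in sr_probs)


def perceptual_loss(content_loss, sr_probs, weight=1e-3):
    """Content loss plus the 10^-3-weighted adversarial loss."""
    return content_loss + weight * adversarial_loss(sr_probs)


def discriminator_loss(hr_probs, sr_probs):
    """Assumed standard binary cross-entropy: real HR images should score
    near 1, generated SR images near 0."""
    real = sum(-math.log(_clip(p)) for p in hr_probs)
    fake = sum(-math.log(1.0 - _clip(p)) for p in sr_probs)
    return real + fake
```

Under these assumptions the two objectives pull in opposite directions on the same discriminator outputs for SR images: raising those probabilities lowers the adversarial loss and raises the discriminator loss.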

In the current implementation of train.py, these two values never interact; each is used solely to compute the gradients for its own network.

With the goal of setting criteria for creating a checkpoint, it would seem we want to look for a low perceptual loss and, simultaneously, a high discriminator loss, which is not quite straightforward. Other posts I've seen state that the goal is equilibrium. In that case, a decrease in deviation by x over the last y points might be a route to go?
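
One possible reading of the "decrease in deviation by x over the last y points" idea, sketched under stated assumptions: compare the standard deviation of the most recent y loss values with that of the y values before them, and signal a checkpoint when it has dropped by at least the fraction x. `DeviationDropDetector`, `window` (y) and `drop` (x) are hypothetical names, and which loss to feed in remains the open question of this thread.

```python
from collections import deque
from statistics import pstdev


class DeviationDropDetector:
    """Signals when the spread of a loss series has shrunk.

    Returns True from update() when the standard deviation of the last
    `window` values is at most (1 - drop) times that of the `window`
    values before them.
    """

    def __init__(self, window=100, drop=0.5):
        if window < 2:
            raise ValueError("window must be >= 2")
        if not 0.0 < drop < 1.0:
            raise ValueError("drop must be between 0 and 1")
        self.window = window
        self.drop = drop
        self._history = deque(maxlen=2 * window)

    def update(self, loss):
        self._history.append(float(loss))
        if len(self._history) < 2 * self.window:
            return False  # not enough points to compare two windows yet
        values = list(self._history)
        previous = pstdev(values[:self.window])
        recent = pstdev(values[self.window:])
        return previous > 0 and recent <= (1.0 - self.drop) * previous
```

A training loop could call `detector.update(loss)` once per step and save weights whenever it returns True. This only detects that a loss has settled, not that the generated images are good, so it would likely need to be combined with a visual or metric-based check.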

Any help or insight would be greatly appreciated. @dflateau Did you have any luck on this?

krasserm / super-resolution

Can u finish the TODO: add checkpoints? #63