nagadomi / waifu2x

Image Super-Resolution for Anime-Style Art
http://waifu2x.udp.jp/
MIT License

Reimplementation does not give ideal result #35

Closed xhwang closed 9 years ago

xhwang commented 9 years ago

@nagadomi
Recently I have been trying to reproduce your scale2x work with Caffe. The network is set up as closely to waifu2x as possible: 128x128 input, 114x114 output, LeakyReLU, MSE loss. For now the solver uses SGD rather than the Adam used in your implementation.
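
For concreteness, here is a minimal sketch of such a 7-layer fully convolutional network written with Caffe's Python NetSpec interface. The channel widths (32/32/64/64/128/128), the LeakyReLU slope of 0.1, the Gaussian weight init, and the single-channel input/output are assumptions for illustration, not details confirmed in this thread.

```python
# A minimal sketch of a waifu2x-style 7-layer network in Caffe's NetSpec.
# Channel widths, LeakyReLU slope and single-channel I/O are assumptions.
import caffe
from caffe import layers as L

def scale2x_net(batch_size=2):
    n = caffe.NetSpec()
    # 128x128 input patch and the matching 114x114 ground-truth patch
    n.data = L.Input(shape=dict(dim=[batch_size, 1, 128, 128]))
    n.label = L.Input(shape=dict(dim=[batch_size, 1, 114, 114]))

    bottom = n.data
    for i, width in enumerate([32, 32, 64, 64, 128, 128], start=1):
        # 3x3 convolutions with no padding: each layer trims 1 pixel per side
        conv = L.Convolution(bottom, kernel_size=3, num_output=width,
                             weight_filler=dict(type='gaussian', std=0.001),
                             bias_filler=dict(type='constant', value=0))
        setattr(n, 'conv%d' % i, conv)
        relu = L.ReLU(conv, in_place=True, negative_slope=0.1)  # LeakyReLU
        setattr(n, 'relu%d' % i, relu)
        bottom = relu
    # final 3x3 convolution back to 1 channel -> 114x114 output (128 - 7*2)
    n.conv7 = L.Convolution(bottom, kernel_size=3, num_output=1,
                            weight_filler=dict(type='gaussian', std=0.001),
                            bias_filler=dict(type='constant', value=0))
    # Caffe's EuclideanLoss is sum-of-squares / (2N), i.e. MSE up to a constant
    n.loss = L.EuclideanLoss(n.conv7, n.label)
    return n.to_proto()

with open('scale2x_train.prototxt', 'w') as f:
    f.write(str(scale2x_net()))
```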

Training data (5000 images) is generated with the waifu2x code. Batch size is 2, trained for 100,000 iterations. The base learning rate is 0.00025, decayed with Caffe's 'inv' learning-rate policy.
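
For comparing data pipelines, a rough sketch of generating one 2x training pair follows. The particular degradation used here (bicubic downscale by 2 followed by nearest-neighbour re-upscaling) and the helper name `make_pair` are assumptions, so they should be checked against the actual waifu2x data-generation code.

```python
# A rough sketch of generating one 2x training pair; the degradation model
# is an assumption about the waifu2x pipeline, not confirmed in this thread.
import numpy as np
from PIL import Image

def make_pair(img, size=128, border=7):
    """Return (128x128 input, 114x114 target) arrays scaled to 0..1."""
    # random 128x128 crop of the high-resolution image (assumed >= 128px)
    x = np.random.randint(0, img.width - size + 1)
    y = np.random.randint(0, img.height - size + 1)
    hr = img.crop((x, y, x + size, y + size)).convert('L')

    # degrade: downscale by 2, then upscale back so input/output sizes match
    lr = hr.resize((size // 2, size // 2), Image.BICUBIC)
    lr = lr.resize((size, size), Image.NEAREST)

    # target is the central 114x114 region (7 px lost per side by the convs)
    target = hr.crop((border, border, size - border, size - border))
    to01 = lambda im: np.asarray(im, dtype=np.float32) / 255.0
    return to01(lr), to01(target)
```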

The training loss gets down to about 0.0020 (worse than waifu2x, which reaches 0.00035~0.00028). I am not sure whether the network has actually converged, because the loss starts oscillating around 0.0020 from roughly 10,000 iterations onward.

The test image result is not as good as yours; it is a little blurry, as shown below:

(comparison images: input / waifu2x / Caffe result)

Do you have any ideas about this? Do the solver parameters simply need more fine-tuning, or is a better optimizer such as Adam essential?

Looking forward to your reply. Sorry that this is not a development issue ~~

Zotikus1001 commented 9 years ago

It looks like you need to feed it high-quality images for training.

xhwang commented 9 years ago

@BrokenSilence Thanks for your reply. The thing is that the training data fed to Caffe and to waifu2x are the same (generated from the VOC 2007 dataset), yet waifu2x gets better results.

nagadomi commented 9 years ago

I tried SGD at first but did not get good results. For this task, Adam works considerably better than other optimizers, I think.

The original SRCNN uses SGD. The initial weights and the layer-wise learning-rate settings are probably important. (I was not able to reproduce that paper with SGD.)

The filter weights of each layer are initialized by drawing randomly from a Gaussian distribution with zero mean and standard deviation 0.001 (and 0 for biases). The learning rate is 10^-4 for the first two layers, and 10^-5 for the last layer. We empirically find that a smaller learning rate in the last layer is important for the network to converge (similar to the denoising case [22]).
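
For reference, the quoted settings can be expressed in Caffe with Gaussian weight fillers and per-layer lr_mult multipliers. The sketch below uses SRCNN-like layer shapes purely for illustration; scaling the bias learning rates by the same multiplier as the weights is an assumption.

```python
# A sketch (not taken from this thread) of the quoted SRCNN settings in Caffe:
# Gaussian(0, 0.001) weight init, zero-bias init, and a 10x smaller learning
# rate on the last layer via per-layer lr_mult.
import caffe
from caffe import layers as L

def conv(bottom, nout, ksize, lr_mult):
    return L.Convolution(
        bottom, kernel_size=ksize, num_output=nout,
        weight_filler=dict(type='gaussian', std=0.001),
        bias_filler=dict(type='constant', value=0),
        # first entry scales the weight lr, second the bias lr
        # (same multiplier for biases is an assumption here)
        param=[dict(lr_mult=lr_mult), dict(lr_mult=lr_mult)])

n = caffe.NetSpec()
n.data = L.Input(shape=dict(dim=[1, 1, 33, 33]))   # SRCNN-style patch size
n.conv1 = conv(n.data, 64, 9, lr_mult=1.0)
n.relu1 = L.ReLU(n.conv1, in_place=True)
n.conv2 = conv(n.conv1, 32, 1, lr_mult=1.0)
n.relu2 = L.ReLU(n.conv2, in_place=True)
n.conv3 = conv(n.conv2, 1, 5, lr_mult=0.1)         # 10x smaller lr, last layer
# with base_lr: 1e-4 in the solver, the effective rates are 1e-4 / 1e-4 / 1e-5
print(n.to_proto())
```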

xhwang commented 9 years ago

@nagadomi Many thanks! I will try training the network with Adam and will update the results as soon as possible. :-)
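
For anyone attempting the same switch, here is a sketch of a Caffe solver definition that uses Adam; the file names and the Adam hyperparameters (beta1, beta2, epsilon) below are placeholders rather than values reported in this thread.

```python
# A sketch of switching the Caffe solver from SGD to Adam. File names and
# the Adam hyperparameters below are placeholders, not values from this issue.
import caffe

solver_txt = """\
net: "scale2x_train.prototxt"   # training net definition (placeholder name)
type: "Adam"                    # needs a Caffe build that includes AdamSolver
base_lr: 0.00025
momentum: 0.9                   # Adam beta1
momentum2: 0.999                # Adam beta2
delta: 1e-8                     # Adam epsilon
lr_policy: "fixed"              # Adam often works without an lr schedule
max_iter: 100000
display: 100
snapshot: 10000
snapshot_prefix: "snapshots/scale2x"
"""

with open("solver_adam.prototxt", "w") as f:
    f.write(solver_txt)

solver = caffe.get_solver("solver_adam.prototxt")  # picks AdamSolver from 'type'
solver.solve()
```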

nagadomi commented 9 years ago

The loss (MSE) of waifu2x (in 2x scaling) is 0.00035~0.00028, not 0.0020. EDIT: the RGB values are scaled to 0.0~1.0.
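
Since the two numbers are only comparable if both are measured on the same pixel range, here is a quick sanity-check sketch, assuming the logged values are per-pixel MSE on 0.0~1.0 pixels; the PSNR conversion is just for intuition.

```python
# A small sanity check for comparing losses measured on different pixel
# ranges (numbers reused from this thread, treated as per-pixel MSE).
import math

def mse_255_to_01(mse_255):
    # MSE scales with the square of the pixel range: divide by 255^2
    return mse_255 / 255.0 ** 2

def psnr_from_mse(mse, peak=1.0):
    # PSNR in dB for a given MSE and peak signal value
    return 10.0 * math.log10(peak ** 2 / mse)

print(psnr_from_mse(0.00030))   # waifu2x-level loss  -> roughly 35 dB
print(psnr_from_mse(0.0020))    # the reported Caffe loss -> roughly 27 dB
```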

xhwang commented 9 years ago

Got it, the RGB values have been scaled.

xhwang commented 9 years ago

The Adam solver really helps. Now it achieves comparable results. :+1: