nagadomi / waifu2x

Image Super-Resolution for Anime-Style Art
http://waifu2x.udp.jp/
MIT License

Two questions about training process #108

Open · buggyyang opened 8 years ago

buggyyang commented 8 years ago
  1. Your model was trained on a 6000-image dataset. So for each image, how did you crop them?
  2. What's your training/validation accuracy after you finished training the model?
nagadomi commented 8 years ago
  1. No cropping beforehand. (The training code randomly crops each original image into a 256x256 sub-image and then into 46x46 patches in each iteration; a sketch follows below.)
  2. I don't have the logs. Benchmark results can be found at Benchmark results. Unfortunately, that benchmark dataset is closed data.
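
A minimal sketch of that two-stage random cropping, in Python with NumPy (the waifu2x training code itself is Torch/Lua; the function name and the assumption that images are pre-loaded as float arrays are mine):

```python
import numpy as np

def sample_patch(image, sub_size=256, patch_size=46, rng=np.random):
    """Randomly crop a 256x256 sub-image, then a 46x46 patch from it.

    image: HxWxC float array in [0, 1], assumed at least sub_size on each side.
    Both sizes follow the numbers quoted above; the structure is illustrative.
    """
    h, w = image.shape[:2]
    # Stage 1: random 256x256 sub-image from the full-size original.
    y = rng.randint(0, h - sub_size + 1)
    x = rng.randint(0, w - sub_size + 1)
    sub = image[y:y + sub_size, x:x + sub_size]
    # Stage 2: random 46x46 patch from the sub-image, drawn anew each iteration.
    py = rng.randint(0, sub_size - patch_size + 1)
    px = rng.randint(0, sub_size - patch_size + 1)
    return sub[py:py + patch_size, px:px + patch_size]
```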
buggyyang commented 8 years ago

Thanks for your answer. In fact, I'm a beginner in deep learning and struggling to finish the final project for my machine learning course. I tried to follow the instructions in Chao Dong's paper using 3 conv layers. How should I tune my hyperparameters? I have tried the parameters given by Dong, but some of the predicted images had noisy points. Could you give me some suggestions? Thank you. For my training set, I used 1000 256x256 images from ImageNet and cropped each image into 64 32x32 patches (not randomly).

nagadomi commented 8 years ago

I think SRCNN-based neural networks are hard to optimize. I used Adam (arxiv, Torch implementation) to optimize waifu2x's model. In my experience, Adam converges more easily on this task than momentum SGD.
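
For comparison, a hedged PyTorch sketch of the two optimizer setups (the 9-1-5 layer shapes follow Dong's SRCNN paper; the learning rates are illustrative, and nagadomi's model was actually trained in Torch7):

```python
import torch

# Stand-in for an SRCNN-style network (Dong's 9-1-5 architecture).
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 64, 9), torch.nn.ReLU(),
    torch.nn.Conv2d(64, 32, 1), torch.nn.ReLU(),
    torch.nn.Conv2d(32, 3, 5),
)

# Momentum SGD, roughly as in the original SRCNN setup.
sgd = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)

# Adam, which converged more easily on this task in nagadomi's experience.
adam = torch.optim.Adam(model.parameters(), lr=5e-4)
```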

buggyyang commented 8 years ago

Yep, I've read part of your code, and I also tried Adam for my gradient descent. But Adam is usually recommended with its default learning rate, so I am not sure whether I should change it a little for SRCNN. Another question is about your LeakyReLU activation: does it dramatically improve performance over ReLU? Finally, I want to ask for an efficient way of tuning the parameters (using a small amount of training data, or maybe a small number of epochs?). Thanks a lot.

nagadomi commented 8 years ago

LeakyReLU does not dramatically improve performance in a small network like this (3 or 7 layers). I used a small dataset for debugging (and light tuning), and the full dataset for tuning hyperparameters.

I think the most important thing in this task is to generate the pairwise teacher data correctly (example pair images: 4_x, 4_y). If the training code has a bug in generating the pairwise data, the trained model will generate low-quality images. (A sketch of such pair generation follows below.)
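
A sketch of what correctly aligned pairwise data means for 2x upscaling, using Pillow (the helper name and the library choice are mine, not from the waifu2x code):

```python
from PIL import Image

def make_pair(hr: Image.Image, scale: int = 2):
    """Build an (input, target) training pair for `scale`x upscaling.

    The target is the original high-resolution image; the input is the same
    image downscaled and then upscaled back, so both cover exactly the same
    pixels. Any misalignment here teaches the model a spatial shift.
    """
    w, h = hr.size
    # Crop so the size is divisible by the scale factor; avoids off-by-one shifts.
    hr = hr.crop((0, 0, w - w % scale, h - h % scale))
    lr = hr.resize((hr.width // scale, hr.height // scale), Image.BICUBIC)
    lr_up = lr.resize(hr.size, Image.BICUBIC)  # back to HR size for an SRCNN-style model
    return lr_up, hr
```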

buggyyang commented 8 years ago

Anyway, thank you a lot. So should I keep Adam's learning rate at the default 0.001, or tune it a bit?

nagadomi commented 8 years ago

I used a low learning rate, 0.0005 to 0.00001 (with learning rate decay).
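
One way to express such a schedule, sketched with PyTorch's exponential decay (only the start and end of the range come from the comment above; the 200-epoch horizon and the stand-in network are assumptions):

```python
import torch

model = torch.nn.Conv2d(3, 3, 3)  # stand-in network
opt = torch.optim.Adam(model.parameters(), lr=5e-4)
# gamma chosen so the lr falls from 5e-4 toward 1e-5 over ~200 epochs.
sched = torch.optim.lr_scheduler.ExponentialLR(opt, gamma=(1e-5 / 5e-4) ** (1 / 200))

for epoch in range(200):
    # ... run one training epoch, calling opt.step() per batch ...
    sched.step()  # decay the learning rate once per epoch
```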

buggyyang commented 8 years ago

How should I set my batch size? You said it helps to set it to 2~4. How about making it larger? Training efficiency was really low when I used a batch size of 4.

nagadomi commented 8 years ago

My current settings:

  - input: 3x46x46
  - output: 3x32x32 (the convolution layers have no padding, so the output is smaller than the input)
  - image pixel values: 0.0-1.0 (not 0-255)
  - optimizer: Adam
  - batch_size: 8
  - learning_rate: 0.0005
  - loss: Huber loss (MSE is sufficient though)
  - weight initializer: He (https://arxiv.org/abs/1502.01852)
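
Assembled into one PyTorch sketch (only the listed settings come from the comment above; the layer widths and the LeakyReLU slope are placeholders):

```python
import torch
from torch import nn

# Seven 3x3 convolutions with no padding: each layer trims 2 pixels,
# so a 3x46x46 input yields a 3x32x32 output, as listed above.
channels = [3, 32, 32, 64, 64, 128, 128]  # placeholder widths
layers = []
for c_in, c_out in zip(channels, channels[1:]):
    layers += [nn.Conv2d(c_in, c_out, 3), nn.LeakyReLU(0.1)]
layers.append(nn.Conv2d(channels[-1], 3, 3))
model = nn.Sequential(*layers)

# He initialization for the conv weights (https://arxiv.org/abs/1502.01852).
for m in model.modules():
    if isinstance(m, nn.Conv2d):
        nn.init.kaiming_normal_(m.weight, nonlinearity='leaky_relu')

opt = torch.optim.Adam(model.parameters(), lr=5e-4)
loss_fn = nn.HuberLoss()  # MSE (nn.MSELoss) is sufficient, per the comment

# One illustrative step: batch_size 8, pixel values in 0.0-1.0.
x = torch.rand(8, 3, 46, 46)
y = torch.rand(8, 3, 32, 32)
loss = loss_fn(model(x), y)
loss.backward()
opt.step()
```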

buggyyang commented 8 years ago

What's the difference between handling anime pictures and realistic photos? And for a Gaussian kernel to smooth the picture, how should I choose the parameters? I've finished my SRCNN model, but my results are not as good as I hoped.

nagadomi commented 8 years ago

That is learned by the neural network. waifu2x has a photo model (models/photo). It was trained with the same code as the anime-style art model. (http://waifu2x.udp.jp/ Style=Photo)

Did you use Gaussian blur to generate your dataset? I think the original SRCNN uses bicubic interpolation. See SRCNN_train.zip (./generate_train.m) at http://mmlab.ie.cuhk.edu.hk/projects/SRCNN.html
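
A minimal contrast between the two degradations, sketched with Pillow (the file name, blur radius, and scale factor are illustrative):

```python
from PIL import Image, ImageFilter

hr = Image.open('example.png')  # hypothetical input file

# Gaussian blur only smooths; the image stays at full resolution,
# so the model would learn deblurring rather than upscaling.
blurred = hr.filter(ImageFilter.GaussianBlur(radius=1.5))

# SRCNN's generate_train.m instead downscales and re-upscales with bicubic,
# which is the degradation a 2x model is meant to invert.
half = hr.resize((hr.width // 2, hr.height // 2), Image.BICUBIC)
lr_up = half.resize(hr.size, Image.BICUBIC)
```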

leilei- commented 8 years ago

Very off-topic, but I'm also curious about training with nearest-neighbor interpolation to upscale pixel art. There's this "MagicPony Technology" neural network image improver getting some hype lately for doing just that: attempting to make realistic interpretations of pixel art or pixelized art. I also wonder how that could apply to, say, dithery PC-98 pictures (on which traditional pixel filters do a bad job).

nagadomi commented 8 years ago

I guess MagicPony is something like Deep Dream or Perceptual Loss: it is able to generate high-resolution textures from (memorized) training images. Also, I am thinking about developing an inverse dithering (color reduction) filter, but that is a very low-priority task for me at the moment.

buggyyang commented 8 years ago

Is SRCNN also used for denoising? Or did you do something else?

nagadomi commented 8 years ago

The model of waifu2x is not the same as SRCNN. It's a VGG-style, 7-layer CNN. waifu2x uses that model for both denoising and upscaling. And I think the deblurring task is more difficult than the denoising task.
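
Since only the training pairs change between the two tasks, here is a hedged sketch of both pair types in Python/Pillow (the helper names and the JPEG quality value are hypothetical; waifu2x's denoising targets JPEG compression artifacts):

```python
import io
from PIL import Image

def upscale_pair(hr: Image.Image):
    """2x upscaling: input is the bicubic-degraded image, target is the original."""
    lr = hr.resize((hr.width // 2, hr.height // 2), Image.BICUBIC)
    return lr.resize(hr.size, Image.BICUBIC), hr

def denoise_pair(clean: Image.Image, quality: int = 50):
    """Denoising: input is a JPEG-recompressed image, target is the original.

    The quality value is illustrative; waifu2x trains separate models per
    noise level.
    """
    buf = io.BytesIO()
    clean.convert('RGB').save(buf, format='JPEG', quality=quality)
    buf.seek(0)
    return Image.open(buf), clean
```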

ProGamerGov commented 7 years ago

@nagadomi What was the resolution of the images used in your training data? How does the resolution of the training data affect training an art model, and what is the best resolution for training data? Do I need to find images at sizes like 2K or 4K for my dataset, or does waifu2x not require such high image sizes for successful training?

buggyyang commented 7 years ago

@ProGamerGov Nope. SRCNN only learns how to increase the resolution, like a filter. Once you have the filter, you can increase the resolution of any image. Once you know how to drive, it doesn't matter whether you drive a BMW, a Benz, or a Land Rover.

ProGamerGov commented 7 years ago

@cdyrhjohn So is there a limit on the size/resolution of the training data I can use? Will higher resolutions provide better results?

buggyyang commented 7 years ago

@ProGamerGov No limit, and no better results, just much more training time. waifu2x uses a 46x46-patch dataset.