nagadomi / waifu2x

Image Super-Resolution for Anime-Style Art
http://waifu2x.udp.jp/
MIT License

[suggest] Residual Learning? #143

Open azurespace opened 7 years ago

azurespace commented 7 years ago

Beyond a Gaussian Denoiser: Residual Learning of Deep CNN for Image Denoising

http://arxiv.org/abs/1608.03981

With residual connections, we can train much deeper convolutional networks. The paper uses residual learning to train a single network that can perform Gaussian denoising at any noise level.

Super-resolution is also a type of denoising task, so I think it is worth trying a model similar to the one in this paper.
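For illustration, here is a rough sketch of the paper's residual idea in torch/nn (my own sketch, not waifu2x code): the trunk predicts the noise, and the clean image is recovered by subtracting it from the input.

require 'nn'

-- DnCNN-style residual learning sketch: the network outputs the noise,
-- and clean = noisy - predicted_noise is wired with ConcatTable/CSubTable
local function dncnn_sketch(depth, ch)
   local trunk = nn.Sequential()
   trunk:add(nn.SpatialConvolution(ch, 64, 3, 3, 1, 1, 1, 1))
   trunk:add(nn.ReLU(true))
   for i = 1, depth - 2 do
      trunk:add(nn.SpatialConvolution(64, 64, 3, 3, 1, 1, 1, 1))
      trunk:add(nn.SpatialBatchNormalization(64))
      trunk:add(nn.ReLU(true))
   end
   trunk:add(nn.SpatialConvolution(64, ch, 3, 3, 1, 1, 1, 1)) -- predicts the noise

   local con = nn.ConcatTable()
   con:add(nn.Identity())  -- the noisy input
   con:add(trunk)          -- the predicted noise
   local model = nn.Sequential()
   model:add(con)
   model:add(nn.CSubTable()) -- clean = noisy - noise
   return model
end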

nagadomi commented 7 years ago

Generally speaking, a very deep net is very slow, so we probably cannot use it on the web service. But for offline use, we could implement it.

azurespace commented 7 years ago

Yes, it would be slower. But as far as I know, waifu2x supports only 2x scaling, so if we want more, we have to apply it iteratively (see the sketch below).

However, a very deep network might do much better at 3x, 4x, or larger if we train it carefully. Well, I'm not sure.
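Something like this (scale2x is a hypothetical stand-in for a loaded waifu2x 2x model):

-- 4x by applying a 2x scaler twice
local function scale4x(scale2x, img)
   return scale2x(scale2x(img)) -- 1x -> 2x -> 4x
end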

chungexcy commented 7 years ago

@nagadomi ResNet-style networks (from Kaiming He) usually perform better than non-residual CNNs. @azurespace However, I don't think learning the "residual" by itself leads to better performance at comparable speed. Using deconvolution to generate the high-resolution output has been shown to beat methods based on bicubic upscaling.

A deeper network doesn't always mean a slower one. For example, ResNet-101 uses many more layers but very small kernels and channel counts, yielding a much lower error rate at a similar forward speed compared to VGG-19 (https://github.com/jcjohnson/cnn-benchmarks).

Recently, I found that Twitter Cortex / Magic Pony posted their SRResNet on arXiv ("Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network"). They combined deconvolution and ResNet, designed a 4x upscaling network, and got the best PSNR of the many recent works I have surveyed. The network operates in low-resolution space, with 12 convolution layers of 3x3x64x64, so it should also run very fast.

Besides, this paper is very good and points out another interesting direction, GANs (Generative Adversarial Networks), beyond traditional MSE-based super-resolution. I personally believe it is a very worthwhile read, and you may be interested in this kind of network design (SRResNet) even without the GAN.
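For reference, the paper's upsampling step is a sub-pixel convolution; a rough sketch in torch/nn (my illustration, assuming nn.PixelShuffle is available; the Torch port later in this thread uses SpatialFullConvolution instead):

-- sub-pixel upsampling: a conv expands channels by r^2, then PixelShuffle
-- rearranges them into an r-times larger feature map
require 'nn'
local r = 2
local up = nn.Sequential()
up:add(nn.SpatialConvolution(64, 64 * r * r, 3, 3, 1, 1, 1, 1))
up:add(nn.PixelShuffle(r))
up:add(nn.ReLU(true))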

nagadomi commented 7 years ago

Thanks for the info.

For the moment, I benchmarked 4x with the waifu2x models (applying 2x iteratively). 4x PSNR:

| Dataset/Model | Bicubic | upconv_7/photo | upconv_7l/photo | DnCNN-3 | upconv_7l/photo TTA | SRResNet |
|---|---|---|---|---|---|---|
| BSD100 | 25.975 | 27.201 | 27.282 | 27.29 | 27.332 | 27.52 |

Note: the training datasets of these models differ.

SRResNet's performance looks really high.

nagadomi commented 7 years ago

I've tried to train an SRResNet-like 2x model, but it did not show any improvement over upconv_7l; the PSNR is very close to upconv_7l.

-- in lib/srcnn.lua
local function ReLU(backend)
   if backend == "cunn" then
      return nn.ReLU(true)
   elseif backend == "cudnn" then
      return cudnn.ReLU(true)
   else
      error("unsupported backend:" .. backend)
   end
end
function srcnn.srresnet_2x(backend, ch)
  -- note: no batchnormalization, no zero padding
   local function resblock(backend)
      local seq = nn.Sequential()
      local con = nn.ConcatTable()
      local conv = nn.Sequential()

      conv:add(SpatialConvolution(backend, 64, 64, 3, 3, 1, 1, 0, 0))
      conv:add(ReLU(backend))
      conv:add(SpatialConvolution(backend, 64, 64, 3, 3, 1, 1, 0, 0))
      conv:add(ReLU(backend))

      con:add(conv)
      con:add(nn.SpatialZeroPadding(-2, -2, -2, -2)) -- identity branch, cropped to match the unpadded convs
      --con:add(nn.Identity())
      seq:add(con)
      seq:add(nn.CAddTable())
      return seq
   end
   local model = nn.Sequential()
   model:add(SpatialConvolution(backend, ch, 64, 3, 3, 1, 1, 0, 0))
   model:add(ReLU(backend))
   model:add(resblock(backend))
   model:add(resblock(backend))
   model:add(resblock(backend))
   model:add(resblock(backend))
   model:add(resblock(backend))
   model:add(resblock(backend))

   -- 2x upsampling by transposed convolution (deconvolution)
   model:add(SpatialFullConvolution(backend, 64, 64, 4, 4, 2, 2, 2, 2))
   model:add(ReLU(backend))
   model:add(SpatialConvolution(backend, 64, 3, 3, 3, 1, 1, 0, 0))

   model:add(w2nn.InplaceClip01())
   model:add(nn.View(-1):setNumInputDims(3))
   model.w2nn_arch_name = "srresnet_2x"
   model.w2nn_offset = 28
   model.w2nn_scale_factor = 2
   model.w2nn_resize = true
   model.w2nn_channels = ch

   --model:cuda()
   --print(model:forward(torch.Tensor(16, ch, 96, 96):uniform():cuda()):size())

   return model
end
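The two commented-out lines are a shape check; run inside lib/srcnn.lua it would look like this (hypothetical smoke test, requires cunn):

local model = srcnn.srresnet_2x("cunn", 3):cuda()
local x = torch.Tensor(16, 3, 96, 96):uniform():cuda()
-- the unpadded 3x3 convs cost w2nn_offset = 28 pixels per side in output
-- coordinates, so a 96x96 input yields a 2*96 - 2*28 = 136x136 output
-- (then flattened by nn.View)
print(model:forward(x):size())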
azurespace commented 7 years ago

What do you think about SRGAN? GANs usually show lower PSNR, but they can reconstruct lost details (appealing to human vision, though not exactly the same as the original).

I think the results from SRGAN-VGG54 in the SRResNet paper are amazing!
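The "VGG54" part is a content loss computed in VGG feature space rather than pixel space, roughly like this (my sketch; vgg_features stands for a pretrained VGG19 truncated after conv5_4, assumed to be already loaded):

require 'nn'
local mse = nn.MSECriterion()
-- compare super-resolved and ground-truth images in VGG feature space
local function content_loss(vgg_features, sr, hr)
   local f_sr = vgg_features:forward(sr):clone() -- clone: forward reuses its output buffer
   local f_hr = vgg_features:forward(hr)
   return mse:forward(f_sr, f_hr)
end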

nagadomi commented 7 years ago

I think those are very interesting results. But it seems that SRGAN's output contains artifacts that alter the original image, so I don't think it can substitute for bicubic.

azurespace commented 7 years ago

hm, maybe this article can be a hint.

Deconvolution and Checkerboard Artifacts

http://distill.pub/2016/deconv-checkerboard/

It covers how to remove the artifacts that (de)convolutional neural networks tend to create.

nagadomi commented 7 years ago

That is a very useful link. We have the same problem (#125). But the proposed solution, resize-convolution, is the same as waifu2x's vgg_7 method (the old model), and it's 2~4x slower than deconvolution.

EDIT: this was my misunderstanding. Replacing deconvolution with upsampling+convolution is not that slow. See the sketch below.
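For clarity, resize-convolution in torch/nn terms (my sketch, not repo code):

-- upsample first with a fixed nearest-neighbor resize, then apply an
-- ordinary convolution, instead of a single SpatialFullConvolution
require 'nn'
local up = nn.Sequential()
up:add(nn.SpatialUpSamplingNearest(2))
up:add(nn.SpatialConvolution(64, 64, 3, 3, 1, 1, 1, 1))
up:add(nn.ReLU(true))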

XiaHua-Spark commented 7 years ago

I tried to add the code to lib/srcnn.lua, then tried to train with

th train.lua -backend cudnn -model srresnet_2x -model_dir models/my_model -method noise_scale -noise_level  -style photo -batch_size 32 -thread 4 -max_size 512

but it does not work. Does any other code need to be changed? I have done training on my own dataset before, so the other settings should be correct. The error is:

/home/xia/torch/install/bin/luajit: lib/srcnn.lua:309: attempt to call field 'InplaceClip01' (a nil value)
stack traceback:
    lib/srcnn.lua:309: in function <lib/srcnn.lua:276>
    lib/srcnn.lua:336: in function 'create'
    train.lua:302: in function 'train'
    train.lua:460: in main chunk
    [C]: in function 'dofile'
    .../xia/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x00405d50

Also, -model upcov8_4x doesn't work. Thanks in advance!

nagadomi commented 7 years ago

InplaceClip01 is only available in the dev branch:

git checkout -b dev origin/dev

I used the following train command for srresnet_2x.

th train.lua -save_history 1 -model srresnet_2x -scale 2 -downsampling_filters "Box,Sinc,Catrom" -data_dir ./data/photo -model_dir models/test/srresnet_photo3 -test query/machine.png -color rgb -random_unsharp_mask_rate 0.1 -thread 3 -backend cudnn -oracle_rate 0.0 -random_blur_rate 0.025 -epoch 100 -inner_epoch 2 -max_size 256 -crop_size 72 -batch_size 16

(You should change the -data_dir and -test options for your environment.)

For benchmarking, see https://github.com/nagadomi/waifu2x/blob/dev/appendix/benchmark.md. I used the -method scale4 option for the 4x benchmark.

-model upconv8_4x

If you want to train 4x, the -scale 4 option is required, but 4x training may not work currently.

myw8 commented 7 years ago

Can I use Caffe to train the prototxt and get a caffemodel, then test the model with MATLAB? How can I do that?

nagadomi commented 7 years ago

waifu2x-caffe has waifu2x's vgg_7 and upconv_7 models converted for Caffe. See ./bin/models in the released binary. If you want to test the photo model, use bin/models/upconv_7_photo/scale2.0x_model.*.

EDIT: chungexcy's Caffe implementation may help: https://github.com/chungexcy/waifu2x-new. But updating the pretrained model is required (update the JSON files from models/).

myw8 commented 7 years ago

Thank you very much. I have just managed to test it in MATLAB by referring to chungexcy's Caffe code. Now I want to train the prototxt with Caffe and get the caffemodel myself, and I have two questions: first, is the input data the original image, and the label the enlarged and denoised image? Second, can you tell me where I can get the training images? Thank you!

nagadomi commented 7 years ago

first, is the input data the original image, and the label the enlarged and denoised image?

No. For upscaling: input = 1/2-resized image, label = original image. For denoising: input = JPEG-compressed image, label = original image.

Second, can you tell me where I can get the training images?

For testing, I recommend using ukbench. If you want fanart images, you have to collect them yourself.
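For example, one upscaling training pair could be built like this (a sketch using the torch image package; the path is hypothetical, and waifu2x itself varies the downsampling filter):

require 'image'
local label = image.load("original.png", 3, "float") -- ground truth
local h, w = label:size(2), label:size(3)
-- input is the same image downscaled to half resolution
local input = image.scale(label, w / 2, h / 2, "bicubic")
-- for a denoising pair, input would instead be a JPEG-recompressed copy of label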

chungexcy commented 7 years ago

@myw8 If you want to train the model on your own dataset, you can download the example training instructions for SRCNN from here. You will need to change the HDF5 initialization to match the waifu2x model's input/output requirements.

nagadomi commented 7 years ago

alexjc released neural-enhance. It seems to be a non-MSE-based method. I've tried it.

[image comparisons: original "machine", neural-enhance 4x, and waifu2x upconv_7l/photo; original "bsd500", neural-enhance 4x, and waifu2x upconv_7l/photo]

neural-enhance seems to generate high-frequency components, but it also changes the original color/brightness.

myw8 commented 7 years ago

Yes, thank you very much!