Open azurespace opened 7 years ago
Generally speaking, a very deep net is very slow. Maybe we cannot use it on the web service, but for offline use we can implement it.
Yes, it would be slower. But as far as I know, waifu2x supports only 2x scaling, so if we want more we have to apply it iteratively.
However, a very deep network might do much better for 3x, 4x or larger if we train it carefully. Well, I'm not sure.
@nagadomi ResNet-style networks (from Kaiming He) usually perform better than non-residual CNNs. @azurespace However, I don't think learning the "residual" can lead to better performance and comparable speed. Using deconvolution to generate the high-resolution image has been shown to work better than methods based on bicubic upscaling.
A deeper network doesn't always mean a slower one. For example, ResNet-101 has 101 layers but uses very small kernels and few channels, yielding a much lower error rate and a similar forward speed compared to VGG-19 (https://github.com/jcjohnson/cnn-benchmarks).
Recently, I found that Twitter Cortex / Magic Pony posted their SRResNet on arXiv (Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network). They combined deconvolution and ResNet, designed a 4x upscaling network, and got the best PSNR performance (according to my survey of many recent works). The network operates on the low-resolution image, with 12 convolution layers of size 3x3x64x64, so it should also run very fast.
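To make the speed argument concrete, here is a rough cost estimate for one 3x3x64x64 layer. It is only a sketch: the 128x128 input size is a hypothetical example, and bias terms and memory traffic are ignored.

```python
def conv_macs(height, width, kernel, c_in, c_out):
    """Multiply-accumulates for one stride-1 convolution layer (bias ignored)."""
    return height * width * kernel * kernel * c_in * c_out


# Number of weights in a single 3x3x64x64 layer.
weights_per_layer = 3 * 3 * 64 * 64

# Hypothetical 4x task: a 128x128 input becomes a 512x512 output.
low_res = conv_macs(128, 128, 3, 64, 64)   # layer applied before upscaling
high_res = conv_macs(512, 512, 3, 64, 64)  # same layer applied after upscaling

# Running the layer on the low-resolution grid is cheaper by scale^2.
ratio = high_res // low_res
print(weights_per_layer, ratio)
```

For a 4x network, each layer that runs before the upscaling step touches 16 times fewer pixels than the same layer applied to a pre-upscaled image.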
Besides, this paper is very good and points out another interesting approach, GAN (Generative Adversarial Network), as an alternative to traditional MSE-based super-resolution. I personally believe this paper is well worth reading, and you may be interested in this kind of network design (SRResNet), even without the GAN.
Thanks for the info.
For the moment, I benchmarked 4x with the waifu2x models (applying 2x iteratively).

4x-PSNR

Dataset/Model | Bicubic | upconv_7/photo | upconv_7l/photo | DnCNN-3 | upconv_7l/photo TTA | SRResNet |
---|---|---|---|---|---|---|
BSD100 | 25.975 | 27.201 | 27.282 | 27.29 | 27.332 | 27.52 |
Note: the training dataset is different for each model.
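For reference, PSNR between a reference image and a reconstruction can be computed as below. This is a minimal sketch on flat lists of pixel values. The exact protocol behind the table (color space, border cropping) is not stated in this thread, so the 8-bit peak value of 255 is an assumption.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if not reference or len(reference) != len(estimate):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


print(psnr([0, 255, 128, 64], [0, 250, 130, 60]))
```

Because PSNR is logarithmic in the mean squared error, a difference of a few tenths of a dB between models corresponds to a fairly small change in MSE.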
SRResNet's performance looks really high.
I've tried to train an SRResNet-like 2x model, but it did not show any improvement over upconv_7l. Its PSNR is very close to upconv_7l's.
```lua
-- in lib/srcnn.lua
local function ReLU(backend)
   if backend == "cunn" then
      return nn.ReLU(true)
   elseif backend == "cudnn" then
      return cudnn.ReLU(true)
   else
      error("unsupported backend:" .. backend)
   end
end
function srcnn.srresnet_2x(backend, ch)
   -- note: no batchnormalization, no zero padding
   local function resblock(backend)
      local seq = nn.Sequential()
      local con = nn.ConcatTable()
      local conv = nn.Sequential()
      conv:add(SpatialConvolution(backend, 64, 64, 3, 3, 1, 1, 0, 0))
      conv:add(ReLU(backend))
      conv:add(SpatialConvolution(backend, 64, 64, 3, 3, 1, 1, 0, 0))
      conv:add(ReLU(backend))
      con:add(conv)
      con:add(nn.SpatialZeroPadding(-2, -2, -2, -2)) -- identity + de-padding
      --con:add(nn.Identity())
      seq:add(con)
      seq:add(nn.CAddTable())
      return seq
   end
   local model = nn.Sequential()
   model:add(SpatialConvolution(backend, ch, 64, 3, 3, 1, 1, 0, 0))
   model:add(ReLU(backend))
   model:add(resblock(backend))
   model:add(resblock(backend))
   model:add(resblock(backend))
   model:add(resblock(backend))
   model:add(resblock(backend))
   model:add(resblock(backend))
   model:add(SpatialFullConvolution(backend, 64, 64, 4, 4, 2, 2, 2, 2))
   model:add(ReLU(backend))
   model:add(SpatialConvolution(backend, 64, 3, 3, 3, 1, 1, 0, 0))
   model:add(w2nn.InplaceClip01())
   model:add(nn.View(-1):setNumInputDims(3))
   model.w2nn_arch_name = "srresnet_2x"
   model.w2nn_offset = 28
   model.w2nn_scale_factor = 2
   model.w2nn_resize = true
   model.w2nn_channels = ch
   --model:cuda()
   --print(model:forward(torch.Tensor(16, ch, 96, 96):uniform():cuda()):size())
   return model
end
```
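Since the model above uses no zero padding, every convolution shrinks the feature map. The following sketch traces the spatial size through the layers using the standard output-size formulas for convolution and transposed convolution. It assumes that `w2nn_offset` means the number of output pixels lost on each side relative to an exact 2x result; the helper names are illustrative only.

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Output size of a convolution along one axis."""
    return (size + 2 * pad - kernel) // stride + 1


def deconv_out(size, kernel, stride, pad):
    """Output size of a transposed (full) convolution along one axis."""
    return (size - 1) * stride - 2 * pad + kernel


def srresnet_2x_out(size, n_resblocks=6):
    size = conv_out(size, 3)                   # first 3x3 conv
    for _ in range(n_resblocks):
        size = conv_out(conv_out(size, 3), 3)  # two 3x3 convs per block
    size = deconv_out(size, 4, 2, 2)           # 4x4 kernel, stride 2, pad 2
    return conv_out(size, 3)                   # final 3x3 conv


input_size = 96  # same size as the commented-out forward test
output_size = srresnet_2x_out(input_size)
offset = (2 * input_size - output_size) // 2
print(output_size, offset)
```

Under these assumptions a 96x96 input produces a 136x136 output, and the per-side loss agrees with `model.w2nn_offset = 28`. The skip path crops 2 pixels per side in each block for the same reason: it has to match the output of the two unpadded convolutions.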
What do you think about SRGAN? GANs usually show lower PSNR, but they are able to reconstruct lost details (appealing to human vision, but not exactly the same as the original).
I think the results from SRGAN-VGG54 (in the SRResNet paper) are amazing!
I think those are very interesting results. But it seems that the SRGAN output contains artifacts, and it breaks the original image. I don't think it can substitute for bicubic.
hm, maybe this article can be a hint.
http://distill.pub/2016/deconv-checkerboard/
It covers how to remove the artifacts that (de)convolutional neural networks tend to create.
That is a very useful link. We have the same problem in #125.
But the proposed solution, `resize-convolution`, is the same as waifu2x's `vgg_7` method (the old model).
It's 2~4x slower than deconvolution.
EDIT: This was my misunderstanding. Replacing deconvolution with upsampling + convolution is not so slow.
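The linked article attributes checkerboard artifacts partly to uneven overlap in transposed convolutions. A small 1-D sketch of that argument (not waifu2x code) counts how many kernel taps land on each output position:

```python
def overlap_counts(n_inputs, kernel, stride):
    """Kernel taps landing on each output position of a 1-D transposed conv."""
    length = (n_inputs - 1) * stride + kernel
    counts = [0] * length
    for i in range(n_inputs):
        for k in range(kernel):
            counts[i * stride + k] += 1
    return counts


def interior(counts, kernel):
    """Drop the border positions, which always receive fewer taps."""
    return counts[kernel:-kernel]


even = interior(overlap_counts(8, 4, 2), 4)    # kernel divisible by stride
uneven = interior(overlap_counts(8, 3, 2), 3)  # kernel not divisible by stride
print(even)
print(uneven)
```

With a kernel size divisible by the stride, as in the 4x4, stride-2 `SpatialFullConvolution` of the model posted earlier, every interior position gets the same number of taps. With a 3-wide kernel the count alternates. Even overlap does not guarantee artifact-free output, since the learned weights can still produce the pattern.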
I tried adding the code to the `lib/srcnn.lua` file, then I tried to train with

```
th train.lua -backend cudnn -model srresnet_2x -model_dir models/my_model -method noise_scale -noise_level -style photo -batch_size 32 -thread 4 -max_size 512
```

and it does not work. Is there any other code that needs to be changed? I have done training on my dataset before, so the other settings should be correct. The error is:

```
/home/xia/torch/install/bin/luajit: lib/srcnn.lua:309: attempt to call field 'InplaceClip01' (a nil value)
stack traceback:
lib/srcnn.lua:309: in function <lib/srcnn.lua:276>
lib/srcnn.lua:336: in function 'create'
train.lua:302: in function 'train'
train.lua:460: in main chunk
[C]: in function 'dofile'
.../xia/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00405d50
```

Also, `-model upcov8_4x` doesn't work.
Thanks in advance!
`InplaceClip01` is only available in the `dev` branch.

```
git checkout -b dev origin/dev
```
I used the following train command for srresnet_2x.

```
th train.lua -save_history 1 -model srresnet_2x -scale 2 -downsampling_filters "Box,Sinc,Catrom" -data_dir ./data/photo -model_dir models/test/srresnet_photo3 -test query/machine.png -color rgb -random_unsharp_mask_rate 0.1 -thread 3 -backend cudnn -oracle_rate 0.0 -random_blur_rate 0.025 -epoch 100 -inner_epoch 2 -max_size 256 -crop_size 72 -batch_size 16
```

(You should change the `-data_dir` and `-test` options for your environment.)
For the benchmark, see https://github.com/nagadomi/waifu2x/blob/dev/appendix/benchmark.md
I used the `-method scale4` option for the 4x benchmark.

`-model upconv8_4x`
If you want to train 4x, the `-scale 4` option is required, but 4x training may not work currently.
Can I use Caffe to train the prototxt and get a caffemodel, then use MATLAB to test the model? How would I do that?
waifu2x-caffe has waifu2x's `vgg_7` and `upconv_7` models converted for Caffe.
See `./bin/models` in the released binary. If you want to test the photo model, use `bin/models/upconv_7_photo/scale2.0x_model.*`.
EDIT: chungexcy's Caffe implementation may help: https://github.com/chungexcy/waifu2x-new. But the pretrained model needs to be updated (update the json file from models/).
Thank you very much. I have just got it working in MATLAB by referring to chungexcy's Caffe implementation. Now I want to train the prototxt with Caffe and get the caffemodel myself. I have two questions. First, is the input data the original image and the label the enlarged and denoised image? Second, can you tell me where I can get the data images? Thank you!
> First, is the input data the original image and the label the enlarged and denoised image?

No. For upscaling, input = 1/2 resized image, label = original image. For denoising, input = JPEG-compressed image, label = original image.

> Second, can you tell me where I can get the data images?

For testing, I recommend using ukbench. If you want fanart images, you should collect them yourself.
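The upscaling pair described above (input = half-size image, label = original) can be sketched as follows. This is an illustration only, not waifu2x's actual data pipeline: it works on a grayscale image stored as a list of rows and uses a plain 2x2 box average as the downscaling filter. The function names are hypothetical.

```python
def box_downscale_2x(image):
    """Halve a grayscale image (list of equal-length rows) by averaging 2x2 blocks."""
    height = len(image) - len(image) % 2  # ignore a trailing odd row
    width = len(image[0]) - len(image[0]) % 2  # ignore a trailing odd column
    return [
        [
            (image[y][x] + image[y][x + 1] + image[y + 1][x] + image[y + 1][x + 1]) / 4.0
            for x in range(0, width, 2)
        ]
        for y in range(0, height, 2)
    ]


def make_upscaling_pair(original):
    """Return (network input, label) = (half-size image, original image)."""
    return box_downscale_2x(original), original


original = [
    [0, 2, 4, 6],
    [2, 4, 6, 8],
    [8, 8, 0, 0],
    [8, 8, 0, 0],
]
net_input, label = make_upscaling_pair(original)
print(net_input)
```

A denoising pair follows the same pattern, with JPEG compression of the original in place of the downscaling step.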
@myw8 If you want to train the model on your own datasets, you can download the example SRCNN training instructions from here. You will need to change the HDF5 initialization to match the waifu2x model's input/output requirements.
alexjc released neural-enhance. It seems to be a non-MSE-based method. I've tried it.
[Two image comparisons omitted. Each shows: original image, ne4, waifu2x upconv_7l/photo.]
neural-enhance seems to generate high-frequency components, and it changes the original color/brightness.
Yes thank you very much
Beyond a Gaussian Denoiser: Residual Learning of Deep CNN for Image Denoising
http://arxiv.org/abs/1608.03981
With residual connections, we can train much deeper convolutional networks. The paper uses residual training to train a network that can handle any level of Gaussian denoising.
Super-resolution can also be seen as a kind of denoising task, so I think it is worth testing something similar to the model in this paper.
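As a rough illustration of the residual formulation in that paper: the network is trained to predict the noise (noisy minus clean), and the clean image is recovered by subtracting the prediction from the input. The sketch below uses flat lists of pixel values and a perfect prediction in place of a real network; the function names are hypothetical.

```python
def residual_target(noisy, clean):
    """Training target under residual learning: the noise itself."""
    return [n - c for n, c in zip(noisy, clean)]


def denoise(noisy, predicted_residual):
    """Recover the clean estimate by subtracting the predicted residual."""
    return [n - r for n, r in zip(noisy, predicted_residual)]


clean = [10, 20, 30]
noisy = [12, 19, 33]

target = residual_target(noisy, clean)
restored = denoise(noisy, target)  # a perfect prediction restores the clean image
print(target, restored)
```

For super-resolution the same idea would apply with an interpolated upscale in the role of the noisy input, so the network only has to learn the missing detail.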