nagadomi / waifu2x

Image Super-Resolution for Anime-Style Art
http://waifu2x.udp.jp/
MIT License
27.45k stars 2.71k forks

Advanced Machine Learning #231

Open DonaldTsang opened 6 years ago

DonaldTsang commented 6 years ago
  1. Is it possible to replace Caffe (the slowest on the Python platform) with PyTorch (fastest overall) or MXNet (can beat PyTorch on parallel GPUs)?
  2. Is it possible to replace VGG7 with Inception or ResNet, which out-performs VGG7?
DonaldTsang commented 6 years ago

An idea: categorize images in the database into "pure", "single-JPG", "double-JPG", "multi-JPG" (JPG as in JPG compression).
Use that as the measure of how "noisy" an image is, and then apply the right amount of de-noising so as not to overshoot.
Only the "pure" images should be used as the base dataset for testing image compression and its reversal.
Reference: https://www.politesi.polimi.it/bitstream/10589/132721/1/2017_04_Chen.pdf

nagadomi commented 6 years ago

Is it possible to replace Caffe (the slowest on the Python platform) with PyTorch (fastest overall) or MXNet (can beat PyTorch on parallel GPUs)?

waifu2x is implemented in LuaJIT/Torch, not Caffe. Torch already seems outdated, so it would be good to switch to PyTorch, but for now I don't have the resources to do it. tsurumeso has released a Chainer version: https://github.com/tsurumeso/waifu2x-chainer

Is it possible to replace VGG7 with Inception or ResNet, which out-performs VGG7?

A ResNet model is already available in the dev branch. Benchmark: https://github.com/nagadomi/waifu2x/blob/dev/appendix/benchmark.md Unfortunately, it is much slower than the current model, so it cannot be used in web services.

An idea: categorize images in the database into "pure", "single-JPG", "double-JPG", "multi-JPG" (JPG as in JPG compression).

This has already been done. waifu2x lets you specify the JPEG quality and the number of compression passes for real-time data augmentation during training. The dataset was built from images that are not JPEG-compressed.
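As a rough illustration of that kind of augmentation, the sketch below samples how many JPEG passes a clean training image would go through and which quality each pass uses. The function names, the quality range, and the pass limit are assumptions made for this example, not waifu2x's actual training options, and the encode/decode step is left to a caller-supplied function.

```python
import random

def sample_jpeg_chain(rng, min_quality=50, max_quality=95, max_passes=2):
    """Pick a random number of JPEG passes and a quality for each pass.

    All defaults are hypothetical values chosen for illustration.
    An empty list means "leave the clean image uncompressed".
    """
    n_passes = rng.randint(0, max_passes)
    return [rng.randint(min_quality, max_quality) for _ in range(n_passes)]

def degrade(image, qualities, jpeg_roundtrip):
    """Apply one JPEG encode/decode round trip per sampled quality.

    `jpeg_roundtrip(image, quality)` is supplied by the caller, for
    example as a thin wrapper around an image library.
    """
    for quality in qualities:
        image = jpeg_roundtrip(image, quality)
    return image

rng = random.Random(0)
chain = sample_jpeg_chain(rng)  # e.g. zero, one, or two quality values
```

Sampling the chain separately from applying it keeps the training pair well defined: the clean image is the target, and the degraded copy is the input.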

DonaldTsang commented 6 years ago

@nagadomi

Unfortunately it is much slower than the current model

Maybe reduce the size of the ResNet by using fewer modules? And compare that with VGG5/7/9/16/19 to create a graph of per-epoch training speed against total training time and accuracy?

waifu2x can specify JPEG quality

What about auto-detection of JPEG quality? Could that be implemented as well?

nagadomi commented 6 years ago

Maybe reduce the size of the ResNet by using fewer modules? And compare that with VGG5/7/9/16/19 to create a graph of per-epoch training speed against total training time and accuracy?

With a shallow network, accuracy degrades. I think this is related to the receptive field size (in a fully convolutional network, it depends on the number of layers and the filter size). I think it may be solved with dilated convolution or a progressive approach.
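To make the receptive-field point concrete, here is a small sketch that computes the receptive field of a stack of stride-1 convolutions from each layer's kernel size and dilation. The layer counts and dilation rates are illustrative choices, not a description of any particular waifu2x model.

```python
def receptive_field(layers):
    """Receptive field (in input pixels, along one axis) of a stack of
    stride-1 convolutions. Each layer is a (kernel_size, dilation) pair."""
    rf = 1
    for kernel_size, dilation in layers:
        rf += (kernel_size - 1) * dilation
    return rf

plain_7 = [(3, 1)] * 7                 # seven 3x3 convolutions
plain_3 = [(3, 1)] * 3                 # a shallower stack
dilated_3 = [(3, 1), (3, 2), (3, 4)]   # three 3x3 convs with dilation 1, 2, 4

print(receptive_field(plain_7))    # 15
print(receptive_field(plain_3))    # 7
print(receptive_field(dilated_3))  # 15
```

Under these assumptions, three dilated layers see as much context as seven plain ones, which is why dilation is a candidate for recovering accuracy in a shallower network.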

What about auto-detection of JPEG quality? Could that be implemented as well?

I have already implemented it, but not as an open-source project. The JPEG noise level can be predicted as a classification task over sets of image patches.
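The actual implementation is not public, so the following is only a guess at what the patch-based step could look like: average the per-patch class probabilities and pick the most likely noise level for the whole image. The class layout and the probability values are hypothetical, and the patch classifier itself is not shown.

```python
def predict_noise_level(patch_probs):
    """Average per-patch class probabilities and return the winning class.

    patch_probs: list of per-patch probability lists, one entry per
    noise-level class (the classifier that produces them is not shown).
    """
    n_classes = len(patch_probs[0])
    mean = [sum(p[c] for p in patch_probs) / len(patch_probs)
            for c in range(n_classes)]
    best = max(range(n_classes), key=lambda c: mean[c])
    return best, mean

# Hypothetical outputs for three patches, classes 0 (clean) to 3 (heavy noise)
probs = [
    [0.1, 0.6, 0.2, 0.1],
    [0.2, 0.5, 0.2, 0.1],
    [0.1, 0.2, 0.6, 0.1],
]
level, mean = predict_noise_level(probs)
print(level)  # 1
```

Averaging over many patches smooths out flat regions where compression artifacts are hard to see in any single patch.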

DonaldTsang commented 6 years ago

@nagadomi what about using expert systems for JPEG noise level detection?

2ji3150 commented 6 years ago

It looks like the ResNet version is 2.3 times slower than the upconv version, but it gives better quality than upconv with TTA (8 times slower). In other words, it is both faster and better than upconv with TTA, so it makes sense for it to replace the normal TTA option. BTW, is there any plan to train an art version of the ResNet model?
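A minimal sketch of why TTA costs roughly eight forward passes, assuming the "8 times slower" figure corresponds to averaging over the eight flip/rotation variants of the input (the thread does not spell this out). The `model` argument is a stand-in for any function that maps a 2D grid to a 2D grid.

```python
def rot90(grid):
    """Rotate a 2D list 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]

def rot270(grid):
    """Rotate a 2D list 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*grid)][::-1]

def hflip(grid):
    """Mirror a 2D list left to right."""
    return [row[::-1] for row in grid]

def tta_predict(model, image):
    """Average the model output over 2 flips x 4 rotations."""
    outputs = []
    for flip in (False, True):
        for k in range(4):
            x = hflip(image) if flip else image
            for _ in range(k):
                x = rot90(x)
            y = model(x)          # one forward pass per variant
            for _ in range(k):
                y = rot270(y)     # undo the rotation
            if flip:
                y = hflip(y)      # undo the flip
            outputs.append(y)
    h, w = len(outputs[0]), len(outputs[0][0])
    return [[sum(o[i][j] for o in outputs) / len(outputs) for j in range(w)]
            for i in range(h)]

# Relative cost from the numbers quoted above: upconv+TTA vs. ResNet
print(round(8 / 2.3, 2))  # 3.48
```

With those figures, upconv with TTA would take about 3.5 times as long as a single ResNet pass.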

DonaldTsang commented 5 years ago

@2ji3150 @nagadomi New idea: NASNet
It looks like NASNet can out-perform most other neural network architectures with LESS computation.

DonaldTsang commented 5 years ago

As a reference: https://github.com/nagadomi/waifu2x/issues/216
(BTW thanks @Yolkis for suggesting that) We should consider training speed and model generation speed.

nagadomi commented 5 years ago

Generally, pooling layers cannot be used in super-resolution tasks. In network architectures for classification, the resolution decreases as the number of layers increases, but in super-resolution it does not.
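The difference can be sketched by tracking the feature-map size through each kind of network. The layer lists below are made-up examples, and the convolutions are assumed to use stride 1 with "same" padding.

```python
def spatial_size(size, layers):
    """Track feature-map width/height through a list of layers.

    Layers are ("conv", kernel) for a stride-1 convolution with 'same'
    padding, ("pool", factor) for pooling, or ("upsample", factor).
    """
    for kind, value in layers:
        if kind == "pool":
            size //= value
        elif kind == "upsample":
            size *= value
        # "conv" with 'same' padding leaves the size unchanged
    return size

classifier = [("conv", 3), ("pool", 2), ("conv", 3), ("pool", 2),
              ("conv", 3), ("pool", 2)]
super_res = [("conv", 3)] * 6 + [("upsample", 2)]

print(spatial_size(128, classifier))  # 16
print(spatial_size(128, super_res))   # 256
```

A classifier can afford to shrink a 128-pixel input to 16 because it only needs a label, while a super-resolution network has to emit every output pixel.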

DonaldTsang commented 5 years ago

@nagadomi is it possible to look at this graph (the purple parts) and see if there are alternatives for Waifu2x?
(image: mapclean_1 3)

nagadomi commented 5 years ago

@DonaldTsang I added a new model last week. Benchmark: https://github.com/nagadomi/waifu2x/blob/master/appendix/benchmark.md#art (cunet/art). It is two cascaded U-Nets extended with SE blocks (Squeeze-and-Excitation Networks).

Edit: In the above figure, RefineNet (Stack-U-Net) is a similar model.
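For readers unfamiliar with SE blocks, here is a generic sketch of the squeeze, excitation, and scale steps on a tiny feature map, written with plain Python lists. It follows the general Squeeze-and-Excitation idea rather than the cunet code, and the weights are made-up numbers.

```python
import math

def se_block(x, w1, w2):
    """Squeeze-and-Excitation on a feature map x[channel][row][col].

    w1: weights of the reduction layer (hidden x channels),
    w2: weights of the expansion layer (channels x hidden).
    Biases are omitted to keep the sketch short.
    """
    # Squeeze: global average pooling gives one number per channel.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    # Excitation: fully connected -> ReLU -> fully connected -> sigmoid.
    hidden = [max(0.0, sum(w * v for w, v in zip(ws, z))) for ws in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * v for w, v in zip(ws, hidden))))
             for ws in w2]
    # Scale: multiply each channel by its gate, a value in (0, 1).
    return [[[g * v for v in row] for row in ch] for g, ch in zip(gates, x)]

x = [[[1.0, 1.0], [1.0, 1.0]],   # channel 0
     [[2.0, 2.0], [2.0, 2.0]]]   # channel 1
w1 = [[0.5, 0.5]]                # 2 channels -> 1 hidden unit (made-up)
w2 = [[0.0], [10.0]]             # 1 hidden unit -> 2 gates (made-up)
y = se_block(x, w1, w2)
```

With these weights, channel 0 is halved (its gate is exactly 0.5) while channel 1 passes through almost unchanged, which shows how the block re-weights channels based on global statistics.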

yu45020 commented 5 years ago

@nagadomi I came here from this issue. Thanks for sharing the new model. Have you tried atrous convolutions for image upscaling?

There is a paper that uses atrous convolutions to segment small objects in satellite images. The model increases the atrous rates and then decreases them. I coded a similar model for my manga text segmentation project and found a clear improvement in accuracy. I am now rewriting and testing a similar model for image upscaling. The preliminary results seem acceptable, and I plan to train it thoroughly on a server.

nagadomi commented 5 years ago

@yu45020 I have tried dilated/atrous convolution. It is better than an ordinary FCN, but the improvement is not dramatic. Currently, I think that a Residual U-Net (Concat replaced with Add) has better speed and accuracy than fully dilated convolution networks.
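One way to see where the speed difference of "Concat replaced with Add" could come from is to count the parameters of the convolution that follows the skip connection. This is a back-of-the-envelope sketch; the channel width of 64 is a hypothetical value, not taken from the waifu2x models.

```python
def conv_params(in_channels, out_channels, kernel=3):
    """Weights plus biases of one 2D convolution layer."""
    return kernel * kernel * in_channels * out_channels + out_channels

channels = 64  # hypothetical width of the decoder stage

# Concat skip: encoder and decoder features are stacked, so the next
# convolution sees twice as many input channels.
concat_cost = conv_params(2 * channels, channels)

# Add skip: features are summed element-wise, so the channel count
# stays the same.
add_cost = conv_params(channels, channels)

print(concat_cost)  # 73792
print(add_cost)     # 36928
```

Under this assumption the convolution after an additive skip has about half the weights, and correspondingly fewer multiply-adds per pixel.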

I am also developing an OCR engine for manga. It is a closed-source product, so I cannot describe the details, but there are results from p. 59 onward in this slide deck (Japanese).

yu45020 commented 5 years ago

@nagadomi Thanks for the advice! I will also check a U-Net like model before training.

Your project seems to accomplish what I am aiming for. It is very interesting and seems comparable to ABBYY's engine. My project's in-sample predictions achieve similar results, but my goal is only to segment all text pixels. Back to your product: I notice the slides come from a seminar. Do you plan to publish a technical report?

DonaldTsang commented 5 years ago

@nagadomi @yu45020 any news? If yes, we can write something up in https://github.com/nagadomi/waifu2x/issues/251