
Image Inpainting #2

Open raamb opened 4 years ago

raamb commented 4 years ago

Author Robin Lehmann

Description Fill gaps in images with reasonable content to create a visually pleasing result. When images or parts of images are manipulated with AI algorithms, for example by detecting objects and then cropping or editing them out, gaps are created. Image inpainting is required to recover the background or the remaining parts of the image and keep the scene coherent. The problem is quite difficult, as the visual domain can contain a large variety of content and the gaps can be of different sizes, so some specialisation might be necessary. Alternatively, the capability of the network can be scoped to the dataset used for training, as this will be the limiting factor.

Examples: In the first three examples (figures 1, 6, and 8 from this article) of an existing solution, the authors of [2] chose partial convolutions to solve the problem. This led to a big improvement in the recreation of content for irregular masks and is the next technological step after the results of [1]. The original code for [2] can be found in [7], but there is another implementation in [8] that uses different datasets and adds useful information for future improvements. So a logical next step for a solution might be to train on all of these datasets to get even better results.
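To make the core idea concrete, below is a minimal sketch of a partial-convolution layer as described in [2], assuming PyTorch; the reference implementation lives in [7] and differs in detail.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PartialConv2d(nn.Module):
    """Sketch of a partial convolution [2]: convolve only over valid
    (unmasked) pixels, re-weight by the fraction of valid pixels under
    the kernel, and shrink the hole in the mask at every layer."""

    def __init__(self, in_ch, out_ch, kernel_size, stride=1, padding=0):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, stride, padding, bias=True)
        # Fixed all-ones kernel, used only to count valid pixels per window.
        self.register_buffer("ones", torch.ones(1, 1, kernel_size, kernel_size))
        self.window = kernel_size * kernel_size
        self.stride, self.padding = stride, padding

    def forward(self, x, mask):
        # mask: (N, 1, H, W), 1.0 where the pixel is known, 0.0 inside the hole.
        with torch.no_grad():
            valid = F.conv2d(mask, self.ones, stride=self.stride,
                             padding=self.padding)
        out = self.conv(x * mask)
        bias = self.conv.bias.view(1, -1, 1, 1)
        # Re-normalise by the share of valid pixels under each window;
        # clamp avoids division by zero for fully masked windows.
        scale = self.window / valid.clamp(min=1.0)
        out = (out - bias) * scale + bias
        new_mask = (valid > 0).float()
        # Zero out positions whose window contained no valid pixel at all.
        return out * new_mask, new_mask
```

In [2] such layers replace the regular convolutions of a U-Net-like architecture, so the hole shrinks layer by layer as the updated mask is passed on.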

The next example (from the README of [9]) is taken from a reimplementation of [2] in Keras. It opts for an OpenCV-based mask creation system instead of the occlusion/dis-occlusion masks between consecutive video frames used in [2].
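The mask-creation idea of [9] can be sketched with a few OpenCV drawing calls; the shape counts and size ranges below are illustrative assumptions, not the exact settings of that repository.

```python
import cv2
import numpy as np

def random_irregular_mask(height=512, width=512, max_shapes=10, seed=None):
    """Build an irregular hole mask (1.0 = known pixel, 0.0 = hole) by
    drawing random lines, circles and ellipses onto an all-ones canvas."""
    rng = np.random.default_rng(seed)
    mask = np.ones((height, width), dtype=np.float32)
    for _ in range(int(rng.integers(1, max_shapes + 1))):
        x, y = int(rng.integers(0, width)), int(rng.integers(0, height))
        kind = int(rng.integers(0, 3))
        if kind == 0:    # random stroke
            x2, y2 = int(rng.integers(0, width)), int(rng.integers(0, height))
            cv2.line(mask, (x, y), (x2, y2), 0.0, int(rng.integers(5, 20)))
        elif kind == 1:  # filled circle
            cv2.circle(mask, (x, y), int(rng.integers(5, 30)), 0.0, -1)
        else:            # filled ellipse
            axes = (int(rng.integers(5, 40)), int(rng.integers(5, 40)))
            cv2.ellipse(mask, (x, y), axes, int(rng.integers(0, 180)),
                        0, 360, 0.0, -1)
    return mask

# The masked network input is then, e.g., image * mask[..., None] for an RGB image.
```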

References Papers: One of the first papers on image inpainting using deep learning to fill gaps, focused on regular shapes and context awareness: [1] http://openaccess.thecvf.com/content_cvpr_2017/papers/Yang_High-Resolution_Image_Inpainting_CVPR_2017_paper.pdf
In the next technological step, the authors of this paper address the inpainting of irregularly shaped masks: [2] https://arxiv.org/pdf/1804.07723.pdf

Demo: In the demo, the authors of [2] make it possible for the audience to try their technique. [3] https://www.nvidia.com/research/inpainting/

Video: The authors of [2] created a video about their results. [4] https://www.youtube.com/watch?v=gg0F5JjKmhA

GitHub: Code of the paper “Globally and Locally Consistent Image Completion”, which is cited in [1] and [2]: [5] https://github.com/satoshiiizuka/siggraph2017_inpainting
An open-source framework for the generative image inpainting task, with support for Contextual Attention (CVPR 2018) and Gated Convolution (ICCV 2019 Oral): [6] https://github.com/JiahuiYu/generative_inpainting
The reference implementation for [2]: [7] https://github.com/NVIDIA/partialconv
Reimplementation of [2] using different datasets: [8] https://github.com/SunnerLi/P-Conv
Reimplementation of [2] using different mask creation: [9] https://github.com/MathiasGruber/PConv-Keras

Datasets The ImageNet dataset (http://www.image-net.org/) contains annotated images of objects, where the annotations are organised into roughly 100,000 so-called synsets (see WordNet (https://wordnet.princeton.edu/) for more information about synsets). This makes it possible to train machine-learning algorithms on images that are connected by concepts across more than one annotation. Note that ImageNet itself only provides thumbnails and URLs of images, much like an image search engine: it compiles an accurate list of web images for each synset of WordNet. This dataset is useful for training the completion of objects.

The Places2 dataset (http://places2.csail.mit.edu/) contains annotated images of real-world scenes. In total, Places contains more than 10 million images spanning 400+ unique scene categories, with 5,000 to 30,000 training images per class, consistent with real-world frequencies of occurrence. This dataset is useful for training the completion of scenes.

The CelebA-HQ dataset (http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html) contains annotated images of celebrity faces. There are 40 attribute annotations describing facial properties such as glasses, wavy hair, and smiling. This dataset is useful for training the completion of faces.

To create irregular gaps in the images, the mask-generation method described in [2] will be used.

Results The winning solution of this request exceeds the current metrics reported in [2] while still producing visually pleasing results. The tests will be performed on a hidden dataset of masks and images, and the results will be inspected by backers and the SingularityNET Foundation. It is clear that a solution will not necessarily work on all data, since the information contained in images has very high variance; for example, if an image of a circuit board has a gap, the algorithm will not be able to complete it correctly. The solution should therefore state which datasets were used to train it and, consequently, which kinds of images it can complete. New technological advances that improve on the techniques of [2] are a big plus, but a solution that improves the results by incorporating more data, and thereby increases the metrics in a service on the platform, is also a desired outcome.

Metrics
L1 norm error, aka Least Absolute Deviations error, as defined e.g. here: https://en.wikipedia.org/wiki/Least_absolute_deviations
PSNR, aka peak signal-to-noise ratio, as defined e.g. here: https://en.wikipedia.org/wiki/Peak_signal-to-noise_ratio
iScore, aka Inception Score, as introduced in the paper Improved Techniques for Training GANs (https://arxiv.org/abs/1606.03498)
SSIM, aka Structural SIMilarity, as introduced in the paper Image Quality Assessment: From Error Visibility to Structural Similarity (https://www.cns.nyu.edu/pub/lcv/wang03-preprint.pdf)
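For reference, the pixel-level metrics are straightforward to compute; below is a minimal sketch using NumPy and scikit-image. The Inception Score is omitted, since it additionally requires a pretrained Inception classifier.

```python
import numpy as np
from skimage.metrics import structural_similarity

def l1_error(pred, target):
    """Mean absolute (L1) error between two float images scaled to [0, 1]."""
    return float(np.mean(np.abs(pred - target)))

def psnr(pred, target, max_value=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, max_value]."""
    mse = float(np.mean((pred - target) ** 2))
    return float("inf") if mse == 0 else 10.0 * float(np.log10(max_value ** 2 / mse))

def ssim(pred, target):
    """Structural similarity for RGB images via scikit-image
    (use multichannel=True instead of channel_axis on scikit-image < 0.19)."""
    return structural_similarity(pred, target, channel_axis=-1, data_range=1.0)
```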

Minimal Requirements The minimal requirement is an improvement of the metrics with respect to the gap sizes given in the evaluation table of [2]. Additionally, the resulting images have to be visually pleasing. The evaluation will follow the evaluation procedure of [2].
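For illustration, a small helper for grouping masks by hole-to-image area ratio, as done in the evaluation table of [2]; the bucket boundaries below are an assumption based on that paper and should be checked against it.

```python
import numpy as np

# Assumed hole-to-image area ratio buckets, following the evaluation of [2].
RATIO_BUCKETS = [(0.01, 0.1), (0.1, 0.2), (0.2, 0.3),
                 (0.3, 0.4), (0.4, 0.5), (0.5, 0.6)]

def hole_ratio(mask):
    """Fraction of the image covered by the hole (mask: 1 = known, 0 = hole)."""
    return float(np.mean(mask == 0))

def bucket_index(mask):
    """Index of the ratio bucket a mask falls into, or None if out of range."""
    r = hole_ratio(mask)
    for i, (lo, hi) in enumerate(RATIO_BUCKETS):
        if lo < r <= hi:
            return i
    return None
```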

Non-functional Requirements This cannot be judged by mathematical methods alone; it also needs to be evaluated by humans. In [2] the image quality of the solutions was rated using Mechanical Turk, timed and untimed inspections by humans, and an A/B testing scheme. Submissions to this request will be rated in a similar fashion.
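As a sketch of how such A/B ratings could be aggregated (a hypothetical helper, not the protocol actually used in [2]): tally rater preferences between two solutions and test whether the preference is significant.

```python
from statistics import NormalDist

def ab_preference(wins_a, wins_b):
    """Preference rate for solution A and a two-sided p-value under the
    null hypothesis of no preference (normal approximation to the
    binomial). Assumes wins_a + wins_b > 0."""
    n = wins_a + wins_b
    rate = wins_a / n
    z = (wins_a - n / 2) / (n * 0.25) ** 0.5
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return rate, p

# Example: 70 of 100 raters prefer A -> rate 0.7, p well below 0.05.
print(ab_preference(70, 30))
```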

Reward Amount 80000 AGI

Expiration Date 20 August 2020