Pixel Recurrent Neural Networks

Abstract

Proposes a generative model that sequentially predicts the pixels of an image along its two spatial dimensions
Architectural novelties include fast two-dimensional recurrent layers and an effective use of residual connections
Negative log-likelihood (NLL) on natural images is state of the art (SoTA), and sampled images are crisp and coherent

Details

Introduction
- Unsupervised Generative Image Modeling task
- (+) endless amounts of image data available to learn from
- (-) estimating the distribution of natural images is extremely challenging
- Most previous work uses stochastic latent variable models such as VAEs, which often come with an intractable inference step
- An effective way to model a joint distribution tractably is to factor it into a product of conditional distributions (splitting one big question into many small ones), as in NADE and fully visible neural networks
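
The factorization in the last bullet can be written out as below. This is a sketch in the paper's spirit: the symbols (an n x n image x, pixels x_i taken in raster-scan order, row by row) are notation chosen here for illustration.

```latex
p(\mathbf{x}) = \prod_{i=1}^{n^2} p\left(x_i \mid x_1, \dots, x_{i-1}\right)
```

Each factor is one of the "small questions": the distribution of a single pixel given every pixel generated before it.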
Contributions
- Novel Architecture
- PixelRNNs
  - proposes two layer types
  - Row LSTM layer : processes the image row by row from top to bottom, computing a whole row at once with a one-dimensional convolution; each pixel sees a roughly triangular context above it
  - Diagonal BiLSTM layer : convolution is applied in a novel fashion along the diagonals of the image, so each pixel can see the entire available context
- Multi-scale PixelRNN : one unconditional PixelRNN generates a small subsampled image, and conditional PixelRNNs fill in the gaps to produce the larger image
- PixelCNN : CNN-based sequence model using masked convolutions
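
To make the masked convolution idea concrete, here is a minimal NumPy sketch, not the authors' implementation. The function names `causal_mask` and `masked_conv2d` are hypothetical, and the example assumes a single-channel image; the paper's masks additionally order the R, G, B channels. The labels "A" and "B" follow the paper's naming: roughly, type A hides the current pixel and type B lets a pixel see itself.

```python
import numpy as np


def causal_mask(kernel_size: int, mask_type: str) -> np.ndarray:
    """Return a (k, k) binary mask for a single-channel 2-D kernel.

    Ones mark positions above the centre row and to the left of the centre
    in the centre row. Type "B" also keeps the centre; type "A" does not.
    """
    if kernel_size % 2 == 0:
        raise ValueError("kernel_size must be odd")
    if mask_type not in ("A", "B"):
        raise ValueError("mask_type must be 'A' or 'B'")
    mask = np.zeros((kernel_size, kernel_size), dtype=np.float32)
    centre = kernel_size // 2
    mask[:centre, :] = 1.0        # every row above the centre row
    mask[centre, :centre] = 1.0   # pixels to the left in the centre row
    if mask_type == "B":
        mask[centre, centre] = 1.0
    return mask


def masked_conv2d(image: np.ndarray, kernel: np.ndarray, mask_type: str) -> np.ndarray:
    """Zero-padded, same-size cross-correlation with a masked square kernel."""
    k = kernel.shape[0]
    weights = kernel * causal_mask(k, mask_type)
    pad = k // 2
    padded = np.pad(image, pad)   # zero padding on all four sides
    height, width = image.shape
    out = np.zeros((height, width), dtype=np.float64)
    for r in range(height):
        for c in range(width):
            out[r, c] = np.sum(padded[r:r + k, c:c + k] * weights)
    return out
```

With a type "A" mask, the output at position (r, c) does not change when the pixel at (r, c) or any later pixel in raster order is modified, which is what lets a convolutional stack be used as a sequence model.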
Model
- Generating an Image Pixel by Pixel
- each pixel is in turn determined jointly by three values, one per color channel (R, G, B), with each channel conditioned on the channels predicted before it
- Pixels as Discrete Variables
- pixel values are modeled as a discrete distribution with a 256-way softmax, whereas previous approaches treated them as continuous
- (+) the distribution can be arbitrarily multimodal with no prior on its shape, and this gives better performance
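
The discrete treatment above can be sketched as follows: each channel value gets a 256-way softmax, training minimizes the negative log-likelihood of the true value, and sampling draws from the same distribution. This is an illustrative NumPy sketch, not the paper's code; the names `log_softmax`, `pixel_nll`, and `sample_pixels` are hypothetical, and the logits would normally come from the network.

```python
import numpy as np


def log_softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable log-softmax over the last axis."""
    logits = np.asarray(logits, dtype=np.float64)
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def pixel_nll(logits: np.ndarray, targets: np.ndarray) -> float:
    """Mean negative log-likelihood in nats.

    logits:  (N, 256) unnormalized scores, one row per channel value to predict
    targets: (N,) integer intensities in [0, 255]
    """
    logp = log_softmax(logits)
    picked = logp[np.arange(targets.shape[0]), targets]
    return float(-picked.mean())


def sample_pixels(logits: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Draw one intensity per row from the softmax distribution."""
    probs = np.exp(log_softmax(logits))
    return np.array([rng.choice(256, p=p) for p in probs])
```

Because the output is a full categorical distribution, nothing forces it to be unimodal: the logits can place mass on, say, both 0 and 255 for the same pixel.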
Pixel RNN
- Convolution, recurrence, masking, and residual connections are all combined in the LSTM layers
- models are up to 12 layers deep
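
For the Diagonal BiLSTM, the paper describes skewing the input map so that diagonals line up as columns: each row is offset by one position relative to the row above, turning an n x n map into an n x (2n - 1) map. The sketch below shows only that skewing step in NumPy; `skew` and `unskew` are hypothetical helper names, and the LSTM itself is omitted.

```python
import numpy as np


def skew(x: np.ndarray) -> np.ndarray:
    """Shift row r of a (rows, cols) map right by r positions.

    The result has shape (rows, cols + rows - 1). Pixel (r, c) lands in
    column r + c, so each column of the output holds one diagonal.
    """
    rows, cols = x.shape
    out = np.zeros((rows, cols + rows - 1), dtype=x.dtype)
    for r in range(rows):
        out[r, r:r + cols] = x[r]
    return out


def unskew(y: np.ndarray, cols: int) -> np.ndarray:
    """Invert skew(): recover the original (rows, cols) map."""
    rows = y.shape[0]
    return np.stack([y[r, r:r + cols] for r in range(rows)])
```

After skewing, the left neighbor and the upper neighbor of a pixel both sit in the previous column, so a recurrence that moves column by column can process a whole diagonal in parallel.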
Results
- MNIST (SoTA)
- CIFAR-10 (SoTA)
- ImageNet : no previous benchmarks to compare against
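
When comparing NLL scores like these, units matter: the paper reports MNIST in nats and the color datasets in bits per dimension. A small conversion sketch, with the hypothetical helper name `nats_to_bits_per_dim` and no actual results plugged in:

```python
import math


def nats_to_bits_per_dim(total_nll_nats: float, num_dims: int) -> float:
    """Convert a per-image NLL in nats to bits per dimension.

    num_dims is the number of predicted values per image,
    e.g. 32 * 32 * 3 for a 32x32 RGB image.
    """
    return total_nll_nats / (num_dims * math.log(2))
```

As a sanity check, a model that assigns uniform probability 1/256 to every channel value scores exactly 8 bits per dimension.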

Personal Thoughts

1st author is the same as Parallel WaveNet's
Novel, creative approach back in 2016
Impressed by the variety of architectures tried (Row, Multi-scale, Diag / RNN, CNN)
- engineering skill at Google is strong
The referencing of previous papers for each detail is surprising
- every detailed choice is supported by or compared against reference papers
PixelRNN is similar to char-level NMT

Link : https://arxiv.org/pdf/1601.06759.pdf
Authors : Oord et al. 2016
