Proposes a generative model that sequentially predicts the pixels of an image along the two spatial dimensions
Architectural novelties include fast two-dimensional recurrent layers and an effective use of residual connections in deep recurrent networks
NLL score on natural image generation was SoTA at the time of publication, and sample images are crisp, varied, and globally coherent
Details
Introduction
Unsupervised Generative Image Modeling task
(+) endless amounts of image data available to learn from
(-) estimating the distribution of natural images is extremely challenging
Most previous works use stochastic latent variable models such as VAEs, but these often come with an intractable inference step
An effective way to model a joint distribution tractably is to cast it as a product of conditional distributions (splitting one big question into a product of small questions), as in NADE and fully visible neural networks
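Written out, the product-of-conditionals idea looks as follows. This is the factorization the paper uses, with the $n \times n$ image flattened into pixels $x_1, \ldots, x_{n^2}$ in raster-scan order (row by row, left to right):

```latex
p(\mathbf{x}) = \prod_{i=1}^{n^2} p(x_i \mid x_1, \ldots, x_{i-1})
```

Each factor is one "small question": the distribution of pixel $x_i$ given every pixel generated before it.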
Contributions
Novel Architecture
PixelRNNs
propose two layer types
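Row LSTM layer : processes the image row by row from top to bottom, computing features for a whole row at once with a one-dimensional convolution; the receptive field is roughly a triangle above the pixel, so part of the available context is missed
Diagonal BiLSTM layer : convolution is applied in a novel fashion along the diagonals of the image, which parallelizes computation while capturing the entire available context
Multi-scale PixelRNN : an unconditional PixelRNN first generates a small subsampled image, and conditional PixelRNNs then fill in the gaps to produce the larger image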
PixelCNN : CNN based sequence model using masked convolution
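A minimal NumPy sketch of the masked-convolution idea, not the authors' implementation. The names `causal_mask` and `masked_conv2d` are hypothetical, the image is assumed to be single-channel, and the paper's additional masking across the R, G, B channels is omitted. Mask type "A" (used in the first layer in the paper) hides the current pixel, while type "B" (later layers) lets a position see itself:

```python
import numpy as np

def causal_mask(kernel_size, mask_type):
    """Spatial mask for a k x k kernel: 1 where the kernel may look, 0 elsewhere."""
    assert kernel_size % 2 == 1 and mask_type in ("A", "B")
    k = kernel_size
    c = k // 2                      # index of the centre row/column
    mask = np.zeros((k, k), dtype=np.float32)
    mask[:c, :] = 1.0               # every row above the centre
    mask[c, :c] = 1.0               # pixels to the left in the centre row
    if mask_type == "B":
        mask[c, c] = 1.0            # type B also sees the centre position
    return mask

def masked_conv2d(image, kernel, mask_type):
    """Naive zero-padded 'same' cross-correlation with a masked kernel."""
    k = kernel.shape[0]
    c = k // 2
    w = kernel * causal_mask(k, mask_type)
    padded = np.pad(image, c)       # zero padding on all four sides
    out = np.zeros_like(image, dtype=np.float64)
    height, width = image.shape
    for i in range(height):
        for j in range(width):
            # window centred on (i, j); masked weights ignore "future" pixels
            out[i, j] = np.sum(padded[i:i + k, j:j + k] * w)
    return out
```

With a type "A" mask, changing pixel (i, j) leaves the output at (i, j) and at every earlier raster-scan position unchanged, which is what lets all conditionals be computed in parallel during training.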
Model
Generating an Image Pixel by Pixel
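each pixel is in turn jointly determined by three values, one per color channel (R, G, B); each channel is conditioned on all previously generated pixels and on the channels already generated for the same pixel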
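The per-pixel distribution over the three channels can be written as below, following the paper's factorization. Here $\mathbf{x}_{<i}$ denotes all pixels generated before pixel $i$, and $x_{i,R}$, $x_{i,G}$, $x_{i,B}$ are its three channel values:

```latex
p(x_i \mid \mathbf{x}_{<i}) =
  p(x_{i,R} \mid \mathbf{x}_{<i}) \,
  p(x_{i,G} \mid \mathbf{x}_{<i}, x_{i,R}) \,
  p(x_{i,B} \mid \mathbf{x}_{<i}, x_{i,R}, x_{i,G})
```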
Pixels as Discrete Variables
each channel value is modeled as a discrete distribution over its 256 possible values and trained with a softmax layer, whereas previous approaches modeled pixels with continuous distributions
(+) can be arbitrarily multimodal without a prior on the shape, and gives better performance
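A small sketch of how a 256-way softmax yields the NLL score, reported here in bits per dimension. The function name `nll_bits_per_dim` and the array shapes are assumptions for illustration, not taken from the paper's code:

```python
import numpy as np

def nll_bits_per_dim(logits, targets):
    """Mean negative log-likelihood in bits per colour value.

    logits:  (N, 256) unnormalised scores, one row per colour value
    targets: (N,) integer intensities in [0, 255]
    """
    # numerically stable log-softmax over the 256 intensity classes
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # pick out the log-probability assigned to each true intensity
    nll_nats = -log_probs[np.arange(len(targets)), targets]
    return nll_nats.mean() / np.log(2.0)   # convert nats to bits
```

A model that outputs uniform logits scores exactly 8 bits per dimension (log2 of 256), which is a useful baseline when reading reported NLL numbers.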
Pixel RNN
Convolution + RNN + Masking + Residual connections all combined around the LSTM layers
models are up to 12 LSTM layers deep
Results
MNIST (SoTA)
CIFAR-10 (SoTA)
ImageNet (resized to 32x32 and 64x64) : no prior benchmarks to compare against, so the reported numbers serve as new baselines
Personal Thoughts
1st author is the same as for Parallel WaveNet
Novel, creative approach back in 2016
Impressed by the diversity of modeling architectures tried (Row, Multi-scale, Diag / RNN, CNN)
engineering skill at Google is strong
the referencing of previous papers for each detail is striking
every detailed choice is supported by or compared against reference papers
Link : https://arxiv.org/pdf/1601.06759.pdf
Authors : van den Oord et al. 2016