kaishengtai / neuralart

An implementation of the paper 'A Neural Algorithm of Artistic Style'.
MIT License
2.41k stars 353 forks source link

Implementation of 'A Neural Algorithm of Artistic Style'

This is a Torch7 implementation of the method described in the paper 'A Neural Algorithm of Artistic Style' by Leon Gatys, Alexander Ecker, and Matthias Bethge (http://arxiv.org/abs/1508.06576).

(Longer animation)

Dependencies

imagine-nn (and any other Torch packages you're missing) can be installed via Luarocks:

luarocks install inn

Usage

First, download the models by running the download script:

bash download_models.sh

This downloads the model weights for the VGG and Inception networks.

Basic usage:

qlua main.lua --style <style.jpg> --content <content.jpg> --style_factor <factor>

where style.jpg is the image that provides the style of the final generated image, and content.jpg is the image that provides the content. style_factor is a constant that controls the degree to which the generated image emphasizes style over content. By default it is set to 2E9.

This generates an image using the VGG-19 network by Karen Simonyan and Andrew Zisserman (http://www.robots.ox.ac.uk/~vgg/research/very_deep/).

Other options:

Out of memory?

The VGG network with the default L-BFGS optimizer gives the best results. However, this setting also requires a lot of GPU memory. If you run into CUDA out-of-memory errors, try running with the Inception architecture or with the SGD optimizer:

qlua main.lua --style <style.jpg> --content <content.jpg> --model inception --optimizer sgd

You can also try reducing the size of the generated image:

qlua main.lua --style <style.jpg> --content <content.jpg> --size 300

If all else fails (or if you don't have a CUDA-compatible GPU), you can optimize on CPU:

qlua main.lua --style <style.jpg> --content <content.jpg> --cpu

Examples

The Eiffel Tower in the style of Edvard Munch's The Scream:

(Longer animation)

Picasso-fied Obama:

(Longer animation)

Implementation Details

When using the Inception network, the outputs of the following layers are used to optimize for style: conv1/7x7_s2, conv2/3x3, inception_3a, inception_3b, inception_4a, inception_4b, inception_4c, inception_4d, inception_4e.

The outputs of the following layers are used to optimize for content: inception_3a, inception_4a.

By default, the optimized image is initialized using the content image; the implementation also works with white noise initialization, as described in the paper.

In order to reduce high-frequency "screen door" noise in the generated image (especially when using the Inception network), total variation regularization is applied (idea from cnn-vis by jcjohnson).

Acknowledgements

The weights for the Inception network used in this implementation were ported to Torch from the publicly-available Caffe distribution.

Thanks to the Bethge Group for providing the weights to the normalized VGG network used here.