This pix2pix model is based on Generative Adversarial Networks. I implemented it to generate an anime illustration from a segmentation label. The results are not fully satisfactory, but they are good enough given the small and rough dataset.
The model generates an anime illustration from a segmentation label and a color map of the hair.
SPADE [1] computes its weights from the label inside every layer, whereas this architecture has mapping networks, like StyleGAN [2], that precompute the weights. These are called Constant-Resolution FCN and use AtrousConvolution [6, 7] instead of FCN [5] so that no downsampling is needed. In addition, each SPADE layer resizes and shares the weights encoded by a mapping network. The generator also uses the NoiseAdder of StyleGAN [2].

The discriminator uses SelfAttention [3] instead of a multi-scale discriminator [1, 12]. It is trained with hinge loss [1] and a zero-centered gradient penalty [11]. The generator is trained with hinge loss, feature matching loss [1, 12] and perceptual loss [1, 12].

This result was obtained by training on 500 pixiv images, which is a very small dataset. Additionally, the test dataset was generated by Anime-Semantic-Segmentation-GAN, and the training dataset was manually annotated. However, I cannot publish the training dataset because of the copyright of the source images.
The parameters used for the result above are almost the same as the default values in `options.py`.
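As a rough illustration of the loss terms named in the architecture description, the sketch below writes the hinge losses and the feature matching loss with plain NumPy. It is not taken from this repository: the function names are hypothetical, the inputs are assumed to be raw discriminator scores and lists of intermediate discriminator feature maps, and the zero-centered gradient penalty and the perceptual loss are left out because they need automatic differentiation and a pre-trained network.

```python
import numpy as np


def d_hinge_loss(d_real, d_fake):
    """Discriminator hinge loss: real scores are pushed above +1, fake scores below -1."""
    real_term = np.maximum(0.0, 1.0 - d_real).mean()
    fake_term = np.maximum(0.0, 1.0 + d_fake).mean()
    return float(real_term + fake_term)


def g_hinge_loss(d_fake):
    """Generator hinge loss: raise the discriminator score of generated images."""
    return float(-d_fake.mean())


def feature_matching_loss(real_feats, fake_feats):
    """Mean L1 distance between discriminator features of real and generated
    images, averaged over the layers (assumed to be given as two lists)."""
    per_layer = [np.abs(r - f).mean() for r, f in zip(real_feats, fake_feats)]
    return float(np.mean(per_layer))


if __name__ == "__main__":
    d_real = np.array([2.0, 0.5])   # hypothetical scores for real images
    d_fake = np.array([-2.0, 0.0])  # hypothetical scores for generated images
    print(d_hinge_loss(d_real, d_fake))
    print(g_hinge_loss(d_fake))
```

In a Chainer training loop the same expressions would be written with differentiable functions instead of NumPy so that gradients can flow back to the networks.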
I prepared pre-trained weights for the Generator and the Discriminator and added a script to download them.
You can get them by executing the following command.
python get_pretrained_weight.py
The download is about 110 MB in total, so it may take a few minutes.
To run prediction with the pre-trained model, run the following Python script.
python predict.py
`predict.py` creates predicted images from the files in the `predict_from` directory and writes them to `predict_to`. Please prepare a 256 x 256 annotated PNG image and a 256 x 256 image filled with the hair color, then concatenate them.
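The concatenation step could look like the sketch below. The text does not say how the two images are joined, so this assumes they are placed side by side (label on the left, hair color on the right) to give a 256 x 512 image. Please check `predict.py` for the layout it actually expects. The function name `concat_label_and_hair` is hypothetical.

```python
import numpy as np


def concat_label_and_hair(label, hair_color, axis=1):
    """Join a label image and a hair-color image of the same shape.

    axis=1 places them side by side (an assumption, not confirmed by the
    repository); axis=0 would stack them vertically instead.
    """
    if label.shape != hair_color.shape:
        raise ValueError("label and hair_color must have the same shape")
    return np.concatenate([label, hair_color], axis=axis)


if __name__ == "__main__":
    # Dummy 256 x 256 RGB arrays standing in for the two prepared images.
    label = np.zeros((256, 256, 3), dtype=np.uint8)
    hair_color = np.full((256, 256, 3), 255, dtype=np.uint8)
    combined = concat_label_and_hair(label, hair_color)
    print(combined.shape)
```

With Pillow installed, real PNG files can be turned into such arrays with `np.asarray(Image.open(path).convert("RGB"))`.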
Please create a `dataset` directory and prepare your dataset in it. Then pass the dataset path as a command-line option.
python train.py --dataset_dir dataset/example
item | details |
---|---|
OS | Windows 10 Home |
CPU | AMD Ryzen 2600 |
GPU | MSI GTX 960 4GB |
language | Python 3.7.1 |
framework | Chainer 7.0.0, cupy-cuda91 5.3.0 |
[1] Taesung Park, Ming-Yu Liu, Ting-Chun Wang, Jun-Yan Zhu. Semantic Image Synthesis with Spatially-Adaptive Normalization. arXiv preprint arXiv:1903.07291, 2019
[2] Tero Karras, Samuli Laine, Timo Aila. A Style-Based Generator Architecture for Generative Adversarial Networks. arXiv preprint arXiv:1812.04948, 2019
[3] Han Zhang, Ian Goodfellow, Dimitris Metaxas, Augustus Odena. Self-Attention Generative Adversarial Networks. arXiv preprint arXiv:1805.08318, 2019
[4] Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio. Generative Adversarial Networks. arXiv preprint arXiv:1406.2661, 2014
[5] Jonathan Long, Evan Shelhamer, Trevor Darrell. Fully Convolutional Networks for Semantic Segmentation. arXiv preprint arXiv:1411.4038, 2015
[6] Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, Alan L. Yuille. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. arXiv preprint arXiv:1606.00915, 2017 (v2)
[7] Liang-Chieh Chen, George Papandreou, Florian Schroff, Hartwig Adam. Rethinking Atrous Convolution for Semantic Image Segmentation. arXiv preprint arXiv:1706.05587, 2017 (v3)
[8] Takeru Miyato, Toshiki Kataoka, Masanori Koyama, Yuichi Yoshida. Spectral Normalization for Generative Adversarial Networks. arXiv preprint arXiv:1802.05957, 2018
[9] Wenzhe Shi, Jose Caballero, Ferenc Huszár, Johannes Totz, Andrew P. Aitken, Rob Bishop, Daniel Rueckert, Zehan Wang. Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network. arXiv preprint arXiv:1609.05158, 2016
[10] Qi Mao, Hsin-Ying Lee, Hung-Yu Tseng, Siwei Ma, Ming-Hsuan Yang. Mode Seeking Generative Adversarial Networks for Diverse Image Synthesis. arXiv preprint arXiv:1903.05628, 2019 (v6)
[11] Lars Mescheder, Andreas Geiger, Sebastian Nowozin. Which Training Methods for GANs do actually Converge?. arXiv preprint arXiv:1801.04406, 2018
[12] Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, Bryan Catanzaro. High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs. arXiv preprint arXiv:1711.11585, 2018
[13] Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, Alexei A. Efros. Image-to-Image Translation with Conditional Adversarial Networks. arXiv preprint arXiv:1611.07004, 2018