hubert0527 / infinityGAN

InfinityGAN: Towards Infinite-Resolution Image Synthesis

"To infinity and beyond!"

Chieh Hubert Lin¹, Hsin-Ying Lee², Yen-Chi Cheng³, Sergey Tulyakov², Ming-Hsuan Yang¹,⁴
¹UC Merced, ²Snap Research, ³CMU, ⁴Google Research

Abstract: We present a novel framework, InfinityGAN, for arbitrary-sized image generation. The task is associated with several key challenges. First, scaling existing models to an arbitrarily large image size is resource-constrained, in terms of both computation and availability of large-field-of-view training data. InfinityGAN trains and infers in a seamless patch-by-patch manner with low computational resources. Second, large images should be locally and globally consistent, avoid repetitive patterns, and look realistic. To address these, InfinityGAN disentangles global appearances, local structures, and textures. With this formulation, we can generate images with a spatial size and level of detail not attainable before. Experimental evaluation validates that InfinityGAN generates images with superior realism compared to baselines and features parallelizable inference. Finally, we show several applications unlocked by our approach, such as spatial style fusion, multi-modal outpainting, and image inbetweening. All applications can be operated with arbitrary input and output sizes.

[Project Page] [Paper] [Supplementary]

Teaser

(*These samples are downsampled; please access the raw images via Google Drive.)


How To Use


A. Configure Environment

Our repository works on Ubuntu. (One of our machine setups: Ubuntu + Python 3.8.5 + cudatoolkit 10.2)

Setup:

  1. Create the conda environment with conda env create --name pt16 --file meta_data/environment.yml. We only tested our pipeline on PyTorch 1.6. Please avoid PyTorch 1.7 and 1.8, as we observed an unexplained degradation in performance with them.
  2. (Alternative) Install the dependencies directly with conda install pytorch==1.6.0 torchvision==0.7.0 cudatoolkit=10.2 -c pytorch, then conda install python-lmdb tqdm matplotlib imageio scikit-image scikit-learn scipy=1.5, and finally pip install tensorboardx==2.1 pyyaml==5.4.1 easydict.
  3. In env_config.py, designate the directory where you will place all your lmdb datasets.
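
Because the pipeline was only tested on PyTorch 1.6, it can help to check the installed version before training. The snippet below is a minimal sketch and not part of the repository; the helper name is_tested_torch_version is hypothetical.

```python
def is_tested_torch_version(version: str) -> bool:
    """Return True if `version` looks like a PyTorch 1.6.x release string."""
    # Drop local build suffixes such as "+cu102" before comparing.
    release = version.split("+", 1)[0]
    parts = release.split(".")
    return len(parts) >= 2 and parts[0] == "1" and parts[1] == "6"


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment.")
    else:
        status = "tested" if is_tested_torch_version(torch.__version__) else "untested"
        print(f"PyTorch {torch.__version__} ({status} with this repository)")
```

Run it inside the activated conda environment; an "untested" result only means the version differs from the one the authors report using.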

P.S. In theory, this repository should also work on Windows if you manage to run StyleGAN2, which requires extra effort to build the CUDA code with Visual Studio.


B. Prepare Data

Notes: We originally used "Flickr-Landscape (small)" in the V1 paper on arXiv, then updated the results of all models to "Flickr-Landscape (large)" in later versions of the paper. We only use the training split for all training and FID evaluation; nevertheless, we still provide a validation set in "Flickr-Landscape (large)". Note that the Flickr-Landscape dataset contains images of different sizes, which are stored in the lmdb without being aligned, so you may add customized training augmentations if desired.
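
Since the Flickr-Landscape images come in different sizes, one possible custom augmentation is a random square crop. The snippet below is only a sketch of how a crop box could be chosen; it is not the repository's data pipeline, and the function name random_crop_box is hypothetical.

```python
import random


def random_crop_box(width, height, crop_size, rng=random):
    """Pick a square crop (left, top, right, bottom) inside a width x height image."""
    if crop_size > min(width, height):
        raise ValueError("crop_size exceeds the image size")
    # randint is inclusive on both ends, so the crop always stays in bounds.
    left = rng.randint(0, width - crop_size)
    top = rng.randint(0, height - crop_size)
    return left, top, left + crop_size, top + crop_size
```

The returned tuple follows the (left, top, right, bottom) convention, so it can be passed to an image library's crop routine of your choice.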

| Dataset | Used in latest paper | # images | Minimum image size | All images same shape? | Size | Has holdout set? | Link |
|---|---|---|---|---|---|---|---|
| Flickr-Landscape (small) | No | 50,000 | 1024 | No | 89G | No | (Google Drive) |
| Flickr-Landscape (large) | Yes | 400,000 | 1024 | No | 786G | Yes | (Google Drive) |
| Flickr-Scenery | Yes | 54,710 | 256 | Yes | 3.5G | Yes | (Will release via In&Out) |
| Places2-Scenery-Subset | Yes | 56,431 | 256 | Yes | 3.2G | Yes | (Will release via In&Out) |

C. Train Model

Our pipeline requires specifying CUDA_VISIBLE_DEVICES, and automatically switches to data-parallel mode if two or more GPUs are specified.
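
The rule above can be illustrated with a short sketch. This is not the repository's actual logic; the helper names visible_gpu_ids and should_use_data_parallel are hypothetical, and the sketch only parses the environment variable without touching any GPU.

```python
import os


def visible_gpu_ids(env_value):
    """Parse a CUDA_VISIBLE_DEVICES string such as "0,1" into a list of IDs."""
    if not env_value:
        return []
    return [item.strip() for item in env_value.split(",") if item.strip()]


def should_use_data_parallel(env_value):
    """Apply the rule stated above: two or more listed GPUs means data parallel."""
    return len(visible_gpu_ids(env_value)) >= 2


if __name__ == "__main__":
    value = os.environ.get("CUDA_VISIBLE_DEVICES", "")
    print(f"GPUs listed: {visible_gpu_ids(value)}")
    print(f"Data parallel: {should_use_data_parallel(value)}")
```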

Misc flags of train.py:


D. Test Model

Our pipeline requires specifying CUDA_VISIBLE_DEVICES, and automatically switches to data-parallel mode if two or more GPUs are specified.

Suppose you have a model trained with the config ./configs/model/<OuO>.yaml and want to generate images at HxW resolution. The testing configs are written as follows:

Additional yaml configs for test.py:

Additional args for test.py:


E. Interactive Generation

Set interactive: True in the config, or equivalently use --interactive in the command to test.py.

The interactive generation is supported for the following test_manager classes:

How to use:

Note: If the image is too large for your monitor (for example, 4096x4096), you can increase self.fov_rescale to 2 or 4, which downsamples the image before displaying it in the canvas (you still interact with the image at its original resolution).
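
To get a feel for the effect of fov_rescale, the arithmetic can be sketched as below. This assumes the preview is downsampled by an integer factor in each dimension, which may differ from the repository's actual handling; both function names are hypothetical.

```python
def displayed_size(image_size, fov_rescale):
    """Size of the preview after downsampling by `fov_rescale`."""
    width, height = image_size
    return width // fov_rescale, height // fov_rescale


def canvas_to_image(canvas_xy, fov_rescale):
    """Map a position on the downsampled preview back to full-resolution pixels."""
    x, y = canvas_xy
    return x * fov_rescale, y * fov_rescale
```

Under this assumption, a 4096x4096 image with fov_rescale set to 4 is shown at 1024x1024, while a position on the preview still corresponds to a location in the full-resolution image.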

P.S. To quit the program, you need to close the interface window and kill (ctrl-c) the program in the terminal.


F. Evaluation

Evaluate ScaleInv FID

To test the model with x2 ScaleInv FID:

CUDA_VISIBLE_DEVICES="0,1" python eval_fids.py \
 ./configs/model/<exp_name>.yaml \
 --type=scaleinv \
 --scale=2 \
 --batch-size=2

Other arguments

Evaluate Outpainting

(This script is based on the code from In&Out.)

Please run the inversion first. It stores the results (images and inverted variables) at ./logs/<exp_name>/test/<test_name>/. You can then evaluate with the following command:

CUDA_VISIBLE_DEVICES="0" python eval_outpaint_imgdir.py \
 --batch=48 \
 --size=256 \
 --real-dir=./logs/<exp_name>/test/<test_name>/imgs/real_gt/ \
 --fake-dir=./logs/<exp_name>/test/<test_name>/imgs/inv_cmp/

Note that this script only supports a single GPU.


G. Pretrained Models and Additional Materials

Pretrained Models (Test Only)

You should structure the ./logs/ folder like this:

logs/ --+--- exp_name_A/ --- ckpt/ --- best_fid.pth.tar
        |
        +--- exp_name_B/ --- ckpt/ --- best_fid.pth.tar
        |
        +--- exp_name_C/ --- ckpt/ --- best_fid.pth.tar
        |
        (...)

You should be able to find the corresponding config for each released model under ./configs/model/. You can test a model with:

CUDA_VISIBLE_DEVICES="0,1" python test.py \
     --model-config=./configs/model/<exp_name>.yaml \
     --test-config=./configs/test/infinite_gen_1024x1024.yaml

The test script will auto-detect the checkpoint at ./logs/<exp_name>/ckpt/best_fid.pth.tar
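
Before launching test.py, you may want to confirm that the checkpoints are where the test script expects them. The following is a minimal sketch based only on the path layout described above; the helper names are hypothetical and not part of the repository.

```python
from pathlib import Path


def expected_checkpoint(exp_name, logs_root="./logs"):
    """Build the checkpoint path described above for one experiment."""
    return Path(logs_root) / exp_name / "ckpt" / "best_fid.pth.tar"


def missing_checkpoints(exp_names, logs_root="./logs"):
    """Return the experiment names whose checkpoint file is not on disk."""
    return [name for name in exp_names
            if not expected_checkpoint(name, logs_root).is_file()]
```

For example, missing_checkpoints(["exp_name_A", "exp_name_B"]) returns the names that still need their best_fid.pth.tar placed under ./logs/.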

| Name | Dataset | Used in paper | Training full image size | Training patch size | Trained w/ #GPUs | Link |
|---|---|---|---|---|---|---|
| InfinityGAN | Flickr-Landscape (large) | Yes | 197 | 101 | 1x TitanX | (Google Drive) |
| InfinityGAN-HR | Flickr-Landscape (large) | No | 389 | 197 | 4x V100 | (Google Drive) |
| InfinityGAN-UR | Flickr-Landscape (large) | No | 1024 | 773 | 4x V100 | (Google Drive) |
| InfinityGAN-IOF | InOut-Flickr-Scenery | Yes | 197 | 101 | 1x TitanX | (Google Drive) |
| InfinityGAN-IOP | InOut-Places2-Scenery-subset | Yes | 197 | 101 | 1x TitanX | (Google Drive) |

Inversion results

Inverting a large set of samples requires a large amount of computation. In order to save your time and our earth (just a bit), we release the inversion results here.

The tar file (decompress with tar zxf <filename>.tar) contains the following folders:

---+---inv_cmp/             : Compare (left-half) real and (right-half) reconstruction via inversion.
   |
   +---inv_comp_cropped/    : Composed (left-half) real and (right-half) outpainting via inversion.
   |
   +---inv_raw/             : The whole inverted image.
   |
   +---real_gt/             : The real data.

Note: You may notice the word "cropped" in one of the folder names. InfinityGAN actually inverts an image slightly larger than the conditional image, then crops the extra area away at the end.

Known Issues

  1. The performance differs between PyTorch 1.4/1.7 and PyTorch 1.6. The root cause is unknown and still a mystery to us, so please use PyTorch 1.6 if possible.
  2. Please do not use data parallel across two different types of GPUs (e.g., GTX1080 + GTX2080), as one of the GPUs may generate gray or blank images.
  3. OOM while training with a single GPU. PyTorch can sometimes raise OOM due to unfortunate memory allocations. Here are some tweaks that sometimes resolve the problem (if you indeed have only one GPU):
    • [Reminder] We use a "TITAN X (Pascal)" with 12196 MB of GPU memory for the single-GPU setup in our paper. We are not certain about the results on GPUs with less memory.
    • Set torch.backends.cudnn.benchmark to False in train.py. Although it is designed not to cause OOM, it sometimes does.
    • Set calc_fid and calc_fid_ext2 to False. The evaluation allocates additional memory and uses a different memory allocation pattern, which can disrupt the PyTorch memory allocation schedule. You may directly use the last iteration, as all models converge well.
    • Set ext_mult_list to [2,] instead of [2, 4], which stops logging images generated at 4x testing resolution during training.
    • Reduce the training batch_size. However, this may affect model performance.
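
The config-related tweaks above can be summarized in a small sketch. It treats the config as a flat Python dict, which is an assumption: the repository's YAML files may nest these keys differently, and the function name apply_low_memory_overrides is hypothetical.

```python
import copy


def apply_low_memory_overrides(config, batch_size=None):
    """Return a copy of `config` with the memory-saving tweaks listed above."""
    tweaked = copy.deepcopy(config)
    tweaked["calc_fid"] = False        # skip FID evaluation during training
    tweaked["calc_fid_ext2"] = False   # also disable the calc_fid_ext2 option
    tweaked["ext_mult_list"] = [2]     # stop logging 4x test-resolution images
    if batch_size is not None:
        tweaked["batch_size"] = batch_size  # smaller batches may affect quality
    return tweaked
```

The original dict is left untouched, so the tweaked and default settings can be compared side by side. The cudnn tweak is not covered here because it is a one-line change in train.py.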

License

This repository borrows code from different sources; please follow the license of each source when using this project.

Note that the code release aims to support the open-source culture in the computer vision research community, improving research efficiency and keeping the community open to newcomers. The first author strongly discourages research groups that do not share this open-source culture from using or reading any code in this repository. Please keep the closed-door culture bidirectional.


Acknowledgement

The implementation heavily borrows from StyleGAN2-Pytorch, pytorch-fid and PerceptualSimilarity.


Citation

@inproceedings{
    lin2021infinity,
    title={Infinity{GAN}: Towards Infinite-Pixel Image Synthesis},
    author={Lin, Chieh Hubert and Cheng, Yen-Chi and Lee, Hsin-Ying and Tulyakov, Sergey and Yang, Ming-Hsuan},
    booktitle={International Conference on Learning Representations},
    year={2022},
    url={https://openreview.net/forum?id=ufGMqIM0a4b},
}