"To infinity and beyond!"
Chieh Hubert Lin1, Hsin-Ying Lee2, Yen-Chi Cheng3, Sergey Tulyakov2, Ming-Hsuan Yang1,4
1UC Merced, 2Snap Research, 3CMU, 4Google Research
[Project Page] [Paper] [Supplementary]
(*These samples are downsampled; please access the raw images via Google Drive.)
Our repository works on Ubuntu. (One of our machine setups: Ubuntu + Python 3.8.5 + cudatoolkit 10.2.)
Setup:
conda env create --name pt16 --file meta_data/environment.yml
We only tested our pipeline on PyTorch 1.6. Please avoid PyTorch 1.7 and 1.8, as we observe an awkward degradation in performance with them. If you prefer to install the packages manually:
conda install pytorch==1.6.0 torchvision==0.7.0 cudatoolkit=10.2 -c pytorch
conda install python-lmdb tqdm matplotlib imageio scikit-image scikit-learn scipy=1.5
pip install tensorboardx==2.1 pyyaml==5.4.1 easydict
Then specify your environment paths (such as the LMDB_ROOTS used for the datasets below) in env_config.py.
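For reference, a minimal sketch of what env_config.py could look like; only the LMDB_ROOTS name appears in this README, everything else here is an assumption, and the file shipped with the repository takes precedence:

```python
# Hypothetical sketch of env_config.py. Only LMDB_ROOTS is referenced by this
# README; the example paths and any other detail here are assumptions.
import os

# Candidate root directories where prepare_data.py places the lmdb datasets.
# (Assumed behavior: the pipeline resolves dataset names against these roots.)
LMDB_ROOTS = [
    "/mnt/data/infinitygan-lmdb/",
    os.path.expanduser("~/datasets/infinitygan-lmdb/"),
]
```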
P.S. Theoretically, this repository should also work on Windows if you manage to run StyleGAN2, which requires extra effort in building the CUDA kernels with Visual Studio.
Notes: We originally used "Flickr-Landscape (small)" in the V1 paper on arXiv, and updated the results of all models to "Flickr-Landscape (large)" in later versions of the paper. We only use the training split for all training and FID evaluation; nevertheless, we still provide a validation set in "Flickr-Landscape (large)". Notice that the Flickr-Landscape dataset contains images of different sizes without aligning them in the lmdb, so you may add customized training augmentations if desired.
Dataset | Used in latest paper | # images | Minimum image size (px) | All images same shape? | Size on disk | Has holdout set? | Link |
---|---|---|---|---|---|---|---|
Flickr-Landscape (small) | X | 50,000 | 1024 | X | 89G | X | (Google Drive) |
Flickr-Landscape (large) | V | 400,000 | 1024 | X | 786G | V | (Google Drive) |
Flickr-Scenery | V | 54,710 | 256 | V | 3.5G | V | (Will release via In&Out) |
Places2-Scenery-Subset | V | 56,431 | 256 | V | 3.2G | V | (Will release via In&Out) |
Each dataset has a corresponding config, for instance configs/dataset/flickr-landscape-small.yaml. Build the lmdb with:
python prepare_data.py ./configs/dataset/flickr-landscape-small.yaml --train_only
The lmdb is written under one of the LMDB_ROOTS you specified in env_config.py. Remember to set the data_params.dataset flag in your training config when you train the model.

Our pipeline requires specifying CUDA_VISIBLE_DEVICES, and automatically switches to DataParallel if two or more GPUs are specified.
CUDA_VISIBLE_DEVICES="0" python train.py ./configs/model/InfinityGAN.yaml
CUDA_VISIBLE_DEVICES="0" python train.py ./configs/model/StyleGAN2_NCI.yaml
CUDA_VISIBLE_DEVICES="0" python train.py ./configs/model/StyleGAN2_NCI_FCG.yaml
Misc flags of train.py:
- --debug: With this flag, the training pipeline runs training for one iteration, executes all logging and evaluation once, then quits without writing anything to your logs. Useful when you just want to test your environment or config.
- --archive-mode: Our pipeline automatically backs up your code at ./logs/<exp_name>/codes/. You may run the model training within that folder by using this flag.

Our pipeline requires specifying CUDA_VISIBLE_DEVICES, and automatically switches to DataParallel if two or more GPUs are specified.
Suppose you have a model trained with a config ./configs/model/<OuO>.yaml and want to generate images at HxW resolution. The testing configs are written as follows:
Naive Generation
Directly synthesize the whole image. O(H*W)
memory allocation.
CUDA_VISIBLE_DEVICES="0,1" python test.py \
--model-config=./configs/model/<OuO>.yaml \
--test-config=./configs/test/direct_gen_HxW.yaml
Infinite Generation
Sequentially generate patches. O(1)
memory allocation.
CUDA_VISIBLE_DEVICES="0,1" python test.py \
--model-config=./configs/model/<OuO>.yaml \
--test-config=./configs/test/infinite_gen_HxW.yaml
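Conceptually, this mode keeps only one patch on the GPU at a time and assembles the full image on the CPU; a simplified sketch of the idea (the generate_patch interface is hypothetical, not the actual test manager API):

```python
# Simplified sketch of patch-wise sequential generation with O(1) GPU memory.
# `generate_patch` is a hypothetical stand-in for the actual test manager.
import numpy as np
import torch

def generate_panorama(generate_patch, H, W, patch=256):
    canvas = np.zeros((H, W, 3), dtype=np.uint8)   # assembled on the CPU
    for top in range(0, H, patch):
        for left in range(0, W, patch):
            with torch.no_grad():
                # Only one patch lives on the GPU at any time.
                tile = generate_patch(top, left, patch)   # (3, patch, patch) in [-1, 1]
            tile = ((tile.clamp(-1, 1) + 1) * 127.5).byte()
            tile = tile.permute(1, 2, 0).cpu().numpy()    # (patch, patch, 3)
            h = min(patch, H - top)
            w = min(patch, W - left)
            canvas[top:top + h, left:left + w] = tile[:h, :w]
    return canvas
```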
Spatial Fusion Generation
Spatially fuses multiple styles. Follows the "infinite generation" design.
CUDA_VISIBLE_DEVICES="0,1" python test.py \
--model-config=./configs/model/<OuO>.yaml \
--test-config=./configs/test/fused_gen_HxW.yaml
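Conceptually, spatial fusion assigns different global styles to different regions and blends them smoothly across space; a minimal sketch of horizontally interpolating two style vectors into a per-pixel style map (illustrative only, not the repository's implementation):

```python
# Conceptual sketch: blend two style vectors horizontally so the left side of
# the image follows style_a and the right side follows style_b.
import torch

def fuse_styles_horizontally(style_a, style_b, height, width):
    # style_a, style_b: (style_dim,) latent/style vectors.
    alpha = torch.linspace(0.0, 1.0, width).view(1, 1, width)   # (1, 1, W)
    alpha = alpha.expand(style_a.shape[0], height, width)       # (D, H, W)
    a = style_a.view(-1, 1, 1)
    b = style_b.view(-1, 1, 1)
    return (1.0 - alpha) * a + alpha * b                        # per-pixel style map
```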
Inversion
Please remember to update override_dataset_data_size and override_dataset_full_size if the real-image resolution for inversion differs from the training resolution.
CUDA_VISIBLE_DEVICES="0" python test.py \
--model-config="./configs/model/<OuO>.yaml" \
--test-config="./test_configs/inversion_<???>.yaml"
Outpainting
Invert the latent variables, and outpaint the image.
# Run inversion first
CUDA_VISIBLE_DEVICES="0" python test.py \
--model-config="./configs/model/<OuO>.yaml" \
--test-config="./test_configs/inversion_256x256_L2R.yaml"
# Then outpaint
CUDA_VISIBLE_DEVICES="0" python test.py \
--model-config="./configs/model/<OuO>.yaml" \
--test-config="./test_configs/outpaint_with_fused_gen_256x256.yaml" \
--inv-records="./logs/<OuO>/test/outpaint_with_fused_gen_256x256/stats/<id>.pkl" \
--inv-placements=0.5,0.5
Inbetweening
Invert the latent variables of the left and right images, then generate the content in between.
# Run inversion first
CUDA_VISIBLE_DEVICES="0" python test.py \
--model-config="./configs/model/<OuO>.yaml" \
--test-config="./test_configs/inversion_IOF246_256x1280L_256x128.yaml"
CUDA_VISIBLE_DEVICES="0" python test.py \
--model-config="./configs/model/<OuO>.yaml" \
--test-config="./test_configs/inversion_IOF246_256x1280R_256x128.yaml"
# Then outpaint (the `inv-records` and `inv-placements` are ordered lists separated with `:`)
CUDA_VISIBLE_DEVICES="0" python test.py \
--model-config="./configs/model/<OuO>.yaml" \
--test-config="./test_configs/inbetween_with_fused_gen_256x1280.yaml" \
--inv-records="./logs/<OuO>/test/inversion_IOF246_256x1280L_256x128/stats/<id>.pkl:./logs/<OuO>/test/inversion_IOF246_256x1280R_256x128/stats/<id>.pkl" \
--inv-placements=0.5,0.05:0.5,0.95
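For reference, --inv-placements above is a ":"-separated list of comma-separated pairs, one pair per inverted record; a tiny parser sketch (interpreting each pair as normalized coordinates of the inverted region's placement is our assumption):

```python
# Hypothetical helper: parse an --inv-placements string such as
# "0.5,0.05:0.5,0.95" into a list of normalized coordinate pairs.
# Treating each pair as (vertical, horizontal) fractions is an assumption.
def parse_inv_placements(arg: str):
    pairs = []
    for item in arg.split(":"):
        v, h = (float(x) for x in item.split(","))
        pairs.append((v, h))
    return pairs

print(parse_inv_placements("0.5,0.05:0.5,0.95"))  # [(0.5, 0.05), (0.5, 0.95)]
```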
P.S. Since (i) the inversion area of the real image, (ii) the inversion area of the generated image, and (iii) the position of the inverted latents during outpainting can all be different (along with some further technical difficulties), you unfortunately need to re-invert the latent variables each time you change the inversion area size, the position of the inversion area, or the outpainting target resolution.
Notable settings in the test configs of test.py:
- lowres_height: High-resolution images are hard to download from a remote machine, so we additionally save a low-resolution version of each image, downsampled (with the aspect ratio preserved) to the specified height.
- interactive: See the interactive generation notes below.
- parallel_batch_size: The "parallel batching" application mentioned in the paper, supported in test_managers.infinite_generation and test_managers.fused_generation; see test_managers.base_test_manager.py:maybe_parallel_inference(). Although batch_size could be supported at the same time, we make the two mutually exclusive, as mixing these two batching strategies is not meaningful. A conceptual sketch follows below.
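A conceptual sketch of the parallel-batching idea, stacking several spatially independent patch requests into one batched forward pass (the interface here is hypothetical and only illustrates the idea, not the actual maybe_parallel_inference() implementation):

```python
# Conceptual sketch of "parallel batching": spatially independent patch
# requests are queued and run through the generator as one batch.
import torch

class ParallelPatchQueue:
    def __init__(self, generator, parallel_batch_size):
        self.generator = generator
        self.parallel_batch_size = parallel_batch_size
        self.pending = []                      # list of (coord, latent) requests

    def submit(self, coord, latent):
        self.pending.append((coord, latent))
        return self.flush() if len(self.pending) >= self.parallel_batch_size else []

    def flush(self):
        if not self.pending:
            return []
        coords = [c for c, _ in self.pending]
        latents = torch.stack([z for _, z in self.pending], dim=0)   # (B, ...)
        with torch.no_grad():
            patches = self.generator(latents)                        # one batched pass
        self.pending = []
        return list(zip(coords, patches))
```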
Misc flags of test.py:
- --speed-benchmark: Collects GPU execution time (including DataParallel scatter and collection time). Ignores the first ten iterations.
- --calc-flops: Reports the total FLOPs used in synthesizing a full image.

To enable interactive generation, set interactive: True in the config, or equivalently add --interactive to the test.py command.
The interactive generation is supported for the following test manager classes:
- test_managers.infinite_generation
- test_managers.fused_generation
How to use:
Note: If you find the image is too large (for instance, a 4096x4096 image does not fit on your monitor at all), you can increase self.fov_rescale to 2 or 4, which downsamples the image before displaying it on the canvas (you are still interacting with the image at its original resolution).
P.S. To quit the program, you need to close the interface window and kill (ctrl-c) the program in the terminal.
To test the model with x2 ScaleInv FID:
CUDA_VISIBLE_DEVICES="0,1" python eval_fids.py \
./configs/model/<exp_name>.yaml \
--type=scaleinv \
--scale=2 \
--batch-size=2
Other arguments:
- --ckpt: By default, we test the checkpoint at ./logs/<exp_name>/ckpt/best_fid.pth.tar. You may override the path with this argument if you want to test other checkpoints.
- --img-folder: Use this in case you want to evaluate a folder of images.
- --type: We also implement another FID schema, spatial, which partitions the image into 16 (=4x4) patches, extracts Inception features for each patch, and concatenates them into a plain vector (a conceptual sketch follows below). This is much slower and consumes massive CPU memory, and the trend (FID vs. scale) is similar to ScaleInv FID.
- --seq-inference: For InfinityGAN, due to the additional structure synthesizer, the model can OOM at higher resolutions if it generates the image in one shot. You may use this flag to enable sequential inference (i.e., using test_managers.infinite_generation.py), but this slows down inference due to some internal redundant computations.
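A conceptual sketch of assembling the per-patch feature vector described in the --type bullet above (the Inception extractor is a placeholder; the repository's implementation may differ):

```python
# Conceptual sketch of the "spatial" FID feature: split an image batch into a
# 4x4 grid of patches, extract a feature per patch, and concatenate them.
# `extract_features` stands in for the Inception feature extractor.
import torch

def spatial_fid_features(images, extract_features, grid=4):
    # images: (B, 3, H, W) with H and W divisible by `grid`.
    B, _, H, W = images.shape
    ph, pw = H // grid, W // grid
    feats = []
    for i in range(grid):
        for j in range(grid):
            patch = images[:, :, i * ph:(i + 1) * ph, j * pw:(j + 1) * pw]
            feats.append(extract_features(patch))        # (B, feat_dim)
    return torch.cat(feats, dim=1)                       # (B, grid*grid*feat_dim)
```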
(This script is based on the codes from In&Out.) Please run the inversion first; it stores the results (images and inverted variables) at ./logs/<exp_name>/test/<test_name>/. Then you can evaluate with the following command:
CUDA_VISIBLE_DEVICES="0" python eval_outpaint_imgdir.py \
--batch=48 \
--size=256 \
--real-dir=./logs/<exp_name>/test/<test_name>/imgs/real_gt/ \
--fake-dir=./logs/<exp_name>/test/<test_name>/imgs/inv_cmp/
Note that this script only supports single GPU.
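For reference, FID-style metrics such as the ones above report the Fréchet distance between two Gaussians fitted to the Inception features of the real and fake sets; the standard formula (independent of this script's internals, and the same quantity computed by tools like pytorch-fid) is:

```python
# Fréchet distance between two Gaussians N(mu1, cov1) and N(mu2, cov2):
# ||mu1 - mu2||^2 + Tr(cov1 + cov2 - 2 * sqrt(cov1 @ cov2)).
import numpy as np
from scipy import linalg

def frechet_distance(mu1, cov1, mu2, cov2, eps=1e-6):
    diff = mu1 - mu2
    covmean, _ = linalg.sqrtm(cov1.dot(cov2), disp=False)
    if not np.isfinite(covmean).all():
        # Numerical fix: nudge the diagonals and retry the matrix square root.
        offset = np.eye(cov1.shape[0]) * eps
        covmean = linalg.sqrtm((cov1 + offset).dot(cov2 + offset))
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    return diff.dot(diff) + np.trace(cov1) + np.trace(cov2) - 2.0 * np.trace(covmean)
```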
You should structure the ./logs/
folder like this:
logs/ --+--- exp_name_A/ --- ckpt/ --- best_fid.pth.tar
|
+--- exp_name_B/ --- ckpt/ --- best_fid.pth.tar
|
+--- exp_name_C/ --- ckpt/ --- best_fid.pth.tar
|
(...)
You should be able to find the corresponding config for each released model under ./configs/model/. You can test a model with:
CUDA_VISIBLE_DEVICES="0,1" python test.py \
--model-config=./configs/model/<exp_name>.yaml \
--test-config=./configs/test/infinite_gen_1024x1024.yaml
The test script will auto-detect the checkpoint at ./logs/<exp_name>/ckpt/best_fid.pth.tar.
Name | Dataset | Used in paper | Training full image size (px) | Training patch size (px) | Trained w/ #GPUs | Link |
---|---|---|---|---|---|---|
InfinityGAN | Flickr-Landscape (large) | V | 197 | 101 | 1x TitanX | (Google Drive) |
InfinityGAN-HR | Flickr-Landscape (large) | X | 389 | 197 | 4x V100 | (Google Drive) |
InfinityGAN-UR | Flickr-Landscape (large) | X | 1024 | 773 | 4x V100 | (Google Drive) |
InfinityGAN-IOF | InOut-Flickr-Scenery | V | 197 | 101 | 1x TitanX | (Google Drive) |
InfinityGAN-IOP | InOut-Places2-Scenery-subset | V | 197 | 101 | 1x TitanX | (Google Drive) |
Inverting a large set of samples requires a large amount of computation. In order to save your time and our earth (just a bit), we release the inversion results here.
The tar file (decompress with tar zxf <filename>.tar
) contains the following folders:
---+---inv_cmp/ : Compare (left-half) real and (right-half) reconstruction via inversion.
|
+---inv_comp_cropped/ : Composed (left-half) real and (right-half) outpainting via inversion.
|
+---inv_raw/ : The whole inverted image.
|
+---real_gt/ : The real data.
Note: You may notice that there is a "cropped" in one of the folder names. InfinityGAN actually inverts an area slightly larger than the conditional image, then crops the extra area away in the end.
If you run into OOM issues during training, you may try the following:
- Set torch.backends.cudnn.benchmark to False in train.py. Despite being designed not to cause OOM, it somehow, and unfortunately, sometimes does.
- Set calc_fid and calc_fid_ext2 to False. The evaluation allocates additional memory and uses a different memory allocation pattern, which can mess up the PyTorch memory allocation schedule. You may directly use the last-iteration checkpoint, as all models converge well.
- Set ext_mult_list to [2,] instead of [2, 4], which stops logging images generated at 4x the testing resolution during training.
- Use a smaller batch_size. However, this may influence the model performance.

This repository borrows code from different sources; please follow the user licenses of each source while using this project.
Notice that this code release aims to support the open-source culture in the computer vision research community, facilitating research efficiency and keeping the community open to newcomers. The first author strongly discourages research groups that do not embrace such an open-source culture from using or reading any piece of code in this repository. Please keep the closed-door culture bidirectional.
The implementation heavily borrows from StyleGAN2-Pytorch, pytorch-fid and PerceptualSimilarity.
@inproceedings{
lin2021infinity,
title={Infinity{GAN}: Towards Infinite-Pixel Image Synthesis},
author={Lin, Chieh Hubert and Cheng, Yen-Chi and Lee, Hsin-Ying and Tulyakov, Sergey and Yang, Ming-Hsuan},
booktitle={International Conference on Learning Representations},
year={2022},
url={https://openreview.net/forum?id=ufGMqIM0a4b},
}