mit-han-lab / data-efficient-gans

[NeurIPS 2020] Differentiable Augmentation for Data-Efficient GAN Training
https://arxiv.org/abs/2006.10738
BSD 2-Clause "Simplified" License

NOT an issue, but a question :) #79

Open thusinh1969 opened 2 years ago

thusinh1969 commented 2 years ago

Hi,

I read the paper and it sounds very promising. I have been trying NVIDIA StyleGAN2-ADA for weeks without success: it simply did not converge, and the generated images were full of unwanted artifacts. My dataset is furniture (living room, bedroom, etc.), and each dataset has 60k-150k images. Some have only around 10k images, but we have not tried those yet.

Even with 100k images, the original StyleGAN2-ADA failed us for whatever reason. We tried the full range of R1 values and dropped/added augmentation pipelines (bgcfc, etc.). Nothing worked. We ran up to 30,000 kimg (30 million images through the network) and there were still lots of weird streaks in the images.
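(For reference, "R1 all ranges" above means sweeping the r1_gamma coefficient of StyleGAN2's discriminator gradient penalty, exposed as `--gamma`. A minimal sketch of what that penalty computes, conceptually, not the repo's exact code:)

```python
# Sketch of the R1 regularizer being swept via --gamma (conceptual;
# the real computation lives in training/loss.py).
import torch

def r1_penalty(D, reals, gamma):
    reals = reals.detach().requires_grad_(True)
    logits = D(reals)
    # Gradient of D's output with respect to the real images
    grads, = torch.autograd.grad(logits.sum(), reals, create_graph=True)
    return (gamma / 2) * grads.square().sum(dim=[1, 2, 3]).mean()
```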

We are about to try this repo; could you advise us on a few things along the way:

1) Is there anything special about our dataset that might need particular attention or parameters? Which parameters should we take care of? These images are NOT faces/dogs/cats, which are cropped and highly focused; as you may know, our dataset varies widely in scene and color, and for us convergence means minimal artifacts. :)

2) How long do you think it will take to converge, if at all? For now we use four RTX 2080 Ti GPUs in our lab, hence a batch size of 32 at 256x256 resolution.

Would love to keep in touch.

Thanks in advance. Steve (nguyen@hatto.com)

zsyzzsoft commented 2 years ago

Your dataset seems large enough, so this does not sound like a discriminator overfitting issue. Could you share some generated and real images?
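(A quick way to sanity-check this yourself, in the spirit of the ADA paper's diagnostics; `d_overfit_gap` is a hypothetical helper, and it assumes an unconditional `D(img)` interface plus a held-out batch of real images:)

```python
# Sketch: if D's mean logit on training reals keeps climbing while the
# mean logit on unseen reals falls, D is memorizing the training set.
import torch

@torch.no_grad()
def d_overfit_gap(D, train_reals, val_reals):
    # A persistently growing gap suggests discriminator overfitting;
    # with ~47k images it typically stays small.
    return (D(train_reals).mean() - D(val_reals).mean()).item()
```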

thusinh1969 commented 2 years ago

Here we go, at 24,000 kimg (24 million images through the network).

REAL:

[image: reals_crop]

FAKE:

[image: fake_crop]

Run command line:

```
python train.py --outdir="../../results/" --gpus=4 --batch=64 --data="../../images_256_StyleGANV2/" --cond=True --mirror=True --cfg=paper256 --gamma=10 --kimg=50000 --DiffAugment="color,translation,cutout" --resume="../../results/CHECKPOINT/network-snapshot-023587.pkl"
```

Train options JSON file:

```json
{
  "num_gpus": 4,
  "image_snapshot_ticks": 50,
  "network_snapshot_ticks": 50,
  "metrics": ["fid50k_full"],
  "random_seed": 0,
  "training_set_kwargs": {
    "class_name": "training.dataset.ImageFolderDataset",
    "path": "../../images_256_StyleGANV2/",
    "use_labels": true,
    "max_size": 47049,
    "xflip": true,
    "resolution": 256
  },
  "data_loader_kwargs": {
    "pin_memory": true,
    "num_workers": 3,
    "prefetch_factor": 2
  },
  "G_kwargs": {
    "class_name": "training.networks.Generator",
    "z_dim": 512,
    "w_dim": 512,
    "mapping_kwargs": { "num_layers": 8 },
    "synthesis_kwargs": {
      "channel_base": 16384,
      "channel_max": 512,
      "num_fp16_res": 4,
      "conv_clamp": 256
    }
  },
  "D_kwargs": {
    "class_name": "training.networks.Discriminator",
    "block_kwargs": {},
    "mapping_kwargs": {},
    "epilogue_kwargs": { "mbstd_group_size": 8 },
    "channel_base": 16384,
    "channel_max": 512,
    "num_fp16_res": 4,
    "conv_clamp": 256
  },
  "G_opt_kwargs": {
    "class_name": "torch.optim.Adam",
    "lr": 0.0025,
    "betas": [0, 0.99],
    "eps": 1e-08
  },
  "D_opt_kwargs": {
    "class_name": "torch.optim.Adam",
    "lr": 0.0025,
    "betas": [0, 0.99],
    "eps": 1e-08
  },
  "loss_kwargs": {
    "class_name": "training.loss.StyleGAN2Loss",
    "r1_gamma": 10.0,
    "diffaugment": "color,translation,cutout"
  },
  "total_kimg": 50000,
  "batch_size": 64,
  "batch_gpu": 16,
  "ema_kimg": 20,
  "ema_rampup": null,
  "resume_pkl": "../../results/CHECKPOINT/network-snapshot-023587.pkl",
  "ada_kimg": 100,
  "run_dir": "../../results/00000--cond-mirror-paper256-gamma10-kimg50000-batch64-color-translation-cutout-resumecustom"
}
```
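(For anyone reproducing this: the `--DiffAugment` flag routes both real and generated images through the same differentiable augmentation before the discriminator, which is the core of the method. Roughly, as a sketch using this repo's `DiffAugment_pytorch.DiffAugment`; the actual integration is in training/loss.py, and the conditional G/D used here also take class labels, omitted for brevity:)

```python
# Minimal sketch of how the --DiffAugment policy enters the GAN losses.
import torch
import torch.nn.functional as F
from DiffAugment_pytorch import DiffAugment  # shipped with this repo

policy = 'color,translation,cutout'

def d_loss(D, G, reals, z):
    # The SAME differentiable augmentation is applied to both real and
    # generated images before they reach the discriminator.
    fakes = G(z)
    loss_real = F.softplus(-D(DiffAugment(reals, policy=policy))).mean()
    loss_fake = F.softplus(D(DiffAugment(fakes, policy=policy))).mean()
    return loss_real + loss_fake

def g_loss(D, G, z):
    # Gradients flow back to G through the augmentation, which is why
    # it must be differentiable.
    return F.softplus(-D(DiffAugment(G(z), policy=policy))).mean()
```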

zsyzzsoft commented 2 years ago

I think the generated images do not look very bad :) So maybe it's just that the dataset is quite challenging, and state-of-the-art GAN models are still limited in various aspects, like model capacity and training methodology, when modeling a very complex distribution, even when the dataset is large enough.

thusinh1969 commented 2 years ago

> I think the generated images do not look very bad :) So maybe it's just that the dataset is quite challenging, and state-of-the-art GAN models are still limited in various aspects, like model capacity and training methodology, when modeling a very complex distribution, even when the dataset is large enough.

It is BAD, man. It is NOT usable. Steve

thusinh1969 commented 2 years ago

Getting worse results...

[image: fakes000000]

I will try https://github.com/l4rz/scaling-up-stylegan2

Reduce the learning rate, set gamma=6, eliminate style mixing... Let's see.
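(Expressed against the training options above, those changes would roughly be the following overrides; this is a sketch with key names taken from the JSON config, and `style_mixing_prob` is assumed to be accepted by training.loss.StyleGAN2Loss as in stylegan2-ada-pytorch, so verify against your local train.py:)

```python
# Hypothetical patch to the training options shown earlier.
overrides = {
    "loss_kwargs": {"r1_gamma": 6.0, "style_mixing_prob": 0.0},  # gamma=6, no style mixing
    "G_opt_kwargs": {"lr": 0.002},  # reduced learning rate
    "D_opt_kwargs": {"lr": 0.002},
}
```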

Steve

zsyzzsoft commented 2 years ago

So I think this is not an issue of discriminator overfitting that DiffAugment can resolve; it is more likely limited by network capacity or the current training methodology.

Kitty-sunray commented 2 years ago

A clear example of how people overestimate AI capabilities: they put too much trust in advertisement/teaser videos and cherry-picked examples in papers :-D Man, these results are incredible for a dataset of just 0.03M images with such complex spatial relations and perspective. Go build your own GAN if you want more :-D