mit-han-lab / data-efficient-gans

[NeurIPS 2020] Differentiable Augmentation for Data-Efficient GAN Training
https://arxiv.org/abs/2006.10738
BSD 2-Clause "Simplified" License

NOT an issue, but a question :) #79

Open thusinh1969 opened 2 years ago

thusinh1969 commented 2 years ago

Hi,

I read the paper and it sounds very promising. I have been trying NVIDIA StyleGAN2-ADA for weeks without success: it simply did not converge, and the generated images were full of unwanted artifacts. My dataset is furniture (living room, bedroom, etc.), and each dataset has 60k-150k images. Some have only around 10k images, but we have not tried those yet.

Even with 100k images, the original StyleGAN2-ADA failed us for whatever reason. We tried the full range of R1 values and dropped/added augmentation pipelines (bgcfc, etc.). Nothing worked. We ran up to 30,000 kimg (30 million images through the network) and there were still lots of weird streaks in the images.
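(For reference, "R1 all ranges" above means sweeping the r1_gamma coefficient of StyleGAN2's discriminator gradient penalty, exposed as `--gamma`. A minimal sketch of what that penalty computes, conceptually, not the repo's exact code:)

```python
# Sketch of the R1 regularizer being swept via --gamma (conceptual;
# the real computation lives in training/loss.py).
import torch

def r1_penalty(D, reals, gamma):
    reals = reals.detach().requires_grad_(True)
    logits = D(reals)
    # Gradient of D's output with respect to the real images
    grads, = torch.autograd.grad(logits.sum(), reals, create_graph=True)
    return (gamma / 2) * grads.square().sum(dim=[1, 2, 3]).mean()
```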

We are about to try this repo; could you advise us on a few things along the way:

1) Is there anything special about our dataset that might need particular attention or parameters? Which parameters should we take care of? These images are NOT faces/dogs/cats, which are cropped and highly focused; as you may know, our dataset varies widely in scene and color, and for us convergence means minimal artifacts. :)

2) How long do you think it will take to converge, if at all? For now we use four RTX 2080 Ti GPUs in our lab, hence a batch size of 32 at 256x256 resolution.

Would love to keep in touch.

Thanks in advance. Steve (nguyen@hatto.com)

zsyzzsoft commented 2 years ago

Your dataset seems large enough, so this does not sound like a discriminator overfitting issue. Could you share some generated and real images?
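(A quick way to sanity-check this yourself, in the spirit of the ADA paper's diagnostics; `d_overfit_gap` is a hypothetical helper, and it assumes an unconditional `D(img)` interface plus a held-out batch of real images:)

```python
# Sketch: if D's mean logit on training reals keeps climbing while the
# mean logit on unseen reals falls, D is memorizing the training set.
import torch

@torch.no_grad()
def d_overfit_gap(D, train_reals, val_reals):
    # A persistently growing gap suggests discriminator overfitting;
    # with ~47k images it typically stays small.
    return (D(train_reals).mean() - D(val_reals).mean()).item()
```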

thusinh1969 commented 2 years ago

Here we go, at 24,000 kimg (24 million images through the network).

REAL:

[image: reals_crop]

FAKE:

[image: fake_crop]

Run command line:

```
python train.py --outdir="../../results/" --gpus=4 --batch=64 --data="../../images_256_StyleGANV2/" --cond=True --mirror=True --cfg=paper256 --gamma=10 --kimg=50000 --DiffAugment="color,translation,cutout" --resume="../../results/CHECKPOINT/network-snapshot-023587.pkl"
```

Train options JSON file:

```json
{
  "num_gpus": 4,
  "image_snapshot_ticks": 50,
  "network_snapshot_ticks": 50,
  "metrics": ["fid50k_full"],
  "random_seed": 0,
  "training_set_kwargs": {
    "class_name": "training.dataset.ImageFolderDataset",
    "path": "../../images_256_StyleGANV2/",
    "use_labels": true,
    "max_size": 47049,
    "xflip": true,
    "resolution": 256
  },
  "data_loader_kwargs": {
    "pin_memory": true,
    "num_workers": 3,
    "prefetch_factor": 2
  },
  "G_kwargs": {
    "class_name": "training.networks.Generator",
    "z_dim": 512,
    "w_dim": 512,
    "mapping_kwargs": { "num_layers": 8 },
    "synthesis_kwargs": {
      "channel_base": 16384,
      "channel_max": 512,
      "num_fp16_res": 4,
      "conv_clamp": 256
    }
  },
  "D_kwargs": {
    "class_name": "training.networks.Discriminator",
    "block_kwargs": {},
    "mapping_kwargs": {},
    "epilogue_kwargs": { "mbstd_group_size": 8 },
    "channel_base": 16384,
    "channel_max": 512,
    "num_fp16_res": 4,
    "conv_clamp": 256
  },
  "G_opt_kwargs": {
    "class_name": "torch.optim.Adam",
    "lr": 0.0025,
    "betas": [0, 0.99],
    "eps": 1e-08
  },
  "D_opt_kwargs": {
    "class_name": "torch.optim.Adam",
    "lr": 0.0025,
    "betas": [0, 0.99],
    "eps": 1e-08
  },
  "loss_kwargs": {
    "class_name": "training.loss.StyleGAN2Loss",
    "r1_gamma": 10.0,
    "diffaugment": "color,translation,cutout"
  },
  "total_kimg": 50000,
  "batch_size": 64,
  "batch_gpu": 16,
  "ema_kimg": 20,
  "ema_rampup": null,
  "resume_pkl": "../../results/CHECKPOINT/network-snapshot-023587.pkl",
  "ada_kimg": 100,
  "run_dir": "../../results/00000--cond-mirror-paper256-gamma10-kimg50000-batch64-color-translation-cutout-resumecustom"
}
```
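(For anyone reproducing this: the `--DiffAugment` flag routes both real and generated images through the same differentiable augmentation before the discriminator, which is the core of the method. Roughly, as a sketch using this repo's `DiffAugment_pytorch.DiffAugment`; the actual integration is in training/loss.py, and the conditional G/D used here also take class labels, omitted for brevity:)

```python
# Minimal sketch of how the --DiffAugment policy enters the GAN losses.
import torch
import torch.nn.functional as F
from DiffAugment_pytorch import DiffAugment  # shipped with this repo

policy = 'color,translation,cutout'

def d_loss(D, G, reals, z):
    # The SAME differentiable augmentation is applied to both real and
    # generated images before they reach the discriminator.
    fakes = G(z)
    loss_real = F.softplus(-D(DiffAugment(reals, policy=policy))).mean()
    loss_fake = F.softplus(D(DiffAugment(fakes, policy=policy))).mean()
    return loss_real + loss_fake

def g_loss(D, G, z):
    # Gradients flow back to G through the augmentation, which is why
    # it must be differentiable.
    return F.softplus(-D(DiffAugment(G(z), policy=policy))).mean()
```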

zsyzzsoft commented 2 years ago

I think the generated images do not look very bad :) So maybe it's just that the dataset is quite challenging, and state-of-the-art GAN models are still limited in various aspects, like model capacity and training methodology, when modeling a very complex distribution, even when the dataset is large enough.

thusinh1969 commented 2 years ago

> I think the generated images do not look very bad :) So maybe it's just that the dataset is quite challenging, and state-of-the-art GAN models are still limited in various aspects, like model capacity and training methodology, when modeling a very complex distribution, even when the dataset is large enough.

It is BAD, man. It is NOT usable. Steve

thusinh1969 commented 2 years ago

Getting worse results...

[image: fakes000000]

I will try https://github.com/l4rz/scaling-up-stylegan2

Reduce the learning rate, set gamma=6, eliminate style mixing... Let's see.
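(Expressed against the training options above, those changes would roughly be the following overrides; this is a sketch with key names taken from the JSON config, and `style_mixing_prob` is assumed to be accepted by training.loss.StyleGAN2Loss as in stylegan2-ada-pytorch, so verify against your local train.py:)

```python
# Hypothetical patch to the training options shown earlier.
overrides = {
    "loss_kwargs": {"r1_gamma": 6.0, "style_mixing_prob": 0.0},  # gamma=6, no style mixing
    "G_opt_kwargs": {"lr": 0.002},  # reduced learning rate
    "D_opt_kwargs": {"lr": 0.002},
}
```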

Steve

zsyzzsoft commented 2 years ago

So I think this is not an issue of discriminator overfitting that DiffAugment can resolve; it is more likely limited by network capacity or the current training methodology.

Kitty-sunray commented 2 years ago

A clear example of how people overestimate AI capabilities: they put too much trust in advertisement/teaser videos and cherry-picked examples in papers :-D Man, these results are incredible for a dataset of just 0.03M images with such complex spatial relations and perspective. Go build your own GAN if you want more :-D