mit-han-lab / data-efficient-gans

[NeurIPS 2020] Differentiable Augmentation for Data-Efficient GAN Training
https://arxiv.org/abs/2006.10738
BSD 2-Clause "Simplified" License

Better FID with smaller batch size #4

Closed · utkarshojha closed this issue 4 years ago

utkarshojha commented 4 years ago

Hi, thanks for the quick release of the code. The following is not an issue, but an observation I made while playing around with the code. If we keep everything else the same and simply reduce the batch size to 16 (the default is 32), the FID on the 100-shot Obama dataset improves from 54.39 (as reported in the paper) to 47.0032. Did the authors observe any trend with respect to batch size in the few-shot generation setting?
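For context, the setup under discussion only changes the batch size; the DiffAugment call itself follows the usage shown in this repo's README. Below is a minimal PyTorch sketch of the discriminator loss, where `G`, `D`, `real`, and `z` are placeholders rather than the repo's actual StyleGAN2 training script:

```python
import torch.nn.functional as F
from DiffAugment_pytorch import DiffAugment  # reference implementation shipped with this repo

policy = 'color,translation,cutout'  # the policy used for the low-shot experiments
batch_size = 16                      # the change under discussion; the default is 32

def d_loss(G, D, real, z):
    # DiffAugment is applied to both real and generated images, so the
    # discriminator only ever sees augmented samples; the loss shown here
    # is StyleGAN2's non-saturating logistic loss.
    loss_real = F.softplus(-D(DiffAugment(real, policy=policy))).mean()
    loss_fake = F.softplus(D(DiffAugment(G(z), policy=policy))).mean()
    return loss_real + loss_fake
```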

zsyzzsoft commented 4 years ago

This is an interesting observation. Actually, we did not do much hyperparameter tuning: most hyperparameters, including the batch size, were kept unchanged from the default StyleGAN2 setting. What you observed is a clear improvement; the inter-run variance is relatively low.
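One way to back the low inter-run variance claim is to repeat the run over several seeds and report the mean and standard deviation of FID. A sketch, with a hypothetical `train_and_eval_fid` helper standing in for a full training-plus-evaluation run (not a function from this repo):

```python
import statistics

def train_and_eval_fid(batch_size: int, seed: int) -> float:
    """Hypothetical helper (not part of this repo): train with DiffAugment
    at the given batch size and seed on the 100-shot Obama set, then
    return the resulting FID."""
    raise NotImplementedError

def fid_stats(batch_size, seeds=(0, 1, 2)):
    fids = [train_and_eval_fid(batch_size, s) for s in seeds]
    return statistics.mean(fids), statistics.stdev(fids)

# A gap like 54.39 vs. 47.00 is convincing only if it clearly exceeds
# the seed-to-seed spread at each batch size:
# mean_32, std_32 = fid_stats(batch_size=32)
# mean_16, std_16 = fid_stats(batch_size=16)
```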

andersonfaaria commented 4 years ago

Interesting note. Do you have any guess as to why this happens? What I observed is exactly the opposite: the higher the batch size, the better the results (as long as the learning rate grows by the same factor K as the batch size).

When you lowered the batch size, did you keep the same learning rate for both the discriminator and the generator?
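The scaling the commenter describes is the common linear rule: when the batch size changes by a factor K, scale the learning rate by the same K. A minimal sketch; note that the experiments in this thread did not apply it, since the StyleGAN2 defaults were kept unchanged:

```python
def scaled_lr(base_lr: float, base_batch_size: int, batch_size: int) -> float:
    """Linear scaling rule: the learning rate grows by the same factor K
    as the batch size."""
    return base_lr * batch_size / base_batch_size

# E.g. starting from a base lr of 0.002 at batch size 32 (StyleGAN2's
# default in many configs), dropping to batch size 16 halves the lr:
print(scaled_lr(base_lr=0.002, base_batch_size=32, batch_size=16))  # 0.001
```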

ghost commented 4 years ago

@zsyzzsoft was this verified as an improvement? Does it seem reproducible for others?

zsyzzsoft commented 4 years ago

> @zsyzzsoft was this verified as an improvement? Does it seem reproducible for others?

Yes, I verified that the improvement from the smaller batch size is reproducible, and it also holds on the other 100-shot datasets.
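To spell out what checking "the other 100-shot datasets" involves, the same comparison can be looped over the released few-shot sets. A sketch reusing a hypothetical `train_and_eval_fid` helper like the one above, here extended with a dataset argument (the dataset names are assumed to match the 100-shot sets released with this repo):

```python
# Hypothetical sweep: compare FID at the default and reduced batch sizes
# on each released 100-shot dataset, averaged over a few seeds.
DATASETS = ('100-shot-obama', '100-shot-grumpy_cat', '100-shot-panda')

def sweep(train_and_eval_fid, seeds=(0, 1, 2)):
    results = {}
    for dataset in DATASETS:
        for batch_size in (32, 16):
            fids = [train_and_eval_fid(dataset=dataset, batch_size=batch_size, seed=s)
                    for s in seeds]
            results[(dataset, batch_size)] = sum(fids) / len(fids)
    return results
```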

ghost commented 4 years ago

@zsyzzsoft could this idea work for transformers, like image-gpt?

zsyzzsoft commented 4 years ago

> @zsyzzsoft could this idea work for transformers, like image-gpt?

Emm... I'm not sure...

zsyzzsoft commented 4 years ago

In arXiv v2, the batch size for few-shot generation is set to 16.