Hi, thanks for the quick release of the code. The following is not an issue, but an observation I made while playing around with the code: if we keep everything else the same and simply reduce the batch size to 16 (the default is 32), the FID on the Obama dataset improves from 54.39 (as reported in the paper) to 47.0032. Was there any trend with varying the batch size that the authors observed in the few-shot generation setting?
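For concreteness, a minimal sketch of the kind of change involved; the option names `minibatch_size_base` and `minibatch_gpu_base` follow the StyleGAN2-style schedule config and are an assumption about how this codebase exposes them:

```python
# Minimal sketch (not this repository's actual run script): the only change
# relative to the default few-shot setting is the total minibatch size.
# Option names are assumed from the StyleGAN2-style schedule config.
sched = dict(
    minibatch_size_base=16,  # total batch size across GPUs (paper default: 32)
    minibatch_gpu_base=4,    # per-GPU batch size
)
print(sched)
```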
This is an interesting observation. Actually we did not do much hyperparameter tuning --- most hyperparameters, including batch size, are kept unchanged from the default StyleGAN2 setting. What you observed is a clear improvement; the inter-run variance is relatively low.
Interesting note. Do you have any guess as to why this happens? What I have observed is exactly the opposite: the larger the batch size, the better the results (as long as the learning rate is scaled by the same factor K as the batch size).
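For context, by "scaled by the same factor K" I mean the standard linear scaling rule: when the batch size is multiplied by K, the learning rate is multiplied by the same K. A generic sketch, not code from this repository, with illustrative base values:

```python
def scale_lr(base_lr: float, base_batch_size: int, new_batch_size: int) -> float:
    """Linear scaling rule: the learning rate grows by the same factor K
    as the batch size."""
    k = new_batch_size / base_batch_size
    return base_lr * k

# Illustrative values only, e.g. a base lr of 0.002 at batch size 32.
print(scale_lr(0.002, base_batch_size=32, new_batch_size=64))  # 0.004
```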
When lowering the batch size, did you keep the same learning rate for both the discriminator and the generator?
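To make the question concrete: the generator and discriminator usually have separate optimizers, so their learning rates can be kept equal or rescaled independently when the batch size changes. A PyTorch-flavoured sketch with placeholder modules `G` and `D` (illustrative only, not this repository's training code):

```python
import torch

# Placeholder modules standing in for the StyleGAN2 generator and discriminator.
G = torch.nn.Linear(512, 3)
D = torch.nn.Linear(3, 1)

batch_size = 16   # reduced from the default 32
g_lr = 0.002      # generator learning rate (illustrative value)
d_lr = 0.002      # discriminator learning rate (illustrative value)

# Keeping g_lr == d_lr when lowering the batch size is exactly the choice
# being asked about; either one could also be rescaled independently.
opt_G = torch.optim.Adam(G.parameters(), lr=g_lr, betas=(0.0, 0.99))
opt_D = torch.optim.Adam(D.parameters(), lr=d_lr, betas=(0.0, 0.99))
```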
@zsyzzsoft was this verified as an improvement? It seems it was reproducible for others.
Yes, I verified that the better performance obtained using a smaller batch size seems reproducible and it also works on the other 100-shot datasets.
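For anyone who wants to sanity-check this themselves, the comparison boils down to aggregating FID over repeated runs (different seeds, same setting) and over the other 100-shot datasets. A minimal sketch with placeholder numbers, not real results:

```python
import statistics

def summarize_fids(fids: list[float]) -> tuple[float, float]:
    """Mean and (population) standard deviation of FID across repeated runs."""
    return statistics.mean(fids), statistics.pstdev(fids)

# Placeholder values only; each entry would come from an independent training
# run with a different random seed at batch size 16.
example_runs = [47.0, 48.1, 46.5]
mean_fid, std_fid = summarize_fids(example_runs)
print(f"FID {mean_fid:.2f} +/- {std_fid:.2f} over {len(example_runs)} runs")
```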
@zsyzzsoft could this idea work for transformers, like image-gpt?
Emm... I'm not sure...
In arXiv v2, the batch size for few-shot generation is set to 16.