soumith / ganhacks

starter from "How to Train a GAN?" at NIPS2016

Why "Construct different mini-batches for real and fake, i.e. each mini-batch needs to contain only all real images or all generated images"? #52

Open mrgloom opened 5 years ago

mrgloom commented 5 years ago

What is the motivation behind "Construct different mini-batches for real and fake, i.e. each mini-batch needs to contain only all real images or all generated images"? Can you elaborate on that?

smikhai1 commented 5 years ago

@mrgloom tl;dr: If we don't construct separate mini-batches for real and fake samples, batch normalization will not work as intended and won't provide any benefit.

The purpose of batch normalization is to reduce internal covariate shift in activation maps by making all of the activations identically distributed (zero mean and unit standard deviation). In that case, the network does not need to adapt to the changes in the distributions of activations that occur as the weights change during training. As a result, such normalization simplifies learning significantly.
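
For reference, here is a minimal NumPy sketch of the normalization step itself (ignoring BN's learned scale and shift parameters); it is only an illustration, not code from this repo:

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    """Normalize a batch of activations to zero mean and unit std,
    with statistics computed over the batch dimension (what BatchNorm
    does at training time, before the learned scale/shift)."""
    mean = x.mean(axis=0)
    std = x.std(axis=0)
    return (x - mean) / (std + eps)

# Toy batch of activations: 8 samples, 4 features
x = np.random.randn(8, 4) * 3.0 + 5.0
x_hat = batch_norm(x)
print(x_hat.mean(axis=0))  # ~0 per feature
print(x_hat.std(axis=0))   # ~1 per feature
```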

At the very beginning of GAN training, real and fake samples in a mini-batch have very different distributions, so if we try to normalize such a mixed batch, we won't end up with well-centered data. Moreover, the distribution of this normalized data will keep changing significantly during training (because the generator produces better and better results), and the discriminator will have to adapt to these changes.
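
To make the tip concrete, here is a minimal PyTorch-style sketch of a discriminator update that keeps real and fake samples in separate mini-batches, so BN statistics are never computed over a real/fake mixture. The names `D`, `G`, `d_optimizer`, `real_batch`, and `noise_dim` are assumed placeholders, not code from this repo:

```python
import torch
import torch.nn.functional as F

def discriminator_step(D, G, d_optimizer, real_batch, noise_dim):
    d_optimizer.zero_grad()

    # Forward pass on an all-real mini-batch: BN statistics come from real data only.
    real_logits = D(real_batch)
    real_loss = F.binary_cross_entropy_with_logits(
        real_logits, torch.ones_like(real_logits))

    # Separate forward pass on an all-fake mini-batch: BN statistics come
    # from generated data only, instead of a mixture of the two distributions.
    noise = torch.randn(real_batch.size(0), noise_dim, device=real_batch.device)
    fake_batch = G(noise).detach()
    fake_logits = D(fake_batch)
    fake_loss = F.binary_cross_entropy_with_logits(
        fake_logits, torch.zeros_like(fake_logits))

    # Avoid the mixed alternative, e.g. D(torch.cat([real_batch, fake_batch])),
    # which would normalize over both distributions at once.
    (real_loss + fake_loss).backward()
    d_optimizer.step()
```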

ECEMACHINE commented 4 years ago


Thanks for your explanation! That's impressive. But I have another question. In a regular image classification task where we also use batch normalization, we shuffle the data first, so each mini-batch contains samples with different labels, which makes the model more robust. How do we explain that? Or is it just different for a GAN?
Hope for your reply! Thanks so much!

ManoharSai2000 commented 4 years ago

One way to view it: samples with similar labels follow similar distributions, and so does a mixture of them, and those distributions do not change rapidly over training the way the output distribution of a GAN's generator does.
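
A toy numerical illustration of this point, using purely synthetic 1-D Gaussian "features" (not data from any real model), might look like this:

```python
import numpy as np

rng = np.random.default_rng(0)

# Classification case: classes have fixed, similar feature distributions,
# so mixed-label batches give batch statistics that barely move.
for step in range(3):
    class_a = rng.normal(0.2, 1.0, size=64)
    class_b = rng.normal(-0.2, 1.0, size=64)
    batch = np.concatenate([class_a, class_b])
    print(f"classification batch {step}: mean={batch.mean():+.2f} std={batch.std():.2f}")

# GAN case: the real distribution is fixed, but the fake distribution shifts
# as the generator improves, so mixed real/fake batch statistics keep drifting.
for step, fake_mean in enumerate([-3.0, -1.5, 0.0]):  # generator moving toward the real data
    real = rng.normal(0.0, 1.0, size=64)
    fake = rng.normal(fake_mean, 0.5, size=64)
    batch = np.concatenate([real, fake])
    print(f"GAN batch {step}: mean={batch.mean():+.2f} std={batch.std():.2f}")
```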