mrgloom opened this issue 5 years ago
@mrgloom tl;dr: If we don't construct separate mini-batches for real and fake samples, batch normalization will not work as intended and we won't get any benefit from it.
The purpose of batch normalization is to reduce internal covariate shift in activation maps by making all activations follow the same distribution (zero mean and unit standard deviation). The network then does not need to adapt to shifts in the distributions of activations that occur as the weights change during training. As a result, such normalization simplifies learning significantly.
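As a quick illustration, here is a minimal NumPy sketch of the normalization step (not a full BN layer — it omits the learned scale/shift parameters and running statistics):

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # Normalize each feature over the batch dimension
    # to zero mean and unit standard deviation.
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mean) / np.sqrt(var + eps)

# Activations with arbitrary shifted/scaled statistics.
batch = np.random.randn(32, 4) * 5.0 + 3.0
normed = batch_norm(batch)
print(normed.mean(axis=0))  # ≈ 0 per feature
print(normed.std(axis=0))   # ≈ 1 per feature
```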
At the very beginning of GAN training, real and fake samples in a mini-batch have very different distributions, so if we try to normalize them together, we won't end up with well-centered data. Moreover, the distribution of this normalized data will keep changing significantly during training (because the generator produces better and better results), and the discriminator will have to keep adapting to these changes.
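To make this concrete, here is a small NumPy sketch (the ±5 means are made-up stand-ins for early-training statistics, when generator output is far from the real distribution). Normalizing a mixed real/fake batch leaves each half off-center, while normalizing separate mini-batches centers both:

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # Batch normalization over the batch dimension (no learned params).
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

rng = np.random.default_rng(0)
real = rng.normal(loc=5.0, scale=1.0, size=(64, 1))   # stand-in for real-image stats
fake = rng.normal(loc=-5.0, scale=1.0, size=(64, 1))  # stand-in for early generator output

# Mixed batch: the mixture is centered, but each half ends up pushed
# away from zero (roughly +1 for the real half, -1 for the fake half).
mixed = batch_norm(np.concatenate([real, fake]))
print(mixed[:64].mean(), mixed[64:].mean())

# Separate mini-batches: each is properly centered at zero.
print(batch_norm(real).mean(), batch_norm(fake).mean())
```

As the generator improves, the fake mean drifts toward the real mean, so the within-batch offsets above shrink over training — exactly the moving target the discriminator would otherwise have to track.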
Thanks for your explanation! That's impressive. But I have another question. In a regular image classification task where we also use BN, we shuffle the data first, so each batch contains samples with different labels, which makes the model more robust. How can we explain this? Or is it simply different in the GAN setting?
Hope for your reply!
Thanks so much!
One way to view it: samples with similar labels follow similar distributions, so their mixture is stable, and those distributions do not change rapidly over training the way the output distribution of a GAN's generator does.
What is the motivation behind "Construct different mini-batches for real and fake, i.e. each mini-batch needs to contain only all real images or all generated images"? Can you elaborate on that?