igul222 / improved_wgan_training

Code for reproducing experiments in "Improved Training of Wasserstein GANs"
MIT License

Improve Convergence By Tracking Historical Generated Outputs #37

Open NickShahML opened 7 years ago

NickShahML commented 7 years ago

Hey @igul222 , just wanted to add a small update.

I've found that if you add the historical buffer that Apple detailed here, you can improve performance slightly and converge faster.

Basically, when you train the discriminator, half the inputs are newly generated samples, and the other half come from a buffer containing previously generated samples. This allows the discriminator to "encompass" a wider range of fake examples.

When it comes to WGAN, I initially thought this technique would hurt the discriminator, since we are estimating the Wasserstein distance. In practice, however, it has only helped me, and I wanted to let you know.

It is critical that when you get new generated samples, you randomly select some samples from the buffer and replace them with the newly generated ones. Otherwise, the discriminator's estimate of the Wasserstein distance becomes highly inaccurate.
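For anyone who wants to try this, here's a minimal NumPy sketch of what I mean. All names (`HistoryBuffer`, `make_fake_batch`), the capacity, and the half-and-half split are my own illustrative choices, not code from this repo or from Apple's paper:

```python
import numpy as np

class HistoryBuffer:
    """Buffer of previously generated samples (illustrative sketch)."""

    def __init__(self, capacity, sample_shape):
        self.capacity = capacity
        self.buffer = np.empty((0,) + tuple(sample_shape), dtype=np.float32)

    def push(self, generated):
        """Add new samples; once full, randomly overwrite old entries."""
        n_free = self.capacity - len(self.buffer)
        if n_free > 0:
            self.buffer = np.concatenate([self.buffer, generated[:n_free]])
            generated = generated[n_free:]
        if len(generated) > 0:
            # Randomly replace existing entries with the remaining new samples,
            # so the buffer tracks the current generator distribution.
            idx = np.random.choice(len(self.buffer), size=len(generated),
                                   replace=False)
            self.buffer[idx] = generated

    def sample(self, n):
        idx = np.random.choice(len(self.buffer), size=n, replace=False)
        return self.buffer[idx]


def make_fake_batch(generated_batch, history):
    """Half newly generated, half from the history buffer (when available)."""
    half = len(generated_batch) // 2
    if len(history.buffer) >= half:
        fakes = np.concatenate([generated_batch[:half], history.sample(half)])
    else:
        fakes = generated_batch  # buffer not warmed up yet: use all new samples
    history.push(generated_batch)
    return fakes
```

The random-replacement step in `push` is the part I found critical: if you only ever append or evict the oldest samples, the buffer lags too far behind the generator.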