igul222 / improved_wgan_training

Code for reproducing experiments in "Improved Training of Wasserstein GANs"
MIT License

Issues with inference when using BatchNorm #51

Open alex-sage opened 6 years ago

alex-sage commented 6 years ago

I'm trying to run some experiments using your model defined in gan_cifar_resnet.py. However, when doing inference I've noticed some variation in samples that should be identical (e.g. when interpolating between two fixed endpoints in latent space, the generated images of those endpoints don't stay exactly the same as they should). I suspect this is because of BatchNorm, which in the standard implementation does not distinguish between training and inference and keeps updating its internal statistics during inference. I've tried passing the is_training parameter to lib.ops.batchnorm.Batchnorm(), and also tried switching to the commented-out "standard version", to no avail. When I pass a constant boolean tensor as the is_training parameter and set update_moving_stats=False it runs, but I get completely "overblown" (very bright, mostly primary colors) output images.

Can someone tell me how to do this properly?

On another note, I've also noticed that the "vanilla-conditional" implementation does not work, as the conditional version of layernorm is missing. How would I go about using this?

igul222 commented 6 years ago

Sorry, the batchnorm implementation is a mess. I think this is what you need to do (rough sketch after the list):

  1. train with is_training=True (update_moving_stats and stats_iter can be anything)
  2. after training, run n forward passes (one epoch) with is_training=False, update_moving_stats=True, stats_iter=0,1,...,n-1
  3. finally, do inference with is_training=False, update_moving_stats=False (stats_iter can be anything)
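Here is one way those three phases could look, assuming the is_training, update_moving_stats, and stats_iter arguments of lib.ops.batchnorm.Batchnorm can be driven through placeholders (in the repo they may instead be graph-construction flags, in which case you would rebuild the graph and restore a checkpoint between phases). Generator output, train_op, n_batches, and the placeholder names are all hypothetical:

```python
import tensorflow as tf

# Hypothetical placeholders, wired into every Batchnorm call in the generator, e.g.
# lib.ops.batchnorm.Batchnorm(name, axes, x, is_training=is_training_ph,
#                             stats_iter=stats_iter_ph, update_moving_stats=update_stats_ph)
is_training_ph  = tf.placeholder(tf.bool,  shape=[], name='is_training')
update_stats_ph = tf.placeholder(tf.bool,  shape=[], name='update_moving_stats')
stats_iter_ph   = tf.placeholder(tf.int32, shape=[], name='stats_iter')

# 1) Training: use batch statistics; the other two flags don't matter.
#    session.run(train_op, feed_dict={is_training_ph: True,
#                                     update_stats_ph: True, stats_iter_ph: 0, ...})

# 2) After training: one epoch of forward passes to accumulate the moving statistics.
#    for i in range(n_batches):
#        session.run(generator_output, feed_dict={is_training_ph: False,
#                                                 update_stats_ph: True,
#                                                 stats_iter_ph: i, ...})

# 3) Inference: freeze the accumulated statistics.
#    samples = session.run(generator_output, feed_dict={is_training_ph: False,
#                                                       update_stats_ph: False,
#                                                       stats_iter_ph: 0, ...})
```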

I think you can find my conditional layernorm implementation at https://github.com/igul222/nn/blob/master/tflib/ops/layernorm.py, but I haven't looked over that code carefully so there might be bugs. In our experiments, ACGAN conditioning worked better anyway.


alex-sage commented 6 years ago

Thank you so much for your very quick and concise answer! Unfortunately, even following these steps I wasn't able to make it work, and the resulting images still don't look anything like the samples during training.

Would it be safe to substitute your batchnorm implementation with tf.layers.batch_normalization, or is that something different?
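For reference, a plain (unconditional) substitution would look roughly like the sketch below; the wrapper name and arguments are only assumptions, not the repo's actual code, and axis=1 assumes NCHW activations (use the default axis for NHWC):

```python
import tensorflow as tf

def normalize(name, inputs, is_training):
    # Plain (unconditional) batch norm using the built-in TF 1.x layer.
    # is_training can be a Python bool or a boolean tensor.
    with tf.variable_scope(name):
        return tf.layers.batch_normalization(
            inputs, axis=1, training=is_training, fused=True)

# The built-in layer puts its moving-average updates into UPDATE_OPS,
# so they have to be run alongside the generator's train op:
# update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
# with tf.control_dependencies(update_ops):
#     gen_train_op = optimizer.minimize(gen_cost, var_list=gen_params)
```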

pierremac commented 6 years ago

I've been looking into this too recently. The problem with tf.layers.batch_normalization (or any high-level batch/layer norm implementation currently in TF) is that it's not conditional. It is definitely possible to take that code and turn it into a conditional implementation, but given how long and intricate it is (many different cases, depending on whether you use the fused or normal batch norm, virtual batches, renormalization, eager execution, ...), it won't be easy for me to do (mostly because I'm not a good programmer). If someone has done it, I would be very happy to get a pointer, because so far I haven't been able to find a single clean implementation of conditional batch norm for TensorFlow.
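For what it's worth, a bare-bones conditional batch norm can be written directly on top of tf.nn.moments and tf.nn.batch_normalization, with a per-class scale and offset looked up from the label. The sketch below is training-mode only (no moving averages for inference), assumes NHWC activations, and all names are hypothetical:

```python
import tensorflow as tf

def conditional_batchnorm(x, labels, n_labels, name, epsilon=1e-5):
    # x: [N, H, W, C] activations, labels: [N] integer class ids.
    # Per-class gamma/beta, shared batch statistics; moving averages for
    # inference are deliberately left out to keep the sketch short.
    channels = x.get_shape().as_list()[-1]
    with tf.variable_scope(name):
        gamma_table = tf.get_variable('gamma', [n_labels, channels],
                                      initializer=tf.ones_initializer())
        beta_table = tf.get_variable('beta', [n_labels, channels],
                                     initializer=tf.zeros_initializer())
        gamma = tf.reshape(tf.nn.embedding_lookup(gamma_table, labels),
                           [-1, 1, 1, channels])
        beta = tf.reshape(tf.nn.embedding_lookup(beta_table, labels),
                          [-1, 1, 1, channels])
        mean, var = tf.nn.moments(x, axes=[0, 1, 2])
        return tf.nn.batch_normalization(x, mean, var, beta, gamma, epsilon)
```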

HarveyYan commented 6 years ago

How about...

  1. During training, set is_training to True, set update_moving_stats to True, and feed stats_iter with the iteration number. That way the moving mean and variance keep being updated while the empirical mean and variance of the current training batch are used for normalization.
  2. During inference, set is_training to False and update_moving_stats to False, so that only the population mean and variance are used and they are no longer updated (see the sketch below).
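In feed_dict terms (same caveat as above about whether these arguments are placeholders or graph-construction flags in this repo; all names hypothetical), that suggestion amounts to:

```python
# Training: batch statistics are used, moving averages accumulated on the fly.
# session.run(train_op, feed_dict={is_training_ph: True,
#                                  update_stats_ph: True,
#                                  stats_iter_ph: iteration})

# Inference: only the frozen population statistics are used.
# samples = session.run(generator_output, feed_dict={is_training_ph: False,
#                                                    update_stats_ph: False,
#                                                    stats_iter_ph: 0})
```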