pemami4911 / deep-rl

Collection of Deep Reinforcement Learning algorithms
MIT License
297 stars 193 forks

batch normalization not actually enabled? #17

Closed codekitchen closed 6 years ago

codekitchen commented 6 years ago

Hi, this repo has been very helpful to me as I'm learning DDPG myself. As an exercise to make sure I understood what's going on, I re-implemented a similar DDPG setup using Keras, and in the process I noticed something -- I don't think your batch_normalization layers are ever actually learning (adjusting their weights), so they are essentially no-ops except for the small epsilon value. It looks like with tflearn you need to set is_training to true during training steps: http://tflearn.org/config/#is_training
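For reference, a minimal sketch of the tflearn training-mode toggle I mean; the variable names (sess, the train ops, feed dicts) are placeholders rather than the exact ones in this repo:

```python
import tensorflow as tf
import tflearn

sess = tf.Session()
# ... build actor/critic networks that use
# tflearn.layers.normalization.batch_normalization ...

# Before running training ops, switch tflearn's global training mode on so
# the batch_normalization layers use batch statistics and update their
# moving mean/variance:
tflearn.is_training(True, session=sess)
# sess.run(train_op, feed_dict={...})

# Switch it back off before using the networks for action selection or
# target computation, so the layers fall back to the accumulated statistics:
tflearn.is_training(False, session=sess)
# action = sess.run(scaled_out, feed_dict={...})
```

Without that flag ever being set, the moving mean stays at 0 and the moving variance at 1, which is why the layers end up acting as (near) identity transforms.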

Interestingly, with my Keras implementation I get very similar performance to yours when I disable my batch normalization layers. When I enable my batch norm layers, performance is actually much worse and the agent often doesn't solve Pendulum-v0 even after hundreds of episodes.
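In case it helps, here is a rough sketch (not my actual code) of how I exposed batch norm as a switch in the Keras actor; the layer sizes and the use_batch_norm flag are illustrative assumptions:

```python
from tensorflow.keras import layers, models

def build_actor(state_dim, action_dim, action_bound, use_batch_norm=False):
    """Actor network with optional batch normalization after each hidden layer."""
    inputs = layers.Input(shape=(state_dim,))
    x = layers.Dense(400)(inputs)
    if use_batch_norm:
        # Learns scale/offset and tracks running statistics during training
        x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)
    x = layers.Dense(300)(x)
    if use_batch_norm:
        x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)
    out = layers.Dense(action_dim, activation="tanh")(x)
    # Scale tanh output to the environment's action range
    scaled = layers.Lambda(lambda a: a * action_bound)(out)
    return models.Model(inputs, scaled)
```

Keras handles the training/inference behavior of BatchNormalization automatically (batch statistics during training calls, running statistics during prediction), so flipping that single flag is enough to compare the two settings.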

I found a couple of discussions around the web where other people describe the difficulties they've had getting batch normalization to work well with DDPG, in spite of what the original paper says. For example this reddit post. It all makes me very curious.

Anyway, sorry this all is mostly just for my own benefit as I'm learning, but I thought you'd like to know. Thanks again for sharing your code!

pemami4911 commented 6 years ago

Thanks, the code now sets is_training to True.