openai / coinrun

Code for the paper "Quantifying Transfer in Reinforcement Learning"
https://blog.openai.com/quantifying-generalization-in-reinforcement-learning/
MIT License

batch norm always has is_training = True #41

Closed Florence-C closed 4 years ago

Florence-C commented 4 years ago

Hello,

I have two questions regarding batch normalization. In the policy, whenever a batchnorm is applied, the is_training parameter is always set to True. Why is the batch norm in training mode for both act_model and train_model in ppo? More precisely, why not set the batchnorm to test mode when collecting data (with the act model)?

Second, how is the batchnorm layer applied at test time? Is it still in training mode?

Thank you in advance!

kcobbe commented 4 years ago

This repo only supports batch normalizing based on the statistics of the current batch (which is what you get when passing is_training=True). In practice this works reasonably well for both training and testing. However, test performance will increase slightly if you instead normalize based on an average of the statistics of many batches. This is what was done in the paper: you have to save a moving average of the statistics from training and restore them at test time.
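To make the two modes concrete, here is a minimal sketch in plain Python. It is not code from this repo: `RunningStats`, the momentum value of 0.9, and the toy batches are all assumptions chosen for illustration. It normalizes a test batch once with its own statistics (the is_training=True behaviour) and once with a moving average accumulated during training.

```python
import math

EPS = 1e-5  # small constant for numerical stability (assumed value)


def batch_stats(batch):
    """Return (mean, biased variance) of a list of floats."""
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n
    return mean, var


def normalize(batch, mean, var):
    """Normalize every element with the given statistics."""
    return [(x - mean) / math.sqrt(var + EPS) for x in batch]


class RunningStats:
    """Hypothetical helper: exponential moving average of batch statistics."""

    def __init__(self, momentum=0.9):
        self.momentum = momentum
        self.mean = 0.0
        self.var = 1.0

    def update(self, mean, var):
        m = self.momentum
        self.mean = m * self.mean + (1 - m) * mean
        self.var = m * self.var + (1 - m) * var


running = RunningStats(momentum=0.9)
training_batches = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 3.0, 4.0, 5.0],
    [0.0, 2.0, 4.0, 6.0],
]
for batch in training_batches:
    mean, var = batch_stats(batch)
    _ = normalize(batch, mean, var)  # training always uses current-batch statistics
    running.update(mean, var)        # but also records them for later use

test_batch = [10.0, 11.0, 12.0, 13.0]

# Mode 1: normalize with the test batch's own statistics (is_training=True).
out_batch_mode = normalize(test_batch, *batch_stats(test_batch))

# Mode 2: normalize with the saved moving average (test-mode batch norm).
out_running_mode = normalize(test_batch, running.mean, running.var)
```

In the first mode the output is always centred on zero regardless of the input scale, while in the second mode it depends on how well the saved averages match the test data. After only three updates the averages in this toy example are still dominated by their initial values, which is one reason a moving average needs many batches before it is useful.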

Regarding act_model and train_model, it's important that we normalize with similar statistics in both cases. If our rollouts collect data using different normalization statistics (is_training=False), that will introduce distributional shift (between the rollout policy and the current/training policy) that could make the RL training unstable. I'm not sure how detrimental this would be in practice, but I wouldn't expect it to work well.
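The mismatch described above can be sketched with a toy example. Everything here is assumed for illustration and none of it comes from the repo: the single-feature sigmoid policy, the observation values, and the "stale" running statistics. The point is only that with identical weights, normalizing with different statistics already changes the action probabilities, so the PPO probability ratio between the training policy and the rollout policy differs from 1 before any gradient step.

```python
import math

EPS = 1e-5  # small constant for numerical stability (assumed value)


def normalize_one(x, mean, var):
    """Normalize a single value with the given statistics."""
    return (x - mean) / math.sqrt(var + EPS)


def action_prob(feature, weight=1.0):
    """Toy two-action policy: probability of action 0 from one feature."""
    return 1.0 / (1.0 + math.exp(-weight * feature))


obs_batch = [4.0, 5.0, 6.0, 7.0]
batch_mean = sum(obs_batch) / len(obs_batch)
batch_var = sum((x - batch_mean) ** 2 for x in obs_batch) / len(obs_batch)

# Hypothetical moving-average statistics that lag behind the current data.
running_mean, running_var = 3.0, 4.0

obs = obs_batch[0]

# Training policy: normalizes with current-batch statistics.
p_train = action_prob(normalize_one(obs, batch_mean, batch_var))

# Rollout policy, case A: same statistics as training (is_training=True).
p_act_same = action_prob(normalize_one(obs, batch_mean, batch_var))

# Rollout policy, case B: moving-average statistics (is_training=False).
p_act_running = action_prob(normalize_one(obs, running_mean, running_var))

ratio_same = p_train / p_act_same         # exactly 1: no shift
ratio_mismatch = p_train / p_act_running  # far from 1 with identical weights

print(ratio_same, ratio_mismatch)
```

For simplicity the sketch feeds both policies the same batch. In actual training the rollout batch and the training minibatch differ, so the statistics are only similar rather than identical.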

Florence-C commented 4 years ago

Thanks for your answer!