hill-a / stable-baselines

A fork of OpenAI Baselines, implementations of reinforcement learning algorithms
http://stable-baselines.readthedocs.io/
MIT License

Question about layer normalize and VecNormalize #1031

Closed. z6833 closed this issue 3 years ago.

z6833 commented 4 years ago

Hello, I have a question about the performance of VecNormalize when training DQN. I used the param layer_norm=True and then chose different flags for VecNormalize. The resulting training curves in TensorBoard look very different. Code follows:

    # DQN MlpPolicy modified so that layer normalization is enabled
    def __init__(self, sess, ..., **_kwargs):
        super(MlpPolicy, self).__init__(
            sess,
            ...,
            layer_norm=True,
            **_kwargs)

# observations are normalized, rewards are left untouched
env = VecNormalize(env, norm_obs=True, norm_reward=False)

The red curve is for layer_norm=True, norm_obs=False, norm_reward=True, and the other is for layer_norm=True, norm_obs=True, norm_reward=False. [TensorBoard screenshot, 2020-11-05]

My question is: why do they look so different? In other tests I found that as soon as norm_obs=True, the curves become bad with DQN (it works well with PPO2). Is there any advice about layer_norm and norm_obs?

Miffyli commented 4 years ago

What is the environment used here? In some cases observation normalization (norm_obs) might not make sense, although I am not too familiar with that. Have you tried running the code without layer_norm? That flag enables layer normalization in the DQN network, which can be useful in some cases (e.g. some benefit has been seen for generalization in Procgen). This could be environment-specific behaviour, so experimentation is required to see which parameters work best.
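For reference, a minimal sketch of running both variants (assuming a standard Gym environment as a placeholder; LnMlpPolicy is the built-in layer-normalized counterpart of the DQN MlpPolicy, so there is no need to edit MlpPolicy.__init__ by hand):

import gym

from stable_baselines import DQN
from stable_baselines.deepq.policies import MlpPolicy, LnMlpPolicy

env = gym.make("CartPole-v1")  # placeholder for the actual environment

# DQN without layer normalization
model_plain = DQN(MlpPolicy, env, verbose=1, tensorboard_log="./dqn_ln_test/")
model_plain.learn(total_timesteps=50000, tb_log_name="no_layer_norm")

# DQN with layer normalization
model_ln = DQN(LnMlpPolicy, env, verbose=1, tensorboard_log="./dqn_ln_test/")
model_ln.learn(total_timesteps=50000, tb_log_name="layer_norm")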

araffin commented 4 years ago

Some additional context about layer norm and VecNormalize: https://github.com/hill-a/stable-baselines/issues/698 (please read the thread).

z6833 commented 4 years ago

What is the environment used here?

Data like RTT, delay, throughput, ... (used for measuring network status) are collected as stats at regular intervals. So norm_obs may not make sense for these indicators?

Have you tried running the code without layer_norm?

I have tried all combinations of layer_norm, norm_obs, and norm_rew, and the best combination is layer_norm=True, norm_obs=False, norm_rew=True/False. At first I tried layer_norm=default (False), norm_obs=True, norm_rew=True (experience from PPO training) and got bad performance. Then I found that layer_norm=True works better. A sketch of such a sweep is shown below.
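A minimal sketch of sweeping the three flags (make_env is a hypothetical factory for the custom network-status environment; everything else uses the standard stable-baselines API):

from itertools import product

from stable_baselines import DQN
from stable_baselines.common.vec_env import DummyVecEnv, VecNormalize
from stable_baselines.deepq.policies import MlpPolicy, LnMlpPolicy

# Sweep over the three flags discussed above.
# make_env() stands in for the custom network-status environment.
for layer_norm, norm_obs, norm_rew in product([False, True], repeat=3):
    env = VecNormalize(DummyVecEnv([make_env]), norm_obs=norm_obs, norm_reward=norm_rew)
    policy = LnMlpPolicy if layer_norm else MlpPolicy
    model = DQN(policy, env, verbose=0, tensorboard_log="./dqn_norm_sweep/")
    model.learn(total_timesteps=100000,
                tb_log_name="ln{}_obs{}_rew{}".format(layer_norm, norm_obs, norm_rew))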

Some additional context about layer norm and VecNormalize: #698 (please read the thread)

@araffin thanks. I noticed your advice that it is usually crucial for PPO and A2C, but not for SAC and TD3, for instance. I do get better performance when training PPO with norm_obs=True.

I noticed that the data would effectively be normalized twice at the first layer of the network when both layer_norm and norm_obs are True. Does that affect the result? I just don't know why the same operations (norm_obs and norm_rew set to True) on the stats look so different.
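For intuition, a rough numpy sketch of the two operations back to back (hypothetical numbers; this is not the library code, and layer norm's learned scale and shift are omitted):

import numpy as np

# VecNormalize keeps a running mean/variance per observation dimension
obs = np.array([50.0, 0.2, 1000.0])            # e.g. rtt, loss, throughput
running_mean = np.array([40.0, 0.1, 800.0])
running_var = np.array([100.0, 0.01, 40000.0])
obs_norm = np.clip((obs - running_mean) / np.sqrt(running_var + 1e-8), -10.0, 10.0)

# Layer normalization then re-normalizes the layer's activations using the
# statistics of this single forward pass (here shown on the input itself)
h = obs_norm
h_ln = (h - h.mean()) / np.sqrt(h.var() + 1e-8)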

Miffyli commented 4 years ago

Generally, normalizing observations has been found to be important (see e.g. this paper). The effect of layer normalization is still not so well known (it seems to help with generalization, but I have personally had bad results when I tried it and have heard similar comments from others).