Closed z6833 closed 3 years ago
What is the environment used here? In some cases the observation normalization (`norm_obs`) might not make sense, although I am not too familiar with that. Have you tried running the code without `layer_norm`? This is layer normalization at the end of DQN, which can be useful in some cases (e.g. some benefit seen in generalization in Procgen). This could be environment-specific behaviour, so experimentation is required to see which parameters work best.

Some additional context about layer norm and `VecNormalize`: https://github.com/hill-a/stable-baselines/issues/698 (please read the thread).
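For reference, "layer normalization at the end of DQN" normalizes the feature vector of a hidden layer per sample. A minimal plain-Python sketch of the operation (an illustration only, not the stable-baselines implementation; the epsilon value and the omission of the learned gain/bias are simplifying assumptions):

```python
import math

def layer_norm(features, eps=1e-5):
    """Normalize one feature vector to zero mean and unit variance
    across its own dimensions (per-sample, unlike running statistics)."""
    mean = sum(features) / len(features)
    var = sum((f - mean) ** 2 for f in features) / len(features)
    return [(f - mean) / math.sqrt(var + eps) for f in features]

normed = layer_norm([1.0, 2.0, 3.0, 4.0])
```

In the real layer a learned gain and bias are applied after this normalization step.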
> What is the environment used here?

Data like RTT, delay, throughput, ... (used for measuring network status) are collected as `stats` at regular intervals. Then `norm_obs` may not make sense for these indicators?

> Have you tried running the code without `layer_norm`?
I have tried all combinations of `layer_norm`, `norm_obs`, and `norm_rew`, and the best combination is `layer_norm=True, norm_obs=False, norm_rew=True/False`. At first I tried `layer_norm=default(False)` and `norm_obs=True, norm_rew=True` (experience from PPO training), and got bad performance. Then I found that `layer_norm=True` works better.
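To make the `norm_obs` flag concrete: it normalizes each observation using running statistics collected during training. A simplified scalar sketch of that idea in plain Python (an illustration of the mechanism, not the VecNormalize source; the clip and epsilon values are assumptions):

```python
class RunningObsNormalizer:
    """Keep a running mean/variance of observations and standardize each
    incoming one, roughly what VecNormalize's norm_obs flag does."""

    def __init__(self, eps=1e-8, clip=10.0):
        self.count = eps
        self.mean = 0.0
        self.var = 1.0
        self.eps = eps
        self.clip = clip

    def __call__(self, obs):
        # Online (Welford-style) update of the running statistics.
        self.count += 1
        delta = obs - self.mean
        self.mean += delta / self.count
        self.var += (delta * (obs - self.mean) - self.var) / self.count
        # Standardize, then clip to a fixed range.
        normed = (obs - self.mean) / ((self.var + self.eps) ** 0.5)
        return max(-self.clip, min(self.clip, normed))

norm = RunningObsNormalizer()
first = norm(5.0)  # early outputs are unreliable while statistics warm up
```

Because the statistics keep moving during training, the same raw observation maps to different normalized values over time, which is one reason this can interact badly with value-based methods like DQN.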
> Some additional context about layer norm and `VecNormalize`: #698 (please read the thread)
@araffin thanks. I noticed your advice that "it is usually crucial for PPO and A2C, but not for SAC and TD3 for instance". I can get better performance while training PPO with `norm_obs=True`.

I noticed that the data would be normalized twice at the first layer of the NN when both `layer_norm` and `norm_obs` are `True`. Does that affect the result? I just don't know why the same operations (`norm_obs` and `norm_rew` both `True` for `stats`) look so different.
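Note that the two normalizations are not the same operation, so stacking them is not simply doing the same thing twice: `norm_obs` standardizes each feature over time using running statistics, while layer norm standardizes across the features of a single observation. A toy sketch with hypothetical numbers (plain Python, illustration only):

```python
import math

def layer_norm(vec, eps=1e-5):
    # Normalize across the features of ONE observation.
    m = sum(vec) / len(vec)
    v = sum((x - m) ** 2 for x in vec) / len(vec)
    return [(x - m) / math.sqrt(v + eps) for x in vec]

# Suppose norm_obs has already standardized each feature over time;
# a single observation can still have features on very different scales,
# so layer norm still changes the values rather than being a no-op.
obs = [2.0, -0.5, 0.1]
twice_normalized = layer_norm(obs)
```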
Generally, normalizing observations has been found to be important (see e.g. this paper). The effect of layer normalization is still not so well known (it seems to help with generalization, but I have personally had bad results when I tried it / have heard similar comments from others).
Hello, I have a question about the performance of `VecNormalize` during DQN training. I used `layer_norm=True`, and then chose different flags of `VecNormalize`. The training curves in TensorBoard look very different. Code follows. The red curve is for `layer_norm=True, norm_obs=False, norm_reward=True`, and the other is `layer_norm=True, norm_obs=True, norm_reward=False`.

My question is: why do they look so different? In other tests, I found that once `norm_obs=True`, the curves turn bad in DQN (it works well in PPO2). Is there any advice about `layer_norm` and `norm_obs`?