timmeinhardt opened this issue 6 years ago
In my experience it made things much worse for most of the Atari games (both observation normalization and reward normalization): it helps for Pong but completely ruins Breakout.
But I didn't carefully tune the hyperparameters.
I would add a flag.
Same here, I always comment out the normalization while training.
I was going to change the normalization to the approach from https://arxiv.org/pdf/1808.04355.pdf:
Observation normalization. We run a random agent on our target environment for 10000 steps, then calculate the mean and standard deviation of the observation and use these to normalize the observations when training. This is useful to ensure that the features do not have very small variance at initialization and to have less variation across different environments.
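To make that concrete, here is a minimal sketch of what I understand the paper to describe (assuming an old-style Gym env; `compute_obs_stats` and the env id are just placeholders, not code from this repo):

```python
import numpy as np
import gym

def compute_obs_stats(env, num_steps=10000):
    """Run a random agent and return the element-wise mean/std of its observations."""
    obs = env.reset()
    total = np.zeros_like(obs, dtype=np.float64)
    total_sq = np.zeros_like(obs, dtype=np.float64)
    for _ in range(num_steps):
        o = np.asarray(obs, dtype=np.float64)
        total += o
        total_sq += o ** 2
        obs, _, done, _ = env.step(env.action_space.sample())
        if done:
            obs = env.reset()
    mean = total / num_steps
    std = np.sqrt(np.maximum(total_sq / num_steps - mean ** 2, 0.0)) + 1e-8
    return mean, std

env = gym.make("PongNoFrameskip-v4")
obs_mean, obs_std = compute_obs_stats(env)

def normalize(obs):
    # The statistics stay fixed during training, as described in the quote above.
    return (np.asarray(obs, dtype=np.float64) - obs_mean) / obs_std
```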
This is actually an interesting idea for normalising the observations. The resulting mean and standard deviation would then be used as starting points for the running mean and standard deviation, right? Because if they stay fixed, the normalisation would not be robust against new observations produced by a better agent exploring new regions of the environment.
P.S.: I will submit a PR that adds two flags (observation normalisation on/off and reward normalisation on/off) and resolves issue #87.
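Just to sketch the "starting points" idea, something like the following could seed a running mean/std with the random-agent statistics and keep updating it during training (this is a hypothetical `SeededRunningMeanStd`, not the implementation used here; the initial count is the weight given to the precomputed statistics):

```python
import numpy as np

class SeededRunningMeanStd:
    """Running mean/std combined batch-wise, initialised from precomputed statistics."""

    def __init__(self, init_mean, init_std, init_count=10000):
        self.mean = np.asarray(init_mean, dtype=np.float64)
        self.var = np.asarray(init_std, dtype=np.float64) ** 2
        self.count = init_count  # weight of the random-agent statistics

    def update(self, batch):
        batch = np.asarray(batch, dtype=np.float64)
        batch_mean, batch_var, batch_count = batch.mean(0), batch.var(0), batch.shape[0]
        delta = batch_mean - self.mean
        total = self.count + batch_count
        new_mean = self.mean + delta * batch_count / total
        # Combine the two variances (parallel variance formula).
        m2 = self.var * self.count + batch_var * batch_count \
             + delta ** 2 * self.count * batch_count / total
        self.mean, self.var, self.count = new_mean, m2 / total, total

    def normalize(self, x):
        return (np.asarray(x) - self.mean) / np.sqrt(self.var + 1e-8)
```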
I don't think they update it during training.
I think it's sufficient, and it also reduces the variance of the gradient updates since the normalization is fixed.
Sounds good!
Is there a particular reason why `VecNormalize` is only applied to 1-D observations? If so, wouldn't it make sense to at least apply the reward normalization? https://github.com/ikostrikov/pytorch-a2c-ppo-acktr/blob/47ddcbfab806c37ed19f438100300bd4d58c42f3/main.py#L68-L69
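For image observations, one rough standalone sketch of reward-only normalization (my own assumption of how it could be done outside `VecNormalize`, not this repo's code) is a wrapper that keeps a running estimate of the discounted return per environment and divides rewards by its standard deviation:

```python
import numpy as np

class RewardNormalizer:
    """Scales rewards by the running std of the discounted return; observations untouched."""

    def __init__(self, num_envs, gamma=0.99, epsilon=1e-8, clip=10.0):
        self.returns = np.zeros(num_envs)
        self.ret_mean, self.ret_var, self.count = 0.0, 1.0, epsilon
        self.gamma, self.epsilon, self.clip = gamma, epsilon, clip

    def __call__(self, rewards, dones):
        rewards = np.asarray(rewards, dtype=np.float64)
        self.returns = self.returns * self.gamma + rewards
        # Update the running variance of the returns with this batch.
        batch_mean, batch_var, n = self.returns.mean(), self.returns.var(), len(rewards)
        delta = batch_mean - self.ret_mean
        total = self.count + n
        self.ret_mean += delta * n / total
        self.ret_var = (self.ret_var * self.count + batch_var * n
                        + delta ** 2 * self.count * n / total) / total
        self.count = total
        self.returns[np.asarray(dones, dtype=bool)] = 0.0
        return np.clip(rewards / np.sqrt(self.ret_var + self.epsilon),
                       -self.clip, self.clip)
```

Usage would be something like `rewards = normalizer(rewards, dones)` right after stepping the vectorized envs.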