The `mean` and `var` in the `NormalizeVecObservation` wrapper located here are shaped as `(NUM_ENVS,) + obs.shape`. I think they should be shaped like a single observation, that is, `obs[0].shape`, since they are supposed to hold a running average computed across `NUM_ENVS`. That would match the shape used in the reward normalization wrapper found here.
This doesn't seem to affect performance, since the mean and variance are still computed correctly across the batch. It does, however, use more memory than necessary, and it has caused some unexpected issues for me, especially when saving the normalization state along with `train_state`.
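A minimal sketch of the proposed shape, assuming a standard parallel mean/variance update over the batch axis. The names `init_stats`, `update_stats`, and `normalize`, as well as the sizes, are hypothetical and only for illustration; this is not the wrapper's actual code.

```python
import jax.numpy as jnp

NUM_ENVS, OBS_DIM = 4, 3  # hypothetical sizes for illustration


def init_stats(obs):
    # obs is batched: (NUM_ENVS, *single_obs_shape).
    # The statistics take the shape of ONE observation, i.e. obs[0].shape.
    return jnp.zeros_like(obs[0]), jnp.ones_like(obs[0]), 1e-4


def update_stats(mean, var, count, obs):
    # Reduce over the env axis, then merge batch stats into the running stats.
    batch_mean = jnp.mean(obs, axis=0)
    batch_var = jnp.var(obs, axis=0)
    batch_count = obs.shape[0]

    delta = batch_mean - mean
    total = count + batch_count
    new_mean = mean + delta * batch_count / total
    m2 = (
        var * count
        + batch_var * batch_count
        + jnp.square(delta) * count * batch_count / total
    )
    return new_mean, m2 / total, total


def normalize(obs, mean, var):
    # (OBS_DIM,) statistics broadcast against the (NUM_ENVS, OBS_DIM) batch.
    return (obs - mean) / jnp.sqrt(var + 1e-8)


obs = jnp.arange(NUM_ENVS * OBS_DIM, dtype=jnp.float32).reshape(NUM_ENVS, OBS_DIM)
mean, var, count = init_stats(obs)
mean, var, count = update_stats(mean, var, count, obs)
normed = normalize(obs, mean, var)
```

With this layout the saved normalization state holds `OBS_DIM` values per statistic rather than `NUM_ENVS * OBS_DIM`, and broadcasting still normalizes the whole batch.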