NormalizeVecObservation Wrapper Shape Mismatch for Mean and Var

Hi,

The mean and var in the NormalizeVecObservation wrapper are shaped like the full batched observation, i.e. (NUM_ENVS,) + obs[0].shape. I think they should be shaped like a single observation, that is, obs[0].shape, since they are supposed to hold a running average across the NUM_ENVS environments. That would also match the shape used in the reward normalization wrapper.

While this doesn't seem to affect performance, since the mean and variance are correctly computed across the batch, it unnecessarily increases memory use and has caused some unexpected issues for me, especially when saving the normalization state along with train_state.
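To make the suggestion concrete, here is a minimal sketch of running statistics that are shaped like a single observation. It is not the repository's code: the helper names init_stats and update_stats, the values NUM_ENVS = 4 and OBS_SHAPE = (3,), the initial count of 1e-4, and the 1e-8 epsilon are all assumptions chosen for illustration.

```python
import jax.numpy as jnp

NUM_ENVS = 4      # hypothetical number of parallel environments
OBS_SHAPE = (3,)  # hypothetical shape of a single observation


def init_stats(batched_obs):
    """Create running statistics shaped like ONE observation, not the batch."""
    single = batched_obs[0]
    # mean, var, count (small initial count is an assumption of this sketch)
    return jnp.zeros_like(single), jnp.ones_like(single), 1e-4


def update_stats(mean, var, count, batched_obs):
    """Merge one batch into the running mean/var, reducing over the env axis."""
    batch_mean = jnp.mean(batched_obs, axis=0)
    batch_var = jnp.var(batched_obs, axis=0)
    batch_count = batched_obs.shape[0]

    delta = batch_mean - mean
    tot_count = count + batch_count
    new_mean = mean + delta * batch_count / tot_count
    # Combine the two sums of squared deviations (parallel variance update).
    m2 = (
        var * count
        + batch_var * batch_count
        + jnp.square(delta) * count * batch_count / tot_count
    )
    return new_mean, m2 / tot_count, tot_count


# A dummy batch of observations with shape (NUM_ENVS,) + OBS_SHAPE.
obs = jnp.arange(NUM_ENVS * OBS_SHAPE[0], dtype=jnp.float32).reshape(
    (NUM_ENVS,) + OBS_SHAPE
)

mean, var, count = init_stats(obs)
mean, var, count = update_stats(mean, var, count, obs)

# Statistics of shape OBS_SHAPE broadcast over the leading NUM_ENVS axis.
normalized = (obs - mean) / jnp.sqrt(var + 1e-8)
```

With this layout the saved normalization state holds one mean and one var per observation dimension, rather than NUM_ENVS copies of each, and normalization still works through broadcasting.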

Repository: luchris429/purejaxrl, issue #21