I realized that the state is not normalized. This might not be a big issue: if the state is never normalized, the networks should still be able to learn to make correct predictions from it. However, I think that for fixed hyperparameters, an unnormalized state could affect, for example, the magnitude of the losses and the predictions right after the network weights are initialized.
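To make concrete what I mean by normalizing the state, here is a minimal sketch of one common approach: keep a running mean and variance per state dimension and rescale each state before it is fed to the networks. The class name `RunningStateNormalizer` and its methods are hypothetical and not part of the existing code; I am only assuming that the state is a flat list of floats.

```python
import math


class RunningStateNormalizer:
    """Hypothetical helper: running mean/variance per state dimension (Welford's method)."""

    def __init__(self, dim, eps=1e-8):
        self.count = 0
        self.mean = [0.0] * dim
        self.m2 = [0.0] * dim  # running sum of squared deviations from the mean
        self.eps = eps  # avoids division by zero for constant dimensions

    def update(self, state):
        """Fold one observed state into the running statistics."""
        self.count += 1
        for i, x in enumerate(state):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.count
            self.m2[i] += delta * (x - self.mean[i])

    def normalize(self, state):
        """Return the state shifted and scaled to roughly zero mean and unit variance."""
        if self.count < 2:
            return list(state)  # not enough data yet to estimate a variance
        return [
            (x - self.mean[i]) / math.sqrt(self.m2[i] / self.count + self.eps)
            for i, x in enumerate(state)
        ]
```

With something like this, dimensions on very different scales (say, one around 1 and another around 100) would reach the network at a similar magnitude, which is the effect I suspect matters right after initialization.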
I would very much appreciate someone else's insight on this and how much this may really change the resulting policy.
Cheers, Rosa