Hello, I have an A2C model that I trained using the VecNormalize wrapper. When I evaluate it on another environment, the Boolean "training" parameter gives different results depending on whether it is set to False or True. The documentation says this parameter controls "Whether to update or not the moving average", but I still don't know whether keeping "training" set to True during evaluation counts as cheating. So, I need help.
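To make the difference concrete, here is a minimal sketch of what "updating the moving average" means for observation normalization. It is a hypothetical, simplified scalar normalizer loosely modeled on the idea behind VecNormalize's running statistics, not the stable-baselines3 class itself; the names RunningNormalizer, update, and normalize are made up for illustration.

```python
import math


class RunningNormalizer:
    """Hypothetical scalar stand-in for running-statistics normalization.
    Not the stable-baselines3 VecNormalize implementation."""

    def __init__(self, epsilon=1e-8, clip=10.0):
        self.mean = 0.0
        self.var = 1.0
        self.count = epsilon  # tiny starting weight so the first update is well defined
        self.epsilon = epsilon
        self.clip = clip
        self.training = True  # when True, every normalize() call also updates the stats

    def update(self, x):
        # Merge one new sample into the running mean/variance (parallel-variance formula).
        delta = x - self.mean
        total = self.count + 1
        new_mean = self.mean + delta / total
        m2 = self.var * self.count + delta ** 2 * self.count / total
        self.mean, self.var, self.count = new_mean, m2 / total, total

    def normalize(self, x):
        if self.training:
            self.update(x)  # the statistics drift toward whatever data is being seen
        z = (x - self.mean) / math.sqrt(self.var + self.epsilon)
        return max(-self.clip, min(self.clip, z))  # clip extreme values


norm = RunningNormalizer()
for obs in [0.0, 1.0, -1.0, 0.5, -0.5] * 20:  # observations seen during "training"
    norm.normalize(obs)

frozen_mean = norm.mean

norm.training = False
a = norm.normalize(5.0)  # uses the statistics collected during training only

norm.training = True
b = norm.normalize(5.0)  # first updates the statistics with the new observation
```

Here a is about 7.07 (5 divided by the training standard deviation of roughly 0.71), while b is smaller because the mean and variance have already shifted toward the new observation. In other words, with training set to True the normalizer keeps adapting to the evaluation environment's data, which is why the two settings produce different results.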