(v3.3.8) - Observation normalization bug fix (again): negative values in obs_rms.var

Description

We are continuing the work from PR #420 and #419. The error we thought was resolved actually wasn't: the supposed fix did not save the calibrations correctly, which only kept the NaNs from appearing.

Here's an explanation of the issue for documentation purposes. If the var values used for normalization contain negative entries, the normalized observations will contain NaNs. When these calibrations are loaded into a model for evaluation, the SB3 agents return NaNs in all their action variables. This makes debugging difficult, especially when intermediate evaluations are run during training to save the best model.
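To illustrate why a negative var entry turns into NaNs, here is a minimal sketch. It assumes the usual running-statistics formula, (obs - mean) / sqrt(var + epsilon); the `normalize` helper, the `epsilon` default and the sample values are illustrative and are not taken from the Sinergym code.

```python
import numpy as np


def normalize(obs, mean, var, epsilon=1e-8):
    """Hypothetical helper: (obs - mean) / sqrt(var + epsilon)."""
    # The square root of a negative number is NaN in NumPy; the warning is
    # silenced here so the example runs quietly.
    with np.errstate(invalid="ignore"):
        return (obs - mean) / np.sqrt(var + epsilon)


# Illustrative values only.
obs = np.array([21.0, 5.0, -3.0])
mean = np.array([20.0, 4.0, -2.5])
good_var = np.array([4.0, 1.0, 0.25])
bad_var = np.array([4.0, 1.0, -2.5])  # negative entry, e.g. a mean stored as var

good = normalize(obs, mean, good_var)
bad = normalize(obs, mean, bad_var)

print(np.isnan(good).any())  # no NaNs with a valid variance
print(np.isnan(bad))         # NaN only where var was negative
```

A single negative entry is enough to poison the observation vector, which in turn propagates through the policy network and produces NaN actions.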

The issue only occurred in specific buildings and climates because the environment mistakenly returned mean from the var property. This didn't affect the normalization process itself, since the wrong value only appeared when retrieving the data, so normalization during training worked correctly. However, during evaluation, the retrieved mean was loaded as var: if it had negative values, it caused failures (NaNs). If there were no negative values, the evaluation process didn't fail outright but was still incorrect.
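The mechanism described above can be sketched as follows. All class and function names here (`RunningStats`, `BuggyWrapper`, `FixedWrapper`, `export_calibration`) are hypothetical stand-ins, not the actual Sinergym classes; the sketch only assumes that the wrapper exposes `mean` and `var` properties backed by an `obs_rms` object.

```python
import numpy as np


class RunningStats:
    """Hypothetical stand-in for the running statistics object (obs_rms)."""

    def __init__(self, mean, var):
        self.mean = np.asarray(mean, dtype=np.float64)
        self.var = np.asarray(var, dtype=np.float64)


class BuggyWrapper:
    """Exposes the calibration through properties, with the var bug."""

    def __init__(self, obs_rms):
        self.obs_rms = obs_rms

    @property
    def mean(self):
        return self.obs_rms.mean

    @property
    def var(self):
        # Bug: returns the mean. Training is unaffected because the
        # normalization step reads obs_rms.var directly.
        return self.obs_rms.mean


class FixedWrapper(BuggyWrapper):
    @property
    def var(self):
        return self.obs_rms.var


def export_calibration(wrapper):
    """Hypothetical helper: what gets saved and later loaded for evaluation."""
    return {"mean": wrapper.mean.copy(), "var": wrapper.var.copy()}


stats = RunningStats(mean=[20.0, -2.5], var=[4.0, 0.25])
buggy = export_calibration(BuggyWrapper(stats))
fixed = export_calibration(FixedWrapper(stats))

print(buggy["var"])  # equals the mean, including its negative entry
print(fixed["var"])  # the real variance, never negative
```

With the buggy property, the exported var is negative wherever the mean is negative (for example, an outdoor temperature in a cold climate), which would explain why only some buildings and climates failed.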

This PR definitively fixes the issue, marking it as resolved. Additionally, some minor improvements have been made and are documented in the changelog.

Types of changes

[x] Bug fix (non-breaking change which fixes an issue)
[ ] New feature (non-breaking change which adds functionality)
[ ] Breaking change (fix or feature that would cause existing functionality to change)
[ ] Documentation (update in the documentation)
[ ] Improvement (of an existing feature)
[ ] Others

Checklist:

[x] I've read the CONTRIBUTION guide (required)
[ ] My change requires a change to the documentation.
[ ] I have updated the tests.
[ ] I have updated the documentation accordingly.
[ ] I have reformatted the code using autopep8 second level aggressive.
[ ] I have reformatted the code using isort.
[ ] I have ensured cd docs && make spelling && make html pass (required if documentation has been updated.)
[ ] I have ensured pytest tests/ -vv pass. (required).
[ ] I have ensured pytype -d import-error sinergym/ pass. (required)

Changelog:

Eval Callback: Using argument train_env instead of the inherited training environment.
Eval Callback: Fixed setting of the mean and var normalization calibration (it is now applied correctly).
Normalization Wrapper: Deleted RecordConstructorArgs.
Normalization Wrapper: Deleted RecordConstructorArgs inheritance.
Normalization Wrapper: Fixed var property bug (it was returning mean instead of var).
