ugr-sail / sinergym

Gym environment for building simulation and control using reinforcement learning
https://ugr-sail.github.io/sinergym/
MIT License

(v3.3.8) - Observation normalization bug Fix (again), negative values in obs_rms.var #422

Closed: AlejandroCN7 closed this 2 months ago

AlejandroCN7 commented 2 months ago

Description

We are continuing the work from PR #420 and #419. The error we thought was resolved actually wasn't: the supposed fix didn't save the calibrations correctly, so NaNs could still appear.

Here's an explanation of the issue for documentation purposes. If the var values used for normalization contain negative numbers, the final observations will contain NaNs. When these calibrations are loaded into a model for evaluation, the SB3 agents return NaNs in all their action variables. This makes debugging difficult, especially when performing intermediate evaluations to save the best model during training.
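To illustrate the mechanism, here is a minimal sketch. It assumes the normalization follows the usual running-statistics formula `(obs - mean) / sqrt(var + epsilon)`; the function name `normalize`, the `epsilon` default, and the sample values are hypothetical and not taken from Sinergym's code.

```python
import numpy as np


def normalize(obs, mean, var, epsilon=1e-8):
    """Running-statistics normalization (assumed formula, not Sinergym's code)."""
    # The square root of a negative number is NaN in NumPy, so any negative
    # entry in var propagates a NaN into the normalized observation.
    with np.errstate(invalid="ignore"):  # silence the RuntimeWarning
        return (obs - mean) / np.sqrt(var + epsilon)


obs = np.array([21.5, -3.0])
mean = np.array([20.0, -4.0])

good_var = np.array([4.0, 9.0])   # valid variance: all entries >= 0
bad_var = np.array([4.0, -9.0])   # invalid variance: one negative entry

print(normalize(obs, mean, good_var))  # all entries finite
print(normalize(obs, mean, bad_var))   # second entry is NaN
```

A single NaN in the observation is enough for a policy network to output NaN for every action, which is consistent with the behavior described above.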

The issue only occurred in specific buildings and climates because the environment mistakenly saved mean as the var property. This didn't affect normalization immediately, since it only happened when retrieving the data, so normalization during training worked correctly. However, during evaluation, if the mean had negative values, it caused failures. If there were no negative values, the evaluation process didn't fail outright but was still incorrect.
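The kind of mistake described above can be sketched as follows. Everything here is hypothetical: the class `RunningStats`, the property `buggy_var`, and the helper `check_calibration` are illustrative names, not Sinergym's API. The sketch shows a getter that returns the mean in place of the variance, plus a sanity check that could be run before loading a calibration for evaluation.

```python
import numpy as np


class RunningStats:
    """Hypothetical holder for normalization statistics (illustration only)."""

    def __init__(self, mean, var):
        self._mean = np.asarray(mean, dtype=np.float64)
        self._var = np.asarray(var, dtype=np.float64)

    @property
    def mean(self):
        return self._mean

    @property
    def var(self):
        # Correct getter: returns the variance.
        return self._var

    @property
    def buggy_var(self):
        # The mistake: the getter hands back the mean instead of the variance.
        return self._mean


def check_calibration(mean, var):
    """Raise ValueError if the statistics cannot be a valid calibration."""
    mean, var = np.asarray(mean), np.asarray(var)
    if mean.shape != var.shape:
        raise ValueError("mean and var must have the same shape")
    if not np.all(np.isfinite(var)) or np.any(var < 0):
        raise ValueError("var must be finite and non-negative")


stats = RunningStats(mean=[20.0, -4.0], var=[4.0, 9.0])
check_calibration(stats.mean, stats.var)  # passes silently

try:
    check_calibration(stats.mean, stats.buggy_var)
except ValueError as err:
    print(f"rejected: {err}")
```

Note that a check like this only catches the case where the mean has negative entries. When every mean is non-negative, the wrong values pass the check and the evaluation is silently incorrect, which matches the second failure mode described above.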

This PR definitively fixes the issue, marking it as resolved. Additionally, some minor improvements have been made and are documented in the changelog.

Types of changes

Checklist:

Changelog: