This update modularizes the reward calculation process, introducing additional terms to the reward and info dictionaries returned by the environment.
Additionally, CSVLogger names have been refined, and the new metrics are now included. Corresponding adjustments have been made to the training and evaluation logging callbacks for DRL algorithms.
In essence, the reward now distinguishes more clearly between the absolute values of energy consumption and comfort violation, their respective absolute penalties, and the weighted terms summed into the reward. This makes the reward easier to adapt and simplifies creating new rewards that inherit from it.
The reward section of the documentation has also been improved, with new diagrams.
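As an illustration of the decomposition described above, the sketch below shows what a modular linear reward of this shape might look like. This is not the actual Sinergym implementation; class names, observation keys, and default coefficients here are hypothetical, chosen only to show how each term can be computed separately and returned alongside the scalar reward so the environment can merge them into `info` and loggers can record them individually.

```python
import math


class BaseLinearReward:
    """Illustrative linear reward: R = W * P_E + (1 - W) * P_C, where
    P_E = -lambda_E * energy and P_C = -lambda_T * comfort_violation.
    All names are hypothetical, for illustration only."""

    def __init__(self, energy_weight=0.5, lambda_energy=1e-4,
                 lambda_temperature=1.0):
        self.W = energy_weight
        self.lambda_E = lambda_energy
        self.lambda_T = lambda_temperature

    def _energy_term(self, obs):
        # Absolute energy consumed this step (assumed observation key).
        return abs(obs['power'])

    def _comfort_term(self, obs):
        # Absolute temperature deviation outside the comfort range.
        low, high = obs['comfort_range']
        temp = obs['air_temperature']
        return max(low - temp, 0.0) + max(temp - high, 0.0)

    def __call__(self, obs):
        energy = self._energy_term(obs)
        comfort = self._comfort_term(obs)
        # Absolute penalties, then the weighted terms summed into the reward.
        energy_penalty = -self.lambda_E * energy
        comfort_penalty = -self.lambda_T * comfort
        reward = self.W * energy_penalty + (1 - self.W) * comfort_penalty
        # Every intermediate term is returned so it can be exposed through
        # the environment's info dict and picked up by logging callbacks.
        terms = {
            'energy_term': energy,
            'comfort_term': comfort,
            'energy_penalty': energy_penalty,
            'comfort_penalty': comfort_penalty,
            'reward': reward,
        }
        return reward, terms


# Creating a new reward is then just a matter of overriding one term:
class ExpComfortReward(BaseLinearReward):
    def _comfort_term(self, obs):
        # Penalize comfort violations exponentially instead of linearly.
        return math.exp(super()._comfort_term(obs)) - 1.0
```

Because each term is computed by its own method, a subclass such as the hypothetical `ExpComfortReward` changes one component without touching the weighting or logging logic.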
Types of changes
[ ] Bug fix (non-breaking change which fixes an issue)
[ ] New feature (non-breaking change which adds functionality)
[ ] Breaking change (fix or feature that would cause existing functionality to change)
Checklist:
[ ] autopep8 second level aggressive
[ ] isort
[ ] cd docs && make spelling && make html pass (required if documentation has been updated.)
[ ] pytest tests/ -vv pass. (required)
[ ] pytype -d import-error sinergym/ pass. (required)
Changelog: