ugr-sail / sinergym

Gym environment for building simulation and control using reinforcement learning
https://ugr-sail.github.io/sinergym/
MIT License
134 stars 36 forks source link

[Feature]: "Weights & Biases" integration #172

Closed manjavacas closed 1 year ago

manjavacas commented 2 years ago

Feature 🚀

The integration of the "Weights & Biases" (wandb) library in Sinergym.

Motivation

This library may be specially interesting, as it integrates both the functionalities of TensorBoard and MLflow.

As explained by @jajimer:

Solution

Deeply study how wandb could be integrated, even considering if it can completely replace TensorBoard and MLflow or not.

Checklist

:pencil: Please, don't forget to include more labels besides Feature request if it is necessary.

kad99kev commented 2 years ago

Hi,

Firstly, thank you for your work on this library.

I have a potential solution to have wandb working with sinergym and sb3. There isn't much to be changed from the example given for MLFlow here.

I have added a minimal working example.

import sys
import gym
import wandb
import numpy as np

from typing import Any, Dict, Tuple, Union

import sinergym
from sinergym.utils.wrappers import LoggerWrapper, NormalizeObservation
from sinergym.utils.constants import RANGES_5ZONE
from sinergym.utils.callbacks import LoggerCallback

from stable_baselines3 import PPO
from stable_baselines3.common.logger import HumanOutputFormat, Logger, KVWriter

# The only addition required.
class WandBOutputFormat(KVWriter):
    """
    Dumps key/value pairs onto WandB.
    """

    def write(
        self,
        key_values: Dict[str, Any],
        key_excluded: Dict[str, Union[str, Tuple[str, ...]]],
        step: int = 0,
    ) -> None:

        for (key, value), (_, excluded) in zip(
            sorted(key_values.items()), sorted(key_excluded.items())
        ):

            if excluded is not None and "wandb" in excluded:
                continue

            if isinstance(value, np.ScalarType):
                if not isinstance(value, str):
                    wandb.log({key: value}, step=step)

# Init wandb here.
wandb.init(
        project="project_name",
        entity="entity_name",
        name="random_agent_" + wandb.util.generate_id(),
    )

env = gym.make('Eplus-5Zone-hot-continuous-v1')
env = NormalizeObservation(env, ranges=RANGES_5ZONE)
env = LoggerWrapper(env)

loggers = Logger(
    folder=None,
    output_formats=[HumanOutputFormat(sys.stdout, max_length=90), WandBOutputFormat()],
)

model = PPO("MlpPolicy", env, verbose=2)
# Set custom logger
model.set_logger(loggers)
model.learn(total_timesteps=10_000, log_interval=1, callback=LoggerCallback(sinergym_logger=True))

Please do let me know if this helps with this issue. Let me know if I should make a PR or if I could help in another way. Thanks!

AlejandroCN7 commented 2 years ago

Hi @kad99kev ! I've been testing the code you gave us and it works! The point is that our LoggerCallback doesn't work for wandb, I don't know if you got that error:

[2022-11-16 09:22:44,069] EPLUS_ENV_5Zone-hot-continuous-v1_MainThread_ROOT INFO:Updating idf ExternalInterface object if it is not present...
[2022-11-16 09:22:44,069] EPLUS_ENV_5Zone-hot-continuous-v1_MainThread_ROOT INFO:Updating idf Site:Location and SizingPeriod:DesignDay(s) to weather and ddy file...
[2022-11-16 09:22:44,071] EPLUS_ENV_5Zone-hot-continuous-v1_MainThread_ROOT INFO:Updating idf OutPut:Variable and variables XML tree model for BVCTB connection.
[2022-11-16 09:22:44,072] EPLUS_ENV_5Zone-hot-continuous-v1_MainThread_ROOT INFO:Setting up extra configuration in building model if exists...
[2022-11-16 09:22:44,072] EPLUS_ENV_5Zone-hot-continuous-v1_MainThread_ROOT INFO:Setting up action definition in building model if exists...
/usr/local/lib/python3.10/dist-packages/gym/spaces/box.py:73: UserWarning: WARN: Box bound precision lowered by casting to float32
  logger.warn(
Using cpu device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
[2022-11-16 09:22:44,097] EPLUS_ENV_5Zone-hot-continuous-v1_MainThread_ROOT INFO:Creating new EnergyPlus simulation episode...
[2022-11-16 09:22:44,106] EPLUS_ENV_5Zone-hot-continuous-v1_MainThread_ROOT INFO:EnergyPlus working directory is in /workspaces/sinergym/scripts/Eplus-env-5Zone-hot-continuous-v1-res1/Eplus-env-sub_run1
Traceback (most recent call last):
  File "/workspaces/sinergym/prueba.py", line 66, in <module>
    model.learn(
  File "/usr/local/lib/python3.10/dist-packages/stable_baselines3/ppo/ppo.py", line 310, in learn
    return super().learn(
  File "/usr/local/lib/python3.10/dist-packages/stable_baselines3/common/on_policy_algorithm.py", line 265, in learn
    self.logger.dump(step=self.num_timesteps)
  File "/usr/local/lib/python3.10/dist-packages/stable_baselines3/common/logger.py", line 486, in dump
    _format.write(self.name_to_value, self.name_to_excluded, step)
  File "/usr/local/lib/python3.10/dist-packages/stable_baselines3/common/logger.py", line 183, in write
    raise ValueError(
ValueError: Key '   Cooling_Setpoint_RL' truncated to '   Cooling_Setpoint_RL' that already exists. Consider increasing `max_length`.

Then, I removed the callback and we can only register this params currently: Captura desde 2022-11-16 09-49-47

Do you know why our Loggercallback is not compatible with wandb? Or did the code you put in work for you? If you know how to solve it, we can see how to integrate it in a PR made by you :)

kad99kev commented 2 years ago

Hi @AlejandroCN7,

I got that error as well. It is because when creating the keys, large names are truncated which often results in duplicate keys. Hence, that error. To solve that I had changed it to max_length=90.

I believe adding this would help solve the error:

loggers = Logger(
    folder=None,
    output_formats=[HumanOutputFormat(sys.stdout, max_length=90), WandBOutputFormat()],
)

Let me know if you already added this. I would be happy to contribute with a PR, let me know what API structure you have in mind and I can work on it. Thanks!

Here is the link to my WandB project - https://wandb.ai/kad99kev/sinergym-tests?workspace=user-kad99kev

AlejandroCN7 commented 2 years ago

Hi @kad99kev

Yes, I had the max_length added as you mentioned, but I still get the same error. I've tried to increase that value more, but I still get the same result.

In any case, looking at your project, I see that it works! It's great because I see that it doesn't need any big changes to make it work, thank you very much!

Let's see if I can make it work for me too and plain how we can integrate it into the tool (maybe substitute tensorboard+mlflow by wandb in DRL_battery.py), once again thank you very much for everything! :)

kad99kev commented 2 years ago

Hi @AlejandroCN7,

That is indeed weird. I removed the HumanOutputFormat completely and it still worked. I only got that error the first time I ran it. I will look into that sometime. I will see if I can get the wandb example to work with DRL_battery.py and then get back to you.

Thank you!

kad99kev commented 1 year ago

Hi @AlejandroCN7,

Sorry for the delay in getting back with this. I have some code ready, I just need to run a few more experiments with different algorithms to make sure it is working correctly. When it is done, should I make a PR or would you like me to share the code another way?

Link to WandB project - https://wandb.ai/kad99kev/sinergym_tests?workspace=user-kad99kev

Thanks!

AlejandroCN7 commented 1 year ago

Hi @kad99kev

From what I can see, it looks very good. Thank you very much! If you agree, please finish testing everything with the algorithms and generate a PR directly, we'll check it out from there ;)

kad99kev commented 1 year ago

Hi @AlejandroCN7,

I have tried running the experiments for PPO, A2C, DDPG, SAC and TD3 algorithms on the Eplus-5Zone-hot-continuous-v1 environment. From these, however, only PPO and A2C completely finish running. DDPG, SAC and TD3 don't complete even after hours of execution. Should I make a PR, anyway? I would appreciate if you could have a look at it as well. Thank you!

AlejandroCN7 commented 1 year ago

Hi @kad99kev, I think we can make a pull request explaining what you say. Also, I think it is better to add a new script with your wandb integration instead of replacing directly the one we have with MLflow and try to improve it in the future. What do you think?

kad99kev commented 1 year ago

Hi @AlejandroCN7, I did create a new script and integrated WandB there. I will make a PR now. It is a bit hacky as it is just a proof-of-concept that WandB works with Sinergym. Let me know what I could do to further improve it.

AlejandroCN7 commented 1 year ago

Hi @kad99kev, I have added a comment in the PR code. If you need help, let me know :) Thank you so much

kad99kev commented 1 year ago

Hi @AlejandroCN7, I don't see the comment. Could you please send the link to the comment?

AlejandroCN7 commented 1 year ago

Sorry, my bad: https://github.com/ugr-sail/sinergym/pull/258/files#diff-4ba35a8ebc05be86e66cc95ff335266bcf9cb76b9d22cf98048d0c2949f8504c It is indicated in the import line :)

AlejandroCN7 commented 1 year ago

Hi @kad99kev, I have commented directly on the PR to make it more visible.

kad99kev commented 1 year ago

Hi @AlejandroCN7, thank you! I can see the comments now. I have replied to them.