ugr-sail / sinergym

Gym environment for building simulation and control using reinforcement learning
https://ugr-sail.github.io/sinergym/
MIT License

Differences with data center environment from rl-testbed #114

Closed: biemann closed this 1 year ago

biemann commented 2 years ago

Hi,

I noticed that there are some differences between this environment and the data centre environment from rl-testbed-for-energyplus.

Many of these differences come from the fact that this library uses BCVTB, whereas the testbed uses EMS to interact with EnergyPlus.

The observation space here is bigger, which I think is a good thing: it potentially gives users more features to use in their algorithms, leading to better policies. It also helps future work, since we could, for example, study humidity control and other properties that have typically been ignored so far, despite their importance for real-world applications.

But I think there are some differences in the actions, which will lead to different dynamics in the environment. rl-testbed-for-energyplus contains two environments: one where we control only the temperature setpoints, and one where we also control the airflow rates (the latter being the one used in most of the literature on this case study).

Here, we seem to control the heating and cooling setpoints of both zones. I am not sure controlling the heating setpoints is very relevant in this environment, as the servers in the data centre generate a lot of heat, which very rarely leads to lower-than-desired temperatures (when training the controller). Maybe in some very specific regions with very cold weather (like Northern Finland in winter), it may be necessary to heat the air in the loop a bit to avoid freezing the components, but I suppose controlling this part would generally make training unnecessarily harder.

Moriyama et al. used an actuator with which the agent sets the evaporative coolers and the cooling coil to a common temperature setpoint. They also control the air volume fan with a separate actuator.

Here, it seems that we control the setpoint of the thermostat inside the zones. That setpoint appears to be handled by a setpoint manager (which is commented out in rl-testbed) that uses internal model information, such as the airflow rate, to calculate the respective setpoints of the evaporative coolers and the cooling coil needed to meet the desired thermostat setpoint. This has the advantage of giving precise and predictable temperature control, but it may make it harder to reduce energy consumption as much, since the agent is potentially less flexible.
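To illustrate the difference, I would picture the two action parameterizations roughly like the following. This is purely a sketch of my own: the bounds, the use of gym.spaces.Box, and the variable names are assumptions, not the actual definitions of either library.

import numpy as np
from gym import spaces

# Sinergym-style data centre actions: heating and cooling thermostat
# setpoints for the two zones (4 values in total). Bounds are illustrative.
zone_setpoint_actions = spaces.Box(
    low=np.array([15.0, 22.5, 15.0, 22.5], dtype=np.float32),
    high=np.array([22.5, 30.0, 22.5, 30.0], dtype=np.float32),
    dtype=np.float32)

# rl-testbed-style actions (Moriyama et al.): a common setpoint for the
# evaporative coolers / cooling coil per zone plus an air volume fan flow
# rate per zone, bypassing the zone thermostat setpoint manager.
# Bounds are again illustrative only.
coil_and_flow_actions = spaces.Box(
    low=np.array([10.0, 10.0, 1.75, 1.75], dtype=np.float32),
    high=np.array([40.0, 40.0, 7.0, 7.0], dtype=np.float32),
    dtype=np.float32)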

That said, I am not too sure about all of this, as both EnergyPlus and HVAC systems are still somewhat of a black box for me. But I suppose this will lead to differences in the environment (I have not run experiments and compared results yet).

The differences are a matter of system design and are quite subtle. I just thought I would mention this, as you seem to care a lot about making the simulations as realistic as possible (for example by taking great care with design days, something I had not considered until now).

Note that this is not really an issue, but more of a general remark for discussion on environment design.

jajimer commented 2 years ago

Hi, Marco.

Thanks for pointing this out. You are absolutely right. Here the flow rate is not controlled as in Moriyama's and other works, only the 4 temperature setpoints. We did this i) for the sake of simplicity, to be honest, and ii) because in the 5-zone building you also control heating and cooling setpoints, so you could experiment and analyze the impact of the building type on the energy savings, or maybe train a single algorithm that could control both environments.

Right now we are working on improving how IDF files are handled, and one of the things we want to include is the possibility to change the controlled equipment. We are struggling a bit (since EnergyPlus and IDFs are also a bit of a black box for us...), but once that is solved we will include the possibility to also control flow rates, as in rl-testbed-for-energyplus.

biemann commented 2 years ago

Hi Javier,

Besides the air flow rates, I just wanted to highlight the more subtle point that this environment seems to control the thermostat setpoint in the zone, whereas Moriyama controls the internal setpoint of the evaporative coolers/cooling coil.

But I am aware that such an issue must be tricky to solve (and there is nothing wrong with the current code :)). Actually, I would be interested in contributing to the codebase as well, not just opening complex issues like this one left and right :). Is there some testing procedure to follow for pull requests? I would be interested, for example, in introducing other reward functions.

I am interested in this library because I find the codebase very clean, and I also want to test algorithms in other environments. My current codebase, where I hacked together code from multiple sources, gives me headaches to work with.

jajimer commented 2 years ago

Hi Marco.

Thanks for the PR! :)

We are working on a CONTRIBUTING.md file to establish some guidelines for contributing new code, but feel free to open new PRs in the meantime.

Regarding reward functions, I am changing how they are defined a little bit. I am adding a base reward class that takes the environment as an argument, so the computation can be performed using environment attributes. In short:


class BaseReward(object):

    def __init__(self, env):
        self.env = env

    def compute(self):
        # Any calculation method using env attributes
        raise NotImplementedError
All the other rewards should then inherit from it or follow the same structure. Some minor changes may also need to be made to the environment class, but with that in mind you could implement your own reward functions.
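For instance, a custom reward following this pattern could look something like the sketch below. This is just an illustration: attribute names such as env.power and env.temperatures, and the default weights, are placeholders, not actual Sinergym attributes.

class LinearEnergyComfortReward(BaseReward):

    def __init__(self, env, lambda_energy=1e-4, lambda_comfort=1.0,
                 comfort_range=(18.0, 27.0)):
        super().__init__(env)
        self.lambda_energy = lambda_energy
        self.lambda_comfort = lambda_comfort
        self.comfort_range = comfort_range

    def compute(self):
        # Energy term: current HVAC power demand (placeholder attribute).
        energy_penalty = self.lambda_energy * self.env.power

        # Comfort term: temperature deviation outside the comfort range
        # (placeholder attribute with one temperature per zone).
        low, high = self.comfort_range
        comfort_penalty = self.lambda_comfort * sum(
            max(low - t, 0.0) + max(t - high, 0.0)
            for t in self.env.temperatures)

        return -(energy_penalty + comfort_penalty)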

biemann commented 2 years ago

Thanks a lot!

It compiles, but I will wait before submitting it (I have it in my forked branch). The parameters of the reward function are quite sensitive to the environment, and currently it works worse than the original.

For this reason, I really like the change you are implementing. In particular, lambda_energy is very environment-dependent and needs completely different values depending on the environment.
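With the base class you describe, lambda_energy could at least become a constructor argument tuned per environment, something like the following (hypothetical names and made-up values, reusing the reward sketch above):

# Same reward class, different per-environment energy weights (illustrative).
datacenter_reward = LinearEnergyComfortReward(datacenter_env, lambda_energy=1e-4)
five_zone_reward = LinearEnergyComfortReward(five_zone_env, lambda_energy=1e-3)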

(Sorry if this is the wrong place to discuss this :))