ugr-sail / sinergym

Gym environment for building simulation and control using reinforcement learning
https://ugr-sail.github.io/sinergym/
MIT License

[Question] Update on #395 #421

Closed: kad99kev closed this issue 2 months ago

kad99kev commented 2 months ago

Hello @AlejandroCN7! Related to Issue #395, I was wondering: if I were to update the EnergyPlus version from 23.1.0 to 23.2.0 to see whether the Datacenter issue persists, would that affect the library's workings? Would there be any side effects that are not immediately obvious?

I haven't tried it yet, so I thought I would ask you in case you have already. Thank you!

AlejandroCN7 commented 2 months ago

Hello @kad99kev!

I haven't tried updating the EnergyPlus engine to version 23.2.0 yet. Apparently, the update shouldn't cause issues. You just need to install the new version of E+ and then update the epJSON files to match the same version as E+. There are tools available to automate this process. If you're using the container, update the E+ installation directly from there and then update the building files.

This is something I plan to do in a new version, but I haven't had the time to do it yet. I hope this helps :). Thank you very much for your support on the project.
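As a quick sanity check after converting, you can also verify that the building file's version matches the installed engine. This is only a sketch; the path is hypothetical and it assumes the standard epJSON layout with a top-level Version object:

import json

# Hypothetical building file path; adjust to wherever your epJSON files live.
building_path = "sinergym/data/buildings/2ZoneDataCenterHVAC_wEconomizer.epJSON"

with open(building_path) as f:
    building = json.load(f)

# epJSON files record the engine version under the "Version" object.
version_obj = next(iter(building.get("Version", {}).values()), {})
print("Building file version:", version_obj.get("version_identifier"))
# If this does not match the installed EnergyPlus version (e.g. "23.2"),
# you will get a version mismatch warning at simulation start.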

kad99kev commented 2 months ago

Hi @AlejandroCN7, thank you so much for your reply!

When I convert the IDF files to epJSON, do I need to make any changes in the epJSON, or is it fine to use the newly generated file directly?

Additionally, I get the following warning:

 ** Warning ** Version: in IDF="24.2" not the same as expected="23.1"

How do I tell Sinergym which EnergyPlus version to use?

kad99kev commented 2 months ago

I figured it out; I had to update the PYTHONPATH. After I did that, I changed the rewards to include both the east and west zones, but the comfort violations are still around 100%. So I am assuming that the datacenter issue still hasn't been fixed.

kad99kev commented 2 months ago

I also wanted to ask: what do the typical learning curves for the 1ZoneDatacenter, OfficeMedium, and GridOffice look like? I was trying to train basic agents with PPO and they do not seem to converge. For the 1ZoneDatacenter, the agent gets the same reward every episode. With the offices, I notice the comfort violation is close to 100%, the same issue as with the 2ZoneDatacenter.

Let me know if there is any way I can help with this! I could not figure out the issue myself, so I would love to hear your thoughts on it.
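For reference, this is roughly the kind of training setup I am using. It is just a sketch; the environment ID, wrappers, and timestep count here are assumptions on my side and may need adjusting:

import gymnasium as gym
import sinergym  # importing registers the Eplus-* environments
from sinergym.utils.wrappers import LoggerWrapper, NormalizeObservation
from stable_baselines3 import PPO

# Hypothetical environment ID; substitute the registered ID for the building you train on.
env = gym.make("Eplus-datacenter-mixed-continuous-stochastic-v1")
env = NormalizeObservation(env)
env = LoggerWrapper(env)

model = PPO("MlpPolicy", env, verbose=1)
# Assuming one-year episodes with 15-minute timesteps (35,040 steps per episode).
model.learn(total_timesteps=35_040 * 10)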

AlejandroCN7 commented 2 months ago

Hi @kad99kev!

When I convert the IDF files to epJSON, do I need to make any changes in the epJSON, or is it fine to use the newly generated file directly?

It should be fine. You have to update the IDF file to your EnergyPlus version and then convert it to epJSON. EnergyPlus has tools to do both steps.
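A rough sketch of the conversion step driven from Python, assuming EnergyPlus 23.2.0 is installed at the default Linux location, that the IDF has already been brought up to 23.2 (for example with IDFVersionUpdater), and that the ConvertInputFormat tool shipped with EnergyPlus is used for the epJSON conversion:

import subprocess
from pathlib import Path

# Assumed installation path; adjust to your EnergyPlus 23.2.0 install.
eplus_path = Path("/usr/local/EnergyPlus-23-2-0")
idf_file = Path("2ZoneDataCenterHVAC_wEconomizer.idf")  # already updated to 23.2

# ConvertInputFormat writes <name>.epJSON next to the input IDF.
subprocess.run([str(eplus_path / "ConvertInputFormat"), str(idf_file)], check=True)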

How do I tell Sinergym which EnergyPlus version to use?

If you have several EnergyPlus versions installed, you have to specify which one you want to use in an environment variable, and the same goes for the EnergyPlus Python API:

EPLUS_PATH=<Your EnergyPlus installation Path>
PYTHONPATH=<Your EnergyPlus installation Path>
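If it's easier, you can also set these from Python before creating the environment. A minimal sketch, assuming the default Linux installation path for 23.2.0; the variables just need to be set before sinergym is imported:

import os
import sys

# Assumed installation path; adjust to your EnergyPlus 23.2.0 install.
eplus_path = "/usr/local/EnergyPlus-23-2-0"

os.environ["EPLUS_PATH"] = eplus_path
# The EnergyPlus Python API (pyenergyplus) lives inside the installation directory.
sys.path.insert(0, eplus_path)

import sinergym  # imported after the paths are set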

I also wanted to ask: what do the typical learning curves for the 1ZoneDatacenter, OfficeMedium, and GridOffice look like? I was trying to train basic agents with PPO and they do not seem to converge. For the 1ZoneDatacenter, the agent gets the same reward every episode. With the offices, I notice the comfort violation is close to 100%, the same issue as with the 2ZoneDatacenter.

Regarding this, a few things. I haven't run many experiments on these buildings; I just made sure that control was applied correctly from Sinergym, so that the DRL algorithms could connect properly and the agents started to improve in some way. I will ask my research team whether anyone has used them more than I have.

As for the comfort violation percentage, use that metric with care. The building has several zones, and if the temperature goes out of range in any one of them, the whole timestep is counted as out of range, which inflates the final percentage. So treat that metric as a rough guide. To better understand how your agent is performing, you should also look at energy consumption and the internal temperature of each zone separately, to see by how much it leaves the comfort range and whether that happens in all zones at the same time. In the end, the reward functions sum the degrees out of range across the zones and use that total to penalize the reward signal.
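For example, something along these lines gives you a per-zone view instead of the aggregated percentage. It is only a sketch: the CSV path, column names, and comfort range are assumptions and depend on your environment's observation variables and logger configuration:

import pandas as pd

# Hypothetical path to an episode's observation log written by the logger wrapper.
df = pd.read_csv("Eplus-env-datacenter-res1/episode-1/observations.csv")

comfort_range = (18.0, 27.0)  # assumed comfort band for the datacenter zones

# Column names are assumptions; substitute your environment's zone temperature variables.
for zone_col in ["west_zone_air_temperature", "east_zone_air_temperature"]:
    temp = df[zone_col]
    below = (comfort_range[0] - temp).clip(lower=0.0)
    above = (temp - comfort_range[1]).clip(lower=0.0)
    degrees_out = below + above
    print(zone_col,
          "| violated timesteps:", int((degrees_out > 0).sum()),
          "| mean degrees out of range:", round(degrees_out.mean(), 2))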