intelligent-environments-lab / CityLearn

Official reinforcement learning environment for demand response and load shaping
MIT License
Question about state dimension #17

Closed HYDesmondLiu closed 2 years ago

HYDesmondLiu commented 2 years ago

Hi, thank you for sharing this repo.

I was experimenting with the CityLearn environment and the MARLISA agent, and I found that the dimension of the states varies depending on the command used.

For example, env.observation_space.shape[0] returns 91, but after env.reset() the dimension of the state is (28,9); I think the 9 is the number of buildings. Furthermore, if I save the states of one single building in the replay buffer, the dimension becomes 36.

I am quite confused. What state dimension was used in the CityLearn challenge, the MARLISA paper, etc.?

kingsleynweye commented 2 years ago

Hi @HYDesmondLiu. Thanks for bringing this issue to our notice. Please, can you confirm which version of CityLearn you are working with? I suggest using v1.0.0 as it is the latest stable version. If you cloned the repository prior to 2022-06-05, the master branch pointed to an unstable pre-release version and you may have cloned that. If you can confirm that you are using v1.0.0, which is also available directly in the master branch, I will try to reproduce the bug on my end.

HYDesmondLiu commented 2 years ago

Hi @kingsleynweye, thank you for the prompt response. Yes, I cloned this repo before 2022-06-05, but do you mean 2021? It is still May 2022. Here is the version information I found; please let me know if any other information is needed.

git log -1
commit ca5e19889e3b3199a3b7b3537e0acf725f22c308 (HEAD -> master, origin/master, origin/HEAD)
Author: Kingsley Nweye etonwana@yahoo.com
Date: Sun Feb 20 12:28:17 2022 -0600
    added comprehensive ignore files

git rev-parse HEAD
ca5e19889e3b3199a3b7b3537e0acf725f22c308

kingsleynweye commented 2 years ago

@HYDesmondLiu no, I mean 2022. Looks like you cloned the unstable pre-release. Please, can you go ahead and reclone the current master branch and let me know if you get the same behavior?

HYDesmondLiu commented 2 years ago

@kingsleynweye I have recloned the current master branch and it is the same commit as the one I cloned previously.

kingsleynweye commented 2 years ago

@HYDesmondLiu Okay thanks for the update! I have updated the master branch just now to remove the space heating implementation that may cause errors, so please reclone again. However, after some digging around, I realized the cause of your observation. See my explanations below:

There are two CityLearn class instance variables that only differ in suffix: CityLearn.observation_spaces and CityLearn.observation_space. CityLearn.observation_spaces returns a list of the individual Building observation_space variables. Since there are 9 buildings, the list contains 9 spaces.Box objects, 8 of which have shape[0] = 28 and 1 of which has shape[0] = 27. shape[0] corresponds to the number of active states for each building in buildings_state_action_space.json.

CityLearn.observation_spaces is what is passed to the agent class when CityLearn.central_agent = False. On the other hand, CityLearn.observation_space is for when CityLearn.central_agent = True and is just a single spaces.Box object, i.e. not in a list. The reason its shape[0] = 91 is that, in the central agent configuration, there are 20 states common across all buildings: [month, day, hour, t_out, t_out_pred_6h, t_out_pred_12h, t_out_pred_24h, rh_out, rh_out_pred_6h, rh_out_pred_12h, rh_out_pred_24h, diffuse_solar_rad, diffuse_solar_rad_pred_6h, diffuse_solar_rad_pred_12h, diffuse_solar_rad_pred_24h, direct_solar_rad, direct_solar_rad_pred_6h, direct_solar_rad_pred_12h, direct_solar_rad_pred_24h, carbon_intensity] and they only get counted once. Hence, if we go by the shape of each building's observation_space but count those 20 shared states once, the shape of CityLearn.observation_space = (28*8) + (27*1) - (20*8) = 91. These shapes shouldn't change even after calling the CityLearn.reset() function.
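The arithmetic above can be reproduced with a short sketch. It does not touch a live CityLearn environment: the per-building counts and the number of shared states are simply the values quoted in this comment, and central_observation_dim is a hypothetical helper name, not part of the CityLearn API.

```python
# Per-building observation counts quoted in the comment (assumed values,
# not read from a live CityLearn environment).
building_dims = [28] * 8 + [27]
n_shared = 20  # states common to every building, e.g. month, hour, t_out


def central_observation_dim(dims, shared):
    """Sum the per-building counts while keeping the shared states only once.

    Hypothetical helper for illustration; not a CityLearn function.
    """
    n_buildings = len(dims)
    # Every building reports the shared states, so all but one copy is dropped.
    return sum(dims) - shared * (n_buildings - 1)


central_dim = central_observation_dim(building_dims, n_shared)
print(central_dim)
```

With 9 buildings the shared block is subtracted 8 times, which is where the (20*8) term comes from.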

The building_loader function is where CityLearn.observation_space and CityLearn.observation_spaces are constructed.

The reason the MARLISA.replay_buffer states have a shape of 36 is that, within the MARLISA implementation, some states get encoded to convert categorical values to numerical ones for the sake of the regression model, and some states are removed completely. [month, hour] are periodically encoded using a cosine transformation, so each of these states is represented by 2 values, bringing the observation count from 28 to 30. [day] is one-hot-encoded over the values [1,2,3,4,5,6,7,8], where 1-7 are Monday to Sunday and 8 is for holidays or special days. This increases the observation count to 38. The net_electricity_consumption state is actually removed in the MARLISA implementation, so it reduces the observation count to 37. I am not sure why you are getting 36 though, unless there is something else I missed. You can see where all of this happens in the MARLISA.__init__ function.

Hope this clears things up. We will update and improve the documentation in our next release to avoid the confusion between CityLearn.observation_spaces and CityLearn.observation_space. Also, please let me know if you need any more clarification.
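One way to check the count is a small standalone sketch. It makes two assumptions that are not confirmed in this thread: each encoded state replaces its raw value rather than sitting alongside it, and the periodic encoding is a sine/cosine pair. The helper names and the placeholder states state_0 to state_23 are hypothetical; this is not the MARLISA code.

```python
import math

DAY_CLASSES = [1, 2, 3, 4, 5, 6, 7, 8]  # 1-7 Monday to Sunday, 8 holidays


def periodic_encode(value, period):
    """Assumed sine/cosine pair for a cyclic value (2 outputs per state)."""
    angle = 2 * math.pi * value / period
    return [math.sin(angle), math.cos(angle)]


def one_hot_encode(value, classes):
    """One output per class; exactly one of them is 1.0."""
    return [1.0 if value == c else 0.0 for c in classes]


def encode_observation(obs):
    """Encode a raw observation dict, replacing or dropping states by name."""
    encoded = []
    for name, value in obs.items():
        if name == "month":
            encoded += periodic_encode(value, 12)
        elif name == "hour":
            encoded += periodic_encode(value, 24)
        elif name == "day":
            encoded += one_hot_encode(value, DAY_CLASSES)
        elif name == "net_electricity_consumption":
            continue  # removed entirely, as described in the comment
        else:
            encoded.append(float(value))
    return encoded


# A 28-state building: 4 named states plus 24 placeholder states.
raw = {"month": 5, "day": 3, "hour": 14, "net_electricity_consumption": 1.2}
raw.update({f"state_{i}": 0.0 for i in range(24)})
encoded = encode_observation(raw)
print(len(raw), len(encoded))
```

Under these assumptions the one-hot step adds 7 values net (8 new values minus the raw day value), so a 28-state building ends at 36 values rather than 37, which would be consistent with the replay-buffer size reported in the question.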

HYDesmondLiu commented 2 years ago

@kingsleynweye Thank you for the detailed explanation. So if I understand it correctly, if I need to learn in a centralized fashion, there are in total 91 states. However, if I want to learn these buildings in a decentralized fashion, then there might be 27 or 28 accordingly. So are the 8 more states appended at the end of the array (of states)?

kingsleynweye commented 2 years ago

@HYDesmondLiu yeah, there are 91 states in a centralized fashion and 27 or 28 in decentralized, depending on the number of active states in a building. The other 8 are internally appended in the Agent class. They are not necessarily new states but transformations/encodings of already existing states.

HYDesmondLiu commented 2 years ago

Here are the dimensions of the states I got from saving buffers using SAC's (o, a, r, o2, done). Do these look reasonable to you?

Building_1 36
Building_2 27
Building_3 25
Building_4 35
Building_5 36
Building_6 36
Building_7 27
Building_8 27
Building_9 27

HYDesmondLiu commented 2 years ago

I think I have figured it out, thank you @kingsleynweye