Multiple action variables

MichielKempkens commented 1 year ago

Question ❓

Dear @AlejandroCN7,

In a building with multiple thermal zones I of course have separate heating and cooling setpoints. However, when I run simulations with, for example, PPO, the heating setpoint of zone1 is the same as that of zone2. How can I run simulations in which the algorithm sets the setpoints for each thermal zone individually?

For example, the warehouse building also has the Office and Fine storage zones, and their setpoints are likewise the same in each timestep.

Also, if you were to run a simulation with simple rule-based control, how would you set these actions separately for each thermal zone?

And if this is possible, what do you think would be the best way of calculating the comfort penalty? Would it, for example, make sense to calculate the penalty for each zone individually and take the average of these penalties as the total comfort penalty? Or do you have another opinion on this?

Kind regards, Michiel

Checklist

[x] I have read the documentation (required)
[x] I have checked that there is no similar issue in the repo (required)


AlejandroCN7 commented 1 year ago

Hi @MichielKempkens,

> In a building with multiple thermal zones I of course have separate heating and cooling setpoints. However, when I run simulations with, for example, PPO, the heating setpoint of zone1 is the same as that of zone2. How can I run simulations in which the algorithm sets the setpoints for each thermal zone individually?

I don't fully understand your question. If you have several thermostats, each controlling its own zone, you have to set up an external interface to control the thermostat of each zone. Take a look at action_definition.

> For example, the warehouse building also has the Office and Fine storage zones, and their setpoints are likewise the same in each timestep.

In that environment you can control the setpoints of each zone (Office, Fine storage, etc.) separately, which is why I don't fully understand your question, sorry.

> Also, if you were to run a simulation with simple rule-based control, how would you set these actions separately for each thermal zone?

Same here: your controller reads the observation, processes it according to the rules you have established, and should produce an output whose dimension matches the action space, covering all setpoints for all zones.
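As a rough illustration of that idea, here is a minimal sketch of a rule-based controller that returns one heating and one cooling setpoint per zone as a single flat action. The zone names, setpoint bounds, thresholds, and the layout of the inputs are all assumptions made up for this example; they are not taken from any Sinergym environment, so adapt them to the action space your environment actually defines.

```python
# Hypothetical zones and setpoint bounds, chosen only for illustration.
ZONES = ["office", "fine_storage", "bulk_storage"]
HEATING_RANGE = (15.0, 22.5)
COOLING_RANGE = (22.5, 30.0)


def clamp(value, low, high):
    """Keep a setpoint inside its allowed range."""
    return max(low, min(high, value))


def rule_based_action(zone_temps, occupied):
    """Return a flat list: [heating, cooling] for each zone, in ZONES order."""
    action = []
    for zone in ZONES:
        if occupied[zone]:
            heating, cooling = 21.0, 24.0  # tight band while occupied
        else:
            heating, cooling = 16.0, 29.0  # relaxed band while empty
        # Per-zone rule: push heating up a little if this zone is already cold.
        if zone_temps[zone] < 18.0:
            heating += 1.0
        action.append(clamp(heating, *HEATING_RANGE))
        action.append(clamp(cooling, *COOLING_RANGE))
    return action


temps = {"office": 17.0, "fine_storage": 20.0, "bulk_storage": 25.0}
occ = {"office": True, "fine_storage": False, "bulk_storage": False}
action = rule_based_action(temps, occ)
print(len(action), action)
```

The key point is only that the output has one entry per controlled setpoint (here 3 zones x 2 setpoints = 6 values), so each zone can receive different values in the same timestep.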

> And if this is possible, what do you think would be the best way of calculating the comfort penalty? Would it, for example, make sense to calculate the penalty for each zone individually and take the average of these penalties as the total comfort penalty? Or do you have another opinion on this?

If you look at rewards.py, you can see that the reward functions take temperature_variable; there you can specify the temperature variables of all the zones you want. The reward class processes all of those temperatures, generates a comfort penalty for each, and accumulates them.
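To make the difference between accumulating and averaging concrete, here is a small sketch. It is not the code from rewards.py: the function names, the comfort range, and the "degrees outside the range" penalty are assumptions for illustration only.

```python
def zone_comfort_penalty(temp, low, high):
    """Degrees by which temp lies outside [low, high]; 0.0 when inside."""
    if temp < low:
        return low - temp
    if temp > high:
        return temp - high
    return 0.0


def total_comfort_penalty(zone_temps, comfort_range, average=False):
    """Combine per-zone penalties by summing them or by averaging them."""
    low, high = comfort_range
    penalties = [zone_comfort_penalty(t, low, high) for t in zone_temps.values()]
    total = sum(penalties)
    return total / len(penalties) if average else total


# Hypothetical zone temperatures in degrees Celsius.
temps = {"office": 19.0, "fine_storage": 24.5, "bulk_storage": 22.0}
summed = total_comfort_penalty(temps, (20.0, 23.5))
averaged = total_comfort_penalty(temps, (20.0, 23.5), average=True)
print(summed, averaged)
```

With a fixed number of zones, the sum and the average differ only by a constant factor (the number of zones), so the choice mainly changes how the comfort term is scaled relative to the energy term of the reward.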

Regards, Alejandro.

MichielKempkens commented 1 year ago

Thank you for your reply @AlejandroCN7,

It was indeed controlled separately; I was confused by the discrete action variables. If you enumerate every combination of setpoints across the thermal zones, you of course end up with far too many actions in an environment with 5 thermal zones. A continuous action space works perfectly fine.

I have one other question: how can you validate the best model that comes out of training, for example under weather variability?

That is, you train your model and then validate it under different conditions?

AlejandroCN7 commented 1 year ago

Hi @MichielKempkens,

> It was indeed controlled separately; I was confused by the discrete action variables. If you enumerate every combination of setpoints across the thermal zones, you of course end up with far too many actions in an environment with 5 thermal zones. A continuous action space works perfectly fine.

We plan to add multi-discrete spaces, so that each variable has its own degree of freedom and its own separate discrete space. But since the Stable Baselines 3 algorithms do not support this type of space, it has low priority.
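To show why a single discrete space grows so quickly, here is a small counting sketch. The four (heating, cooling) options per zone are hypothetical values chosen for the example, not the discrete actions of any Sinergym environment.

```python
from itertools import product

N_ZONES = 5
# Hypothetical (heating, cooling) setpoint options available to each zone.
SETPOINT_OPTIONS = [(15, 30), (18, 27), (20, 25), (21, 24)]

# One flat discrete space: every combination across zones is its own action.
joint_actions = list(product(range(len(SETPOINT_OPTIONS)), repeat=N_ZONES))
n_joint = len(joint_actions)  # 4 ** 5 combinations

# Multi-discrete layout: one independent axis per zone, 4 choices on each.
multi_discrete_shape = [len(SETPOINT_OPTIONS)] * N_ZONES

print(n_joint, multi_discrete_shape)
# Each flat action decodes to one option index per zone:
print(joint_actions[0], joint_actions[-1])
```

The flat space multiplies the number of options per zone (4 x 4 x 4 x 4 x 4), while a per-zone layout only needs one small axis per zone. A continuous space sidesteps the enumeration entirely by using one bounded value per setpoint.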

> I have one other question: how can you validate the best model that comes out of training, for example under weather variability? That is, you train your model and then validate it under different conditions?

Sinergym has callbacks and an evaluation function for validating models during training. Weather variability introduces some noise to avoid overfitting (otherwise every year, that is, every episode, would have exactly the same weather). You can validate with that noise, or do curriculum learning (the same agent across different environments) if you want. Reference: https://stable-baselines3.readthedocs.io/en/master/guide/callbacks.html#evalcallback
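The general idea of validating under noisy weather can be sketched without any library: run the trained policy for several episodes, each with a different noise seed, and report the mean and spread of the episode returns. Everything below is a toy stand-in written for this example (the thermal response, the reward, and the function names are assumptions); it is not the Sinergym or Stable Baselines 3 API.

```python
import random
import statistics


def run_episode(policy, seed, steps=24, noise_std=1.0):
    """Toy episode: outdoor temperature is perturbed by seeded Gaussian noise."""
    rng = random.Random(seed)
    total_reward = 0.0
    for _ in range(steps):
        outdoor = 10.0 + rng.gauss(0.0, noise_std)  # noisy "weather"
        setpoint = policy(outdoor)
        indoor = 0.8 * setpoint + 0.2 * outdoor     # crude thermal response
        total_reward -= abs(indoor - 21.0)          # comfort-only reward
    return total_reward


def evaluate(policy, seeds):
    """Mean and standard deviation of episode returns over several seeds."""
    returns = [run_episode(policy, seed) for seed in seeds]
    return statistics.mean(returns), statistics.stdev(returns)


def fixed_policy(outdoor):
    """Placeholder for a trained agent: always requests the same setpoint."""
    return 23.0


mean_return, std_return = evaluate(fixed_policy, seeds=range(5))
print(mean_return, std_return)
```

Comparing these statistics between the training conditions and held-out conditions (other seeds, noise levels, or weather files) gives a rough picture of how well the model generalizes.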

MichielKempkens commented 1 year ago

Thank you! @AlejandroCN7

ugr-sail / sinergym

Multiple action variables #302
