Closed MichielKempkens closed 1 year ago
Hi @MichielKempkens,
In a building with multiple thermal zones I have, of course, separate heating and cooling setpoints. However, when I try to run simulations with, for example, PPO, the heating setpoint of zone1 is the same as that of zone2. How can I run simulations in which the algorithm sets the setpoints for each thermal zone individually?
I don't fully understand your question. If you have several thermostats, each controlling one zone, you have to set up an external interface that controls the thermostat of every zone. Take a look at action_definition.
For example, in the warehouse building you also have the Office and Fine storage zones, and their setpoints are also the same in each timestep.
In that environment you can control the setpoints of each zone (Office, Fine storage, etc.) separately, which is why I don't fully understand your question, sorry.
Also, if you were to run a simulation with simple rule-based control, how would you set these actions separately for each thermal zone?
Same here: read the observations and process them according to the rules you have established. The output should have a dimension that fits the action space, covering all setpoints for all zones.
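As a rough illustration of that idea, here is a minimal sketch of a rule-based controller that returns one flat action with a heating and a cooling setpoint per zone. The zone names, the setpoint values and the occupancy input are assumptions made for the example; they are not taken from Sinergym, and the real action layout depends on the environment's action definition.

```python
from typing import Dict, List, Tuple

# Hypothetical zones with (heating, cooling) setpoints in °C while occupied.
# These values are illustrative, not Sinergym defaults.
OCCUPIED: Dict[str, Tuple[float, float]] = {
    "Office": (21.0, 24.0),
    "FineStorage": (18.0, 26.0),
}
# Hypothetical setback band applied to any unoccupied zone.
UNOCCUPIED: Tuple[float, float] = (15.0, 30.0)


def rule_based_action(occupancy: Dict[str, bool]) -> List[float]:
    """Return one flat action: [heat_zone1, cool_zone1, heat_zone2, cool_zone2, ...]."""
    action: List[float] = []
    for zone, band in OCCUPIED.items():
        # Each zone is decided independently of the others.
        heat, cool = band if occupancy.get(zone, False) else UNOCCUPIED
        action.extend([heat, cool])
    return action


action = rule_based_action({"Office": True, "FineStorage": False})
print(action)  # [21.0, 24.0, 15.0, 30.0]
```

The length of the returned list (two values per zone) is what has to match the dimension of the environment's action space.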
And if it is possible to do this, what do you think would be the best way of calculating the comfort penalty? Would it, for example, make sense to calculate the penalty for each zone separately and use the average of these penalties as the total comfort penalty? Or do you have another opinion on this?
If you look at rewards.py, you can see that the reward functions have a temperature_variable parameter; you can specify the temperature variables of all the zones you want. The reward class processes all the temperatures, generates the comfort penalties and accumulates them.
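To make the difference between accumulating and averaging concrete, here is a small sketch that computes a per-zone penalty and then both the sum and the mean. It does not reproduce the code in rewards.py: the penalty formula (degrees outside a comfort range), the zone names and the temperatures are assumptions chosen for the example.

```python
from typing import Dict, Tuple


def zone_penalty(temp: float, comfort: Tuple[float, float]) -> float:
    """Degrees outside the comfort range; 0.0 when the temperature is inside it."""
    low, high = comfort
    return max(low - temp, 0.0) + max(temp - high, 0.0)


def comfort_penalties(
    temps: Dict[str, float], comfort: Tuple[float, float]
) -> Tuple[Dict[str, float], float, float]:
    """Return the per-zone penalties, their sum, and their mean."""
    per_zone = {zone: zone_penalty(t, comfort) for zone, t in temps.items()}
    total = sum(per_zone.values())
    mean = total / len(per_zone)
    return per_zone, total, mean


# Hypothetical zone temperatures in °C and a hypothetical comfort range.
temps = {"Office": 19.0, "FineStorage": 27.5, "BulkStorage": 22.0}
per_zone, total, mean = comfort_penalties(temps, comfort=(20.0, 23.5))
print(per_zone)  # {'Office': 1.0, 'FineStorage': 4.0, 'BulkStorage': 0.0}
print(total)     # 5.0
```

The sum grows with the number of zones, while the mean keeps the penalty on the same scale regardless of how many zones are monitored. Which one suits a given reward depends on how it is weighted against the energy term.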
Regards, Alejandro.
Thank you for your reply @AlejandroCN7,
They were indeed controlled separately. I was confused by the discrete action variables. If you want to enumerate all the combinations for the different thermal zones, you of course get far too many actions in an environment with 5 thermal zones. A continuous action space works perfectly fine.
I have one other question: how can you validate the best model obtained from training, for example under weather variability?
That is, train your model and then validate it under different circumstances?
Hi @MichielKempkens,
They were indeed controlled separately. I was confused by the discrete action variables. If you want to enumerate all the combinations for the different thermal zones, you of course get far too many actions in an environment with 5 thermal zones. A continuous action space works perfectly fine.
We plan to create multi-discrete spaces, so that each variable has its own degree of freedom and its own separate discrete space. But since the Stable Baselines 3 algorithms do not support this type of space, it has low priority.
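To show why a single flat discrete space grows so quickly, and what "one discrete axis per variable" means, here is a small sketch in plain Python. The number of zones comes from the thread; the 10 setpoint options per zone and the decode helper are assumptions for illustration only, not part of Sinergym.

```python
from typing import List

n_zones = 5   # thermal zones, as in the example above
levels = 10   # hypothetical number of setpoint options per zone

# One flat discrete space must enumerate every combination of zone choices.
flat_actions = levels ** n_zones
print(flat_actions)  # 100000

# A multi-discrete layout keeps one small axis per zone instead.
dims: List[int] = [levels] * n_zones


def decode(index: int, dims: List[int]) -> List[int]:
    """Map a flat action index to one choice per zone (mixed-radix decoding)."""
    choices: List[int] = []
    for d in reversed(dims):
        index, choice = divmod(index, d)
        choices.append(choice)
    return choices[::-1]


print(decode(12345, dims))  # [1, 2, 3, 4, 5]
```

With the multi-discrete layout the policy outputs 5 choices of 10 options each, instead of picking one out of 100000 flat actions.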
I have one other question: how can you validate the best model obtained from training, for example under weather variability? That is, train your model and then validate it under different circumstances?
Sinergym has callbacks and an evaluation function for validating models during training. Weather variability is used to introduce some noise and avoid overfitting (otherwise every year, that is, every episode, would have exactly the same weather). You can validate with that noise, or do curriculum learning (different environments with the same agent) if you want. Reference: https://stable-baselines3.readthedocs.io/en/master/guide/callbacks.html#evalcallback
Thank you! @AlejandroCN7
Question ❓
Dear @AlejandroCN7 ,
In a building with multiple thermal zones I have, of course, separate heating and cooling setpoints. However, when I try to run simulations with, for example, PPO, the heating setpoint of zone1 is the same as that of zone2. How can I run simulations in which the algorithm sets the setpoints for each thermal zone individually?
For example, in the warehouse building you also have the Office and Fine storage zones, and their setpoints are also the same in each timestep.
Also, if you were to run a simulation with simple rule-based control, how would you set these actions separately for each thermal zone?
And if it is possible to do this, what do you think would be the best way of calculating the comfort penalty? Would it, for example, make sense to calculate the penalty for each zone separately and use the average of these penalties as the total comfort penalty? Or do you have another opinion on this?
Kind regards, Michiel