ugr-sail / sinergym

Gym environment for building simulation and control using reinforcement learning
https://ugr-sail.github.io/sinergym/
MIT License

[Question] How to deal with "delayed reward" problem properly? #416

Closed kjhatimr closed 3 months ago

kjhatimr commented 3 months ago

Question ❓

I have already read an issue explaining that the timing of an action and its corresponding observation differ. But it is strange that, whatever the timestep setting is, it takes two steps to see the effect of an action. For example, timestep = 12 gives me the result after 10 minutes, while timestep = 1 makes me wait 2 hours.
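
As a quick check of those numbers, here is a minimal sketch. It assumes `timestep` means EnergyPlus timesteps per hour and that the delay is always two steps; the function name `delay_minutes` is hypothetical.

```python
def delay_minutes(timesteps_per_hour: int, delay_steps: int = 2) -> float:
    """Simulated minutes that pass before an action's effect is observed."""
    step_minutes = 60 / timesteps_per_hour  # length of one simulation step
    return delay_steps * step_minutes


print(delay_minutes(12))  # 5-minute steps, two steps of delay
print(delay_minutes(1))   # 60-minute steps, two steps of delay
```

With 12 timesteps per hour this gives 10 minutes, and with 1 timestep per hour it gives 120 minutes, matching the observation above.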

I am currently experimenting with a DRL agent, but I think the reward returned by env.step(action) confuses the agent: the reward has nothing to do with the action the agent has just taken, yet it is the signal that counts the most for that action.

Is it a good idea to make the EnergyPlus simulator step twice per env.step() call, with the timestep set as low as possible? If so, I am going to implement that. (Or has anyone already implemented this?)
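
One way the "step twice" idea could look is sketched below. This is not Sinergym code: `RepeatActionEnv` and `LaggedToyEnv` are hypothetical names, and the toy environment only imitates a one-step lag so that the effect is visible. It assumes the Gymnasium-style five-value return from step().

```python
class LaggedToyEnv:
    """Toy stand-in: step() returns the effect of the PREVIOUS action."""

    def __init__(self):
        self.pending = 0.0

    def reset(self, seed=None, options=None):
        self.pending = 0.0
        return 0.0, {}

    def step(self, action):
        obs = self.pending            # effect of the previous action
        self.pending = float(action)  # current action shows up one step later
        reward = -abs(obs - 1.0)      # best reward when the observation is 1.0
        return obs, reward, False, False, {}


class RepeatActionEnv:
    """Hypothetical wrapper: apply the same action `repeat` times per step()."""

    def __init__(self, env, repeat=2):
        if repeat < 1:
            raise ValueError("repeat must be at least 1")
        self.env = env
        self.repeat = repeat

    def reset(self, **kwargs):
        return self.env.reset(**kwargs)

    def step(self, action):
        for _ in range(self.repeat):
            obs, reward, terminated, truncated, info = self.env.step(action)
            if terminated or truncated:
                break  # do not step past the end of an episode
        # Return only the last inner result, which reflects `action`.
        return obs, reward, terminated, truncated, info


env = RepeatActionEnv(LaggedToyEnv(), repeat=2)
env.reset()
obs, reward, terminated, truncated, info = env.step(1.0)
print(obs, reward)
```

The wrapper returns the last inner reward instead of the sum, because the first inner reward still belongs to the previous action. The cost is that the agent acts at half the simulation rate.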


MLXFranklove commented 3 months ago

I found the same problem when I ran the usage example. Below is a graph of the monitor data that I exported by running Sinergym's usage example. To solve this problem, instead of using Sinergym, I used the EnergyPlus API to interact directly with EnergyPlus. [image]

MLXFranklove commented 3 months ago

`# energyplus library
import sys
sys.path.insert(0, r"D:\Research_Software\EnergyPlus")
from pyenergyplus.api import EnergyPlusAPI
from pyenergyplus.datatransfer import DataExchange
import numpy as np
import csv
import time
import threading
from queue import Queue, Empty, Full
from typing import Dict, Any, Tuple, Optional, List

idf_file = r"D:\Code\EnergyPlus_MY\Energyplus_test\Energyplus_model\HVACTemplate-5ZoneVAVFanPowered.idf"
epw_file = r"D:\Code\EnergyPlus_MY\Energyplus_test\Weather_data\CHN_Beijing.Beijing.545110_CSWD.epw"
idd_file = r"D:\Research_Software\Energyplus\Energy+.idd"

class EnergyPlus:
    '''
    obs_queue is the queue that holds observations.
    act_queue is the queue that holds actions.
    action_space is the action space; here it is discrete.
    get_action_func defines how a value in action_space is chosen, by a neural network or by other rules.
    '''

def __init__(self, obs_queue: Queue = Queue(1), act_queue: Queue = Queue(1)) -> None:

    # for RL
    self.obs_queue = obs_queue
    self.act_queue = act_queue

    # for energyplus
    self.energyplus_api = EnergyPlusAPI()
    self.dx: DataExchange = self.energyplus_api.exchange
    self.energyplus_exec_thread = None

    # energyplus running states
    self.energyplus_state = None  # stores the EnergyPlus state
    self.initialized = False  # whether initialization has completed
    self.simulation_complete = False  # whether the simulation has completed
    self.warmup_complete = False  # whether warmup has completed; a warmup runs before every simulation
    self.warmup_queue = Queue()  # an empty thread-safe queue used to store or manage warmup-related data
    self.progress_value: int = 0  # simulation progress
    self.sim_results: Dict[str, Any] = {}  # stores the EnergyPlus simulation results

    # request variables to be available during runtime: not every EnergyPlus variable is accessible, so Python must register the variables it wants to read before accessing them
    self.request_variable_complete = False

    # get the variable names csv
    self.has_csv = True

    # variables, meters, actuators
    # look up in the csv file that get_available_data_csv() generate
    # or look up the html file
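    '''
    Names such as SPACE1-1 are user-defined names from the IDF file.
    They also appear in the HTML file, where you can try them one by one,
    and in the CSV file, which lists all variables, meters, and actuators.
    '''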
    # variables
    self.variables = {
        'outdoor_air_drybulb_temperature': ('Site Outdoor Air Drybulb Temperature', 'Environment'),
        "Site Outdoor Air Relative Humidity": ("Site Outdoor Air Relative Humidity", 'Environment'),
        "Site Wind Speed": ("Site Wind Speed", 'Environment'),
        "Site Wind Direction": ("Site Wind Direction", 'Environment'),
        "Site Diffuse Solar Radiation Rate per Area": ("Site Diffuse Solar Radiation Rate per Area", 'Environment'),
        "Site Direct Solar Radiation Rate per Area": ("Site Direct Solar Radiation Rate per Area", 'Environment'),
        "Zone1-1 Thermostat Cooling Setpoint Temperature": ("Zone Thermostat Cooling Setpoint Temperature", "SPACE1-1"),
        "Zone1-1 Thermostat Heating Setpoint Temperature": ("Zone Thermostat Heating Setpoint Temperature", "SPACE1-1"),
        "Zone2-1 Thermostat Cooling Setpoint Temperature": ("Zone Thermostat Cooling Setpoint Temperature", "SPACE2-1"),
        "Zone2-1 Thermostat Heating Setpoint Temperature": ("Zone Thermostat Heating Setpoint Temperature", "SPACE2-1"),
        "Zone3-1 Thermostat Cooling Setpoint Temperature": ("Zone Thermostat Cooling Setpoint Temperature", "SPACE3-1"),
        "Zone3-1 Thermostat Heating Setpoint Temperature": ("Zone Thermostat Heating Setpoint Temperature", "SPACE3-1"),
        "Zone4-1 Thermostat Cooling Setpoint Temperature": ("Zone Thermostat Cooling Setpoint Temperature", "SPACE4-1"),
        "Zone4-1 Thermostat Heating Setpoint Temperature": ("Zone Thermostat Heating Setpoint Temperature", "SPACE4-1"),
        "Zone5-1 Thermostat Cooling Setpoint Temperature": ("Zone Thermostat Cooling Setpoint Temperature", "SPACE5-1"),
        "Zone5-1 Thermostat Heating Setpoint Temperature": ("Zone Thermostat Heating Setpoint Temperature", "SPACE5-1"),
        "zone_air_temp_1": ("Zone Air Temperature", "SPACE1-1"),
        "zone_air_temp_2": ("Zone Air Temperature", "SPACE2-1"),
        "zone_air_temp_3": ("Zone Air Temperature", "SPACE3-1"),
        "zone_air_temp_4": ("Zone Air Temperature", "SPACE4-1"),
        "zone_air_temp_5": ("Zone Air Temperature", "SPACE5-1"),
        "zone_air_Relative_Humidity_1": ("Zone Air Relative Humidity", "SPACE1-1"),
        "zone_air_Relative_Humidity_2": ("Zone Air Relative Humidity", "SPACE2-1"),
        "zone_air_Relative_Humidity_3": ("Zone Air Relative Humidity", "SPACE3-1"),
        "zone_air_Relative_Humidity_4": ("Zone Air Relative Humidity", "SPACE4-1"),
        "zone_air_Relative_Humidity_5": ("Zone Air Relative Humidity", "SPACE5-1"),
        "Zone1-1 Thermal Comfort Fanger Model PPD": ("Zone Thermal Comfort Fanger Model PPD", "SPACE1-1 PEOPLE 1"),
        "Zone2-1 Thermal Comfort Fanger Model PPD": ("Zone Thermal Comfort Fanger Model PPD", "SPACE2-1 PEOPLE 1"),
        "Zone3-1 Thermal Comfort Fanger Model PPD": ("Zone Thermal Comfort Fanger Model PPD", "SPACE3-1 PEOPLE 1"),
        "Zone4-1 Thermal Comfort Fanger Model PPD": ("Zone Thermal Comfort Fanger Model PPD", "SPACE4-1 PEOPLE 1"),
        "Zone5-1 Thermal Comfort Fanger Model PPD": ("Zone Thermal Comfort Fanger Model PPD", "SPACE5-1 PEOPLE 1"),
        "people_1": ("Zone People Occupant Count", "SPACE1-1"),
        "people_2": ("Zone People Occupant Count", "SPACE2-1"),
        "people_3": ("Zone People Occupant Count", "SPACE3-1"),
        "people_4": ("Zone People Occupant Count", "SPACE4-1"),
        "people_5": ("Zone People Occupant Count", "SPACE5-1"),
    }
    # Heating Coil NaturalGas Energy
    # Cooling Coil Electricity Energy
    self.var_handles: Dict[str, int] = {}

    # meters
    self.meters = {
        "elec_hvac": "Electricity:HVAC",
        "elec_cooling": "Cooling:Electricity",
    }
    self.meter_handles: Dict[str, int] = {}

    # actuators
    self.actuators = {
        "cooling_1": (
            "Zone Temperature Control",
            "Cooling Setpoint",
            "SPACE1-1"
        ),
        "heating_1": (
            "Zone Temperature Control",
            "Heating Setpoint",
            "SPACE1-1"
        ),
        "cooling_2": (
            "Zone Temperature Control",
            "Cooling Setpoint",
            "SPACE2-1"
        ),
        "heating_2": (
            "Zone Temperature Control",
            "Heating Setpoint",
            "SPACE2-1"
        ),
        "cooling_3": (
            "Zone Temperature Control",
            "Cooling Setpoint",
            "SPACE3-1"
        ),
        "heating_3": (
            "Zone Temperature Control",
            "Heating Setpoint",
            "SPACE3-1"
        ),
        "cooling_4": (
            "Zone Temperature Control",
            "Cooling Setpoint",
            "SPACE4-1"
        ),
        "heating_4": (
            "Zone Temperature Control",
            "Heating Setpoint",
            "SPACE4-1"
        ),
        "cooling_5": (
            "Zone Temperature Control",
            "Cooling Setpoint",
            "SPACE5-1"
        ),
        "heating_5": (
            "Zone Temperature Control",
            "Heating Setpoint",
            "SPACE5-1"
        ),
    }
    self.actuator_handles: Dict[str, int] = {}  # dictionary that stores the actuator handles

def start(self, suffix="default"):
    self.energyplus_state = self.energyplus_api.state_manager.new_state()  # returns a pointer to the EnergyPlus state; its contents should keep changing as the simulation progresses
    runtime = self.energyplus_api.runtime

    '''Not every EnergyPlus variable can be requested at runtime, so the variables we need are registered here.'''
    '''Parameters:state – An active EnergyPlus “state” that is returned from a call to api.state_manager.new_state().
        variable_name – The name of the variable to retrieve, e.g. “Site Outdoor Air DryBulb Temperature”, or “Fan Air Mass Flow Rate”
        variable_key – The instance of the variable to retrieve, e.g. “Environment”, or “Main System Fan”
    '''
    # request the variables once, before the simulation starts
    if not self.request_variable_complete:
        for key, var in self.variables.items():
            self.dx.request_variable(self.energyplus_state, var[0], var[1])
        self.request_variable_complete = True  # set once, after all variables are requested

    # register callback used to track simulation progress; the callback receives the progress value
    def report_progress(progress: int) -> None:
        self.progress_value = progress

    runtime.callback_progress(self.energyplus_state, report_progress)

    # register callback used to signal warmup complete
    def _warmup_complete(state: Any) -> None:
        self.warmup_complete = True
        self.warmup_queue.put(True)

    runtime.callback_after_new_environment_warmup_complete(self.energyplus_state, _warmup_complete)

    # register callback used to collect observations and send actions
    runtime.callback_end_zone_timestep_after_zone_reporting(self.energyplus_state, self._collect_obs)

    # register callback used to send actions
    runtime.callback_end_zone_timestep_after_zone_reporting(self.energyplus_state, self._send_actions)

    # run EnergyPlus in a non-blocking way
    def _run_energyplus(runtime, cmd_args, state, results):
        # print(f"running EnergyPlus with args: {cmd_args}")
        '''set print_output to True to show the EnergyPlus simulation output in the console'''
        self.energyplus_api.runtime.set_console_output_status(state=state, print_output=False)  # this line is optional
        # start simulation
        results["exit_code"] = runtime.run_energyplus(state, cmd_args)  # this starts the EnergyPlus run

    '''create a thread that calls _run_energyplus to start an EnergyPlus simulation; args are the arguments passed to it'''
    self.energyplus_exec_thread = threading.Thread(
        target=_run_energyplus,
        args=(
            self.energyplus_api.runtime,
            self.make_eplus_args(suffix),
            self.energyplus_state,
            self.sim_results
        )
    )
    '''start the thread'''
    self.energyplus_exec_thread.start()

def stop(self) -> None:
    if self.energyplus_exec_thread:
        print("self.energyplus_exec_thread:", self.energyplus_exec_thread)
        self.simulation_complete = True  # mark the simulation as complete
        self._flush_queues()  # empty obs_queue and act_queue (probably needed because of the threads)
        self.energyplus_exec_thread.join()  # simulation finished; wait for the thread to end
        self.energyplus_exec_thread = None  # the EnergyPlus thread has ended, so reset it to None
        self.energyplus_api.runtime.clear_callbacks()  # clears all registered callbacks, since the thread has ended
        self.energyplus_api.state_manager.delete_state(
            self.energyplus_state)  # deletes the existing state instance and frees its memory, i.e. releases the self.energyplus_state pointer

def _collect_obs(self, state_argument):
    calendar_year = self.dx.calendar_year(state_argument)
    month = self.dx.month(state_argument)
    day_of_month = self.dx.day_of_month(state_argument)
    day_of_week = self.dx.day_of_week(state_argument)
    hour = self.dx.hour(state_argument)
    current_time = self.dx.current_time(state_argument)
    print("calendar_year:", calendar_year)
    print("month:", month)
    print("day_of_month:", day_of_month)
    print("day_of_week:", day_of_week)
    print("hour:", hour)
    print("current_time:", current_time)
    if self.simulation_complete or not self._init_callback(state_argument):
        return
    '''the check above returns early once the simulation has finished'''
    self.next_obs = {
        **{
            key: self.dx.get_variable_value(state_argument, handle)
            for key, handle in self.var_handles.items()
        }
    }
    # **{} is dict-unpacking syntax: it unpacks the key-value pairs into a new dict; self.next_obs is a dict
    # add the meters such as electricity
    for key, handle in self.meter_handles.items():
        self.next_obs[key] = self.dx.get_meter_value(state_argument, handle)
    self.next_obs["day_of_week"] = day_of_week
    self.next_obs["hour"] = hour
    # if full, it will block the entire simulation
    self.obs_queue.put(self.next_obs)  # put the observation dict into obs_queue
    while self.act_queue.empty():
        time.sleep(0.1)

def _send_actions(self, state_argument):
    if self.simulation_complete or not self._init_callback(state_argument):
        return
    if self.act_queue.empty():
        return
    '''why is this needed, why is there an action_idx?'''
    action_idx = self.act_queue.get()
    print(action_idx)
    actions = action_idx
    '''the loop below sends the actions to EnergyPlus'''
    for i in range(len(self.actuator_handles)):
        # Effective heating set-point higher than effective cooling set-point err
        self.dx.set_actuator_value(
            state=state_argument,
            actuator_handle=list(self.actuator_handles.values())[i],
            actuator_value=actions[i]
        )

'''this function empties obs_queue and act_queue'''

def _flush_queues(self):
    for q in [self.obs_queue, self.act_queue]:
        while not q.empty():
            q.get()

'''this function empties obs_queue and act_queue'''

def make_eplus_args(self, suffix="default"):
    args = [
        "-i",
        idd_file,
        "-w",
        epw_file,
        "-d",
        "res",
        "-p",
        suffix,
        "-x",
        "-r",
        idf_file,
    ]
    return args

"""initialize EnergyPlus handles and checks if simulation runtime is ready, i.e. whether EnergyPlus has finished initializing"""

def _init_callback(self, state_argument) -> bool:
    """initialize EnergyPlus handles and checks if simulation runtime is ready"""
    self.initialized = self._init_handles(state_argument) \
                       and not self.dx.warmup_flag(state_argument)
    return self.initialized

# self.dx.warmup_flag(state_argument) returns 1 while EnergyPlus is warming up
# self._init_handles(state_argument) tells whether EnergyPlus has finished initializing
"""initialize EnergyPlus handles and checks if simulation runtime is ready, i.e. whether EnergyPlus has finished initializing"""

'''this function initializes the EnergyPlus handles'''

def _init_handles(self, state_argument):
    """initialize sensors/actuators handles to interact with during simulation"""
    '''initialize the handles used to interact with EnergyPlus at runtime'''
    if not self.initialized:
        if not self.dx.api_data_fully_ready(state_argument):
            return False
        # the call above checks whether the data exchange API is ready
        '''get_variable_handle returns the handle of an output variable in the running simulation'''
        # store the handles so that we do not need get the hand every callback
        self.var_handles = {
            key: self.dx.get_variable_handle(state_argument, *var)
            for key, var in self.variables.items()
        }
        '''get and store the handles'''
        self.meter_handles = {
            key: self.dx.get_meter_handle(state_argument, meter)
            for key, meter in self.meters.items()
        }
        '''get and store the handles'''
        self.actuator_handles = {
            key: self.dx.get_actuator_handle(state_argument, *actuator)
            for key, actuator in self.actuators.items()
        }
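        '''get and store the handles'''
        '''a handle equal to -1 means the lookup above found nothing; the code below prints an error, because a missing handle means there is a mistake in variables, meters, or actuators'''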
        for handles in [
            self.var_handles,
            self.meter_handles,
            self.actuator_handles
        ]:
            if any([v == -1 for v in handles.values()]):
                print("Error! there is -1 in handle! check the variable names in the var.csv")

                print("variables:")
                for k in self.var_handles:
                    print(self.var_handles[k])

                print("meters:")
                for k in self.meter_handles:
                    print(self.meter_handles[k])

                print("actuators")
                for k in self.actuator_handles:
                    print(k)

                self.get_available_data_csv(state_argument)
                exit(1)

        self.initialized = True

    return True

'''this function initializes the EnergyPlus handles'''
# get the name and key for handles
'''this lists all available API data in an easy-to-parse CSV format; the CSV file shows everything that can be interacted with'''

def get_available_data_csv(self, state):
    if self.has_csv:
        return
    else:
        available_data = self.dx.list_available_api_data_csv(self.energyplus_state).decode("utf-8")
        lines = available_data.split('\n')
        with open("var.csv", 'w', newline='') as csvfile:
            writer = csv.writer(csvfile)
            for line in lines:
                fields = line.split(',')
                writer.writerow(fields)

        self.has_csv = True

'''this lists all available API data in an easy-to-parse CSV format; the CSV file shows everything that can be interacted with'''

def failed(self) -> bool:
    return self.sim_results.get("exit_code", -1) > 0

` This is my Python code for interacting with EnergyPlus, running on Windows. I think the most critical part is shown in the image below. [image]

kjhatimr commented 3 months ago

@MLXFranklove

It looks like you've extracted the EnergyPlus API component (in sinergym/simulators/eplus.py) to interact with EnergyPlus without the Gymnasium interface. The lines below might be necessary because act_queue isn't guaranteed to have data when _collect_obs() is called:

    while self.act_queue.empty():
        time.sleep(0.1)
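
A blocking queue handshake can give the same guarantee without polling. The sketch below uses only the standard library. `run_handshake` and the toy action rule (`obs * 10`) are hypothetical and merely stand in for the EnergyPlus callback and the agent.

```python
import threading
from queue import Queue


def run_handshake(n_steps=3):
    """Pair every observation with the action chosen for that same step."""
    obs_queue, act_queue = Queue(maxsize=1), Queue(maxsize=1)
    applied = []  # (step, action) pairs as seen by the simulation thread

    def simulation():
        for t in range(n_steps):
            obs_queue.put(t)          # publish the observation for step t
            action = act_queue.get()  # block until the agent has answered
            applied.append((t, action))

    sim_thread = threading.Thread(target=simulation)
    sim_thread.start()
    for _ in range(n_steps):
        obs = obs_queue.get()         # wait for the next observation
        act_queue.put(obs * 10)       # toy policy: the action depends on obs
    sim_thread.join()
    return applied


print(run_handshake())
```

Because `Queue.get()` blocks, the simulation side cannot advance to the next step before an action for the current step has arrived.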

I appreciate the significant effort involved in this. However, I wonder if interacting with EnergyPlus directly, bypassing the Gym interface, would help align the actions and observations in the same row in the CSV. I don't see this as a "problem" since it's natural to expect results to appear some time after an action is taken. My concern is that this behavior might mislead the DRL agent.

MLXFranklove commented 3 months ago

I think this misalignment of the data is seriously wrong. Consider the case where EnergyPlus controls the thermostat with its own cooling and heating setpoint schedules, that is, Python does not interact with EnergyPlus and EnergyPlus runs independently. In the exported variable table, you can see that the thermostat cooling and heating setpoints of one timestep are reflected in the variables obtained at the next timestep. You can run it and observe this yourself. My English is not very good, so this may not be very clear; please forgive me. If you want, I can give you a copy of the Python code that interacts with EnergyPlus using a rule-based control approach, that is, the same thermostat cooling and heating setpoint schedule scheme described above. I have only just started working in this field, so this is just my personal opinion and not necessarily right.

MLXFranklove commented 3 months ago

[image] [image] [image] Above are the variable graphs exported at EnergyPlus runtime and some settings in the EnergyPlus IDF file.

kjhatimr commented 3 months ago

@MLXFranklove

Thank you for the detailed explanation! Your reply regarding thermostat control was very clear. From your results, I see that when the heating and cooling setpoints change at 05:00, the resulting indoor temperature changes immediately at the next step (05:15). With the Sinergym interface, I expected the temperature change to occur at the step after next (05:30) and I agree with you, it looks wrong.
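
To check this kind of lag on exported data, a minimal sketch follows. The helper names and the sample series are hypothetical, and the values are made up for illustration with one entry per timestep.

```python
def first_change(series, start=1, tol=1e-6):
    """Index of the first entry that differs from its predecessor, or None."""
    for i in range(max(start, 1), len(series)):
        if abs(series[i] - series[i - 1]) > tol:
            return i
    return None


def response_lag(setpoints, temperatures, tol=1e-6):
    """Steps between the first setpoint change and the first temperature change."""
    changed_at = first_change(setpoints, tol=tol)
    if changed_at is None:
        return None
    responded_at = first_change(temperatures, start=changed_at, tol=tol)
    if responded_at is None:
        return None
    return responded_at - changed_at


setpoints = [20.0, 20.0, 22.0, 22.0, 22.0]    # changes at index 2
one_step = [20.0, 20.0, 20.0, 20.5, 21.0]     # responds at index 3
two_steps = [20.0, 20.0, 20.0, 20.0, 20.5]    # responds at index 4
print(response_lag(setpoints, one_step), response_lag(setpoints, two_steps))
```

A lag of 1 corresponds to the 05:00 to 05:15 behaviour described above, and a lag of 2 to the 05:30 case.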

Could you kindly share a copy of the Python code used for interacting with EnergyPlus in a rule-based control approach?

MLXFranklove commented 3 months ago

@kjhatimr Ok, please wait an hour or two

MLXFranklove commented 3 months ago

@kjhatimr Rule_based_Control.zip Above is the Python file for my rule-based control scheme, together with the EnergyPlus IDF file. My abilities may be limited and the files may be flawed: even though I set the cooling and heating setpoint schedule in Python to be exactly the same as the one in EnergyPlus, the indoor temperature and other data exported when Python interacts with EnergyPlus are slightly different from those exported when EnergyPlus runs alone. For details, see the following figures: [image] [image]

Here is what you need to change in the code. There may be other things that need changing, but if your Python packages are complete, I think the parts shown in the pictures below are what need to be changed; there may be some cases I have not considered. I use wandb to upload the data to a web page so that it is easier to inspect. [image] [image]

kjhatimr commented 3 months ago

@MLXFranklove

Thank you, it was very helpful. In the meantime, I explored a DRL model to address delayed rewards. I found that the Rollout buffer manages the sequence of rewards, and by customizing the compute_returns_and_advantage() function within the RolloutBuffer class, we can effectively handle delayed rewards. For instance, we can skip the current and next reward by shifting RolloutBuffer.rewards[] by -2.
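
The shifting idea can be illustrated on a plain list. This is only a sketch and not the actual Stable-Baselines3 buffer code: `align_delayed_rewards` is a hypothetical helper, and it assumes a fixed delay within a single episode.

```python
def align_delayed_rewards(rewards, delay=2):
    """Move each reward back by `delay` steps so it sits next to its action.

    Returns the aligned rewards and a mask marking which entries are real.
    The last `delay` steps have no matching reward yet, so they are
    zero-filled and flagged as invalid.
    """
    n = len(rewards)
    keep = max(n - delay, 0)
    aligned = list(rewards[delay:]) + [0.0] * (n - keep)
    valid = [True] * keep + [False] * (n - keep)
    return aligned, valid


rewards = [0.0, 1.0, 2.0, 3.0, 4.0]
aligned, valid = align_delayed_rewards(rewards, delay=2)
print(aligned)
print(valid)
```

The shift must not cross episode boundaries, and the invalid trailing steps should probably be excluded from the return and advantage computation instead of being treated as zero reward.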