Closed Webbah closed 4 years ago
Variant: Compare mean or min (worst-case) over the sampled performances to robustify the SafeOpt pipeline.
n_MC: number of Monte-Carlo samples as input to runner
runner
Env: Every n_MC simulation: new parameters are drawn
[x] update env parameters in env.reset()
[x] get parameter value from distribution (using partial?)
Agent: only has to update after n_MC simulations.
Every n_MC simulation: new performance is calculated.
After n_MC -> average performance is used for optimization to find the next controller parameters.
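A minimal sketch of the two variants (mean vs. worst-case min) over the sampled performances. The function name `aggregate_performance` and the `variant` argument are hypothetical and not part of the repo; it also assumes that a higher performance value is better, so `min` is the worst case.

```python
from statistics import mean


def aggregate_performance(performances, variant="mean"):
    """Collapse the n_MC sampled episode performances into a single score.

    Hypothetical helper, assuming higher performance is better:
    - "mean": average over all Monte-Carlo samples
    - "min":  worst-case sample
    """
    if not performances:
        raise ValueError("need at least one sampled performance")
    if variant == "mean":
        return mean(performances)
    if variant == "min":
        return min(performances)
    raise ValueError(f"unknown variant: {variant!r}")
```

The resulting score would then be the single value handed to the optimizer for the current controller parameters.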
Changed env.reset() to initialize model parameters with different values using model_param:
https://github.com/upb-lea/openmodelica-microgrid-gym/blob/c845d1a7769c67ba203bbfeeaa510a09e4cde95d/openmodelica_microgrid_gym/env/modelica.py#L263
This also ensures that the first parameter value (before the first step) is correct and avoids load steps at the beginning.
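Roughly what this could look like, as a toy sketch and not the actual code in modelica.py: `ToyEnv`, the parameter names `r_load` and `l_filter`, and the use of zero-argument callables are all assumptions for illustration. `functools.partial` binds the distribution arguments so that every `reset()` draws a fresh value.

```python
import random
from functools import partial


class ToyEnv:
    """Hypothetical stand-in for the environment (not the real env class)."""

    def __init__(self, model_params):
        # name -> constant, or zero-argument callable that samples a value
        self.model_params = model_params
        self.current_params = {}

    def reset(self):
        # Resolve every parameter before the first step, so the initial
        # value is already the sampled one and no jump occurs at step 0.
        self.current_params = {
            name: (value() if callable(value) else value)
            for name, value in self.model_params.items()
        }
        return dict(self.current_params)


rng = random.Random(0)
env = ToyEnv({
    "r_load": partial(rng.gauss, 20.0, 2.0),  # sampled anew on each reset
    "l_filter": 2.3e-3,                       # fixed value
})
first = env.reset()
second = env.reset()
```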
@stheid Is there a better option, or do you have another suggestion? Where else are the initial parameters (from Python) currently set?
Problem with additional loadstep?
I mean, OpenModelicaParameters can be functions of any type, so this is not an issue from an implementation standpoint. But you also mean to additionally parametrize the rest of the environment, right?
First Additionally: Abstract agent class EpisodicLearnerAgent(Agent)
-> SafeOpt inherits from EpisodicLearnerAgent
and staticctrlAgent
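A rough sketch of that class structure. All class bodies here are placeholders: `Agent`, `StaticCtrlAgent` and `SafeOptLikeAgent` are simplified stand-ins written for this example, not the classes from the repo, and the `gain` attribute is made up.

```python
from abc import ABC, abstractmethod


class Agent:
    """Placeholder base class (hypothetical, heavily simplified)."""

    def observe(self, reward, done):
        pass


class EpisodicLearnerAgent(Agent, ABC):
    """Agents that learn once per episode batch instead of every step."""

    performance = None

    @abstractmethod
    def update_params(self):
        """Use self.performance to pick the next controller parameters."""


class StaticCtrlAgent(Agent):
    """Placeholder for an agent that wraps a fixed controller."""

    def __init__(self, gain):
        self.gain = gain

    def act(self, obs):
        return self.gain * obs


class SafeOptLikeAgent(StaticCtrlAgent, EpisodicLearnerAgent):
    """Inherits the controller handling and the episodic-learning interface."""

    def __init__(self, gain):
        super().__init__(gain)
        self.history = []

    def update_params(self):
        # Placeholder: only records the evaluated (parameter, performance) pair.
        self.history.append((self.gain, self.performance))
```

With `update_params` abstract, instantiating `EpisodicLearnerAgent` directly raises a `TypeError`, so every episodic learner has to define its own update.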
Execution:
agent.observe is always called with done = false -> update_params is NEVER called!
After the n_MC runs of the n_MC loop: agent.performance = mean(episodic_Performance)
Call agent.update_params explicitly
-> so present agent and env don't have to be modified
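The execution order above could be sketched like this. `RecordingAgent` and `run_monte_carlo` are hypothetical names made up for the example, and the episodes are replaced by plain lists of rewards instead of real simulations.

```python
from statistics import mean


class RecordingAgent:
    """Hypothetical agent that would normally update itself when done=True."""

    def __init__(self):
        self.performance = None
        self.episode_return = 0.0
        self.update_calls = 0

    def reset(self):
        self.episode_return = 0.0

    def observe(self, reward, done):
        self.episode_return += reward
        if done:
            self.update_params()

    def update_params(self):
        self.update_calls += 1


def run_monte_carlo(agent, episode_rewards):
    """episode_rewards: n_MC reward sequences, one per simulated episode."""
    episodic_performance = []
    for rewards in episode_rewards:
        agent.reset()
        for reward in rewards:
            # done is always False here, so the agent never updates itself
            agent.observe(reward, done=False)
        episodic_performance.append(agent.episode_return)
    # After all n_MC runs: average the performances, then update explicitly
    agent.performance = mean(episodic_performance)
    agent.update_params()
    return agent.performance


agent = RecordingAgent()
result = run_monte_carlo(agent, [[1.0, 2.0], [3.0, 4.0]])
```

Here the agent sees two episodes with returns 3.0 and 7.0, and `update_params` runs exactly once, after the loop.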
Problem with additional Load class in main script:
@stheid ideas for structure?
Implemented in #59
To represent the real world more accurately, define parameters as distributions (Gaussian/normal).
Parameters:
(Implementation will be done in #59)