RudolfReiter opened 1 year ago
Hmm...What are the inputs to the safety policy again? I think we can add it after the action received from the RL policy.
Inputs to the safety filter are the same as for the MPC, plus the proposed control action (e.g., the RL policy's output).
Proposal:
1) Modify the `create_unified_flexiblearmenv_and_controller` method to also return the `safety_filter` (we construct the controller there, so we should already have access to everything we need to build the safety filter).
2) Create a wrapper class `SafetyWrapper`:

```python
class SafetyWrapper(policies.BasePolicy):
    def __init__(self, policy: policies.BasePolicy, safety_filter: NMPC):
        ...

    def _predict(self, observation, deterministic: bool = False):
        proposed_action = self.policy(observation)
        safe_action = self.safety_filter(proposed_action, observation)
        return safe_action
```
We would then use this class instead of the RL/IRL/NMPC policy for evaluation.
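To make the idea above concrete, here is a minimal, dependency-free sketch of the wrapper. The `DummyPolicy` and the clipping "filter" are stand-ins I made up for illustration; the real version would subclass `policies.BasePolicy` and use the NMPC-based filter, which sees the same state the MPC sees plus the proposed action.

```python
import numpy as np

class DummyPolicy:
    """Stand-in for the trained RL/IRL policy (hypothetical, for the sketch)."""
    def predict(self, observation):
        # Deliberately aggressive proposal: twice the observation.
        return observation * 2.0

class DummySafetyFilter:
    """Stand-in for the NMPC safety filter: takes the proposed action and the
    current observation/state, returns the closest safe action (here: clipping)."""
    def __init__(self, limit=1.0):
        self.limit = limit

    def __call__(self, proposed_action, observation):
        return np.clip(proposed_action, -self.limit, self.limit)

class SafetyWrapper:
    """Wraps a policy so every proposed action passes through the safety filter."""
    def __init__(self, policy, safety_filter):
        self.policy = policy
        self.safety_filter = safety_filter

    def _predict(self, observation, deterministic=False):
        proposed_action = self.policy.predict(observation)
        return self.safety_filter(proposed_action, observation)

wrapped = SafetyWrapper(DummyPolicy(), DummySafetyFilter(limit=1.0))
obs = np.array([0.8, -0.9])
print(wrapped._predict(obs))  # proposed [1.6, -1.8] is filtered to [1.0, -1.0]
```

The evaluation loop never needs to know whether it is calling a raw policy or a wrapped one, which is what makes this a drop-in replacement.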
Thoughts?
I like it, looks like a neat solution.
For completeness, and to address one of the reviewers' comments, we should also train the RL policy with the safety filter in the loop.
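One way to train with the filter in the loop is to wrap the environment so that every action is filtered before `step()` executes it; the agent then learns on the actions that are actually applied. This is only a sketch under that assumption: `FilteredEnv` and `ToyEnv` are hypothetical names, and the clipping lambda stands in for the real NMPC filter.

```python
import numpy as np

class ToyEnv:
    """Tiny stand-in environment: the state is a vector, the action shifts it."""
    def reset(self):
        self.state = np.zeros(1)
        return self.state

    def step(self, action):
        self.state = self.state + action
        reward = -float(np.abs(self.state).sum())
        return self.state, reward, False, {}

class FilteredEnv:
    """Applies the safety filter to every action before it reaches the base
    environment, so training and evaluation see the same safe behavior."""
    def __init__(self, env, safety_filter):
        self.env = env
        self.safety_filter = safety_filter
        self._last_obs = None

    def reset(self):
        self._last_obs = self.env.reset()
        return self._last_obs

    def step(self, action):
        safe_action = self.safety_filter(action, self._last_obs)
        obs, reward, done, info = self.env.step(safe_action)
        self._last_obs = obs
        return obs, reward, done, info

clip_filter = lambda a, obs: np.clip(a, -0.5, 0.5)  # stand-in for the NMPC filter
env = FilteredEnv(ToyEnv(), clip_filter)
obs = env.reset()
obs, reward, done, info = env.step(np.array([2.0]))
print(obs)  # the state moved by the filtered action 0.5, not by 2.0
```

The same `FilteredEnv` could be handed to the existing training routine unchanged, which keeps the filter logic out of the RL code itself.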
@Erfi: Where in the current setup can I add the safety filter (which is essentially an MPC) after the RL policy? Can you point that out, or prepare the evaluation/data collection routine so that it accepts this module after the RL policy?