shamilmamedov / flexible_arm


Training RL with safety filter #30

Open · RudolfReiter opened this issue 1 year ago

RudolfReiter commented 1 year ago

For completeness, and to address one of the reviewers' comments, we should also train RL with the safety filter included.

@Erfi: Where in the current setup can I add the safety filter (which is a kind of MPC) after the RL policy? Can you point that out, or prepare the evaluation/data collection routine so that it accepts this module after the RL policy?

Erfi commented 1 year ago

Hmm... what are the inputs to the safety filter again? I think we can add it after the action is received from the RL policy.

RudolfReiter commented 1 year ago

The inputs to the safety filter are the same as for the MPC, plus the proposed control action, e.g., the RL policy output.
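
To make the interface concrete, a toy stand-in could look like this (purely illustrative: the function name, the box bounds u_min/u_max, and the clipping are placeholders; the actual filter solves an NMPC problem with the same model and constraints as the MPC):

import numpy as np

def safety_filter(proposed_action: np.ndarray, observation: np.ndarray,
                  u_min: np.ndarray, u_max: np.ndarray) -> np.ndarray:
    # Toy stand-in: return the feasible action closest to the proposed one.
    # The real filter solves a constrained OCP over a horizon, using the
    # observation as the initial state of the prediction model.
    del observation  # unused in this toy version
    return np.clip(proposed_action, u_min, u_max)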

Erfi commented 1 year ago

Proposal:

1) Modify the create_unified_flexiblearmenv_and_controller method to also return the safety_filter (we construct the controller there, so we should also have access to everything needed to construct the safety filter)

2) Create a wrapper class SafetyWrapper:

from stable_baselines3.common import policies  # BasePolicy lives here in SB3

class SafetyWrapper(policies.BasePolicy):
    def __init__(self, policy: policies.BasePolicy, safety_filter: NMPC):
        ...  # super().__init__ etc.
        self.policy = policy
        self.safety_filter = safety_filter

    def _predict(self, observation, deterministic: bool = False):
        # Query the learned policy, then pass its action through the
        # MPC-based safety filter, which enforces the constraints.
        proposed_action = self.policy(observation)
        safe_action = self.safety_filter(proposed_action, observation)
        return safe_action

We would then use this class in place of the RL/IRL/NMPC policy for evaluation; a rough usage sketch follows below.
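
For evaluation, the wiring could look roughly like this (a sketch only: the three-value return of create_unified_flexiblearmenv_and_controller is the proposed change from step 1, the policy loading is elided, and a Gymnasium-style reset/step API is assumed):

# Step 1: the factory also returns the safety filter (proposed change).
env, controller, safety_filter = create_unified_flexiblearmenv_and_controller()

rl_policy = ...  # load the trained RL/IRL policy here
safe_policy = SafetyWrapper(policy=rl_policy, safety_filter=safety_filter)

# Evaluate exactly as before, but with the wrapped policy.
obs, _ = env.reset()
done = False
while not done:
    action, _ = safe_policy.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated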

Thoughts?

shamilmamedov commented 1 year ago

I like it; it looks like a neat solution.