stheid / safety_guarded_rl

Safeguarded updates #3

Open stheid opened 3 years ago

stheid commented 3 years ago

The new policy is evaluated by an oracle. If the oracle reports the result as unsafe, the update is discarded.

stheid commented 3 years ago

Depends on #1, as this only makes sense once we have a safe initial policy.

Implement an oracle that takes a policy and evaluates it at a random set of points on the boundary of the feasible region. If any of the successor states lies outside the constraint region, the oracle returns False and the policy update is discarded.
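
A rough sketch of what this could look like, to make the idea concrete. Nothing here is taken from the repo: it assumes the feasible region is an axis-aligned box given by `low` and `high`, that the policy is a callable mapping a state to an action, and that a one-step dynamics function `step(state, action)` is available. The names `oracle`, `sample_boundary_point`, `is_inside` and `safeguarded_update` are all hypothetical.

```python
import random


def sample_boundary_point(low, high, rng):
    """Draw a random point on the surface of the box [low, high] (assumed region shape)."""
    point = [rng.uniform(lo, hi) for lo, hi in zip(low, high)]
    # Pin one randomly chosen coordinate to its lower or upper bound,
    # so the point lies on a face of the box.
    axis = rng.randrange(len(point))
    point[axis] = rng.choice((low[axis], high[axis]))
    return point


def is_inside(state, low, high):
    """True if every coordinate of the state is within the box bounds."""
    return all(lo <= s <= hi for s, lo, hi in zip(state, low, high))


def oracle(policy, step, low, high, n_samples=100, seed=None):
    """Return False if any sampled boundary state has a successor outside the region."""
    rng = random.Random(seed)
    for _ in range(n_samples):
        state = sample_boundary_point(low, high, rng)
        successor = step(state, policy(state))  # hypothetical one-step dynamics
        if not is_inside(successor, low, high):
            return False
    return True


def safeguarded_update(current_policy, candidate_policy, step, low, high, **oracle_kwargs):
    """Accept the candidate only if the oracle approves it; otherwise keep the current policy."""
    if oracle(candidate_policy, step, low, high, **oracle_kwargs):
        return candidate_policy
    return current_policy
```

Since the boundary points are sampled at random, a `True` result only means no violation was found among the sampled points. It is not a guarantee that the policy is safe everywhere on the boundary.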