utiasDSL / safe-control-gym

PyBullet CartPole and Quadrotor environments—with CasADi symbolic a priori dynamics—for learning-based control and RL
https://www.dynsyslab.org/safe-robot-learning/
MIT License
585 stars · 122 forks

Is the safe_action obtained through the safe layer outside the action space's bounds? #29

Closed djp-orz closed 2 years ago

djp-orz commented 2 years ago

In the code, the safe action is obtained by `correction = max_mult * max_g; act_new = act - correction`. If the action space is bounded, for example [-1, 1], I think it is possible that the safe action falls outside the bounds of the action space. I am not sure whether I understand the code correctly.

Justin-Yuan commented 2 years ago

Hi @djp-orz, thanks for pointing out this issue. To confirm, I believe you are concerned about this snippet:

https://github.com/utiasDSL/safe-control-gym/blob/b3f69bbed8577f64fc36d23677bf50027e991b2d/safe_control_gym/controllers/safe_explorer/safe_explorer_utils.py#L183-L185

You are correct that the filtered action could go out of the action bounds, but the included benchmark envs internally apply action clipping in `_preprocess_control()` before feeding the action to step the simulation (see here for cartpole and here for quadrotor), so the resulting action will still be valid.

Alternatively, you can apply the action clipping in the policy or agent class, which might even benefit learning since the data then contains a more limited action range. For the safety layer implemented here, we opt for the simpler approach and let the env handle clipping, but you are free to customize as you see fit.
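For the agent-side option, a minimal sketch could look like the following (the function name and argument layout are illustrative, not the repo's actual API):

```python
import numpy as np

# Hypothetical agent-side clipping: apply the safety-layer correction,
# then clip to the env's action bounds before the action leaves the policy.
def clipped_safe_action(act, correction, low=-1.0, high=1.0):
    return np.clip(act - correction, low, high)

# A correction that would push the action below -1 is clipped back in bounds:
# 0.5 - 2.0 = -1.5  ->  clipped to -1.0
a = clipped_safe_action(np.array([0.5]), np.array([2.0]))
print(a)  # [-1.]
```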

djp-orz commented 2 years ago

Thank you for your reply!

Maybe the clipped action causes another problem: the clipped action no longer satisfies the KKT conditions, i.e. equation (5) in the paper "Safe Exploration in Continuous Action Spaces", so the next state reached by applying the clipped safe action may still be unsafe. This is what I am worried about, and I encountered this situation in my experiments.
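To illustrate the concern numerically: in the single-constraint closed-form solution of Dalal et al. (Eq. (5)), the corrected action satisfies the linearized constraint `g·a + c ≤ 0` exactly when the constraint is active, but clipping it back into the action bounds can re-violate that constraint. A hypothetical 1-D sketch (values chosen purely for illustration):

```python
import numpy as np

def safety_layer_correction(a, g, c):
    """Single-constraint closed-form correction (Dalal et al., Eq. (5)):
    a_safe = a - lam * g, with lam = max(0, (g.a + c) / (g.g))."""
    lam = max(0.0, (float(g @ a) + c) / float(g @ g))
    return a - lam * g

a = np.array([1.0])   # raw policy action
g = np.array([1.0])   # constraint gradient w.r.t. the action (illustrative)
c = 1.5               # constraint margin term (illustrative)

a_safe = safety_layer_correction(a, g, c)   # lam = 2.5 -> a_safe = [-1.5]
a_clipped = np.clip(a_safe, -1.0, 1.0)      # clipped to [-1.0]

# The unclipped correction satisfies the linearized constraint with equality...
print(float(g @ a_safe) + c)     # 0.0
# ...but the clipped action violates it:
print(float(g @ a_clipped) + c)  # 0.5 > 0
```

So whenever the closed-form correction lands outside the action bounds, clipping trades constraint satisfaction (in the linearized model) for feasibility of the action, which matches the unsafe behaviour you observed.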