utiasDSL / safe-control-gym

PyBullet CartPole and Quadrotor environments—with CasADi symbolic a priori dynamics—for learning-based control and RL
https://www.dynsyslab.org/safe-robot-learning/
MIT License

Fixing LQR stabilization #118

Federico-PizarroBejarano closed this 1 year ago

Federico-PizarroBejarano commented 1 year ago

Fixed a small error in the LQR code and changed the test to be more interesting. Essentially, the problem stems from the fact that we changed what X_EQ is in the quadrotor, so I'm unsure if it would cause errors in other places in the code. In the quadrotor, LQR was always stabilizing to (0, 0) instead of the desired goal point, since it expected X_EQ to hold the stabilization goal but X_EQ is now only an array of 0s.

The code in cartpole does what I believe used to be done, setting X_EQ to be the first X_GOAL: https://github.com/utiasDSL/safe-control-gym/blob/bfe74b3327802d7578217ca1f42ddbdd49d4bf94/safe_control_gym/envs/gym_control/cartpole.py#L423

However, in the quadrotor, this is not done (X_EQ is set to 0s instead): https://github.com/utiasDSL/safe-control-gym/blob/bfe74b3327802d7578217ca1f42ddbdd49d4bf94/safe_control_gym/envs/gym_pybullet_drones/quadrotor.py#L562

I think either of these approaches is fine, but we should decide since this is inconsistent and is causing problems.
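
For concreteness, here is a minimal sketch of why the convention matters (the function and numbers below are illustrative, not the actual safe-control-gym code): if the controller regulates the error x - X_EQ and X_EQ is all zeros, it drives the state to the origin instead of to X_GOAL, which is the bug above.

```python
import numpy as np

def lqr_action(x, x_ref, u_eq, K):
    """Illustrative LQR control law (hypothetical helper, not the repo's API).

    Regulates the state toward x_ref. If x_ref were an all-zero X_EQ instead
    of the goal point, the controller would silently stabilize to the origin.
    """
    return u_eq - K @ (x - x_ref)

# Toy values just to exercise the sketch.
K = np.array([[1.0, 1.7]])                  # some stabilizing gain
x = np.array([0.5, 0.0])                    # current state
x_goal = np.array([1.0, 0.0])               # desired goal point, not the origin
u_eq = np.zeros(1)                          # equilibrium input
print(lqr_action(x, x_goal, u_eq, K))       # pushes the state toward x_goal
print(lqr_action(x, np.zeros(2), u_eq, K))  # pushes the state toward (0, 0)
```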

Justin-Yuan commented 1 year ago

@adamhall it seems the MPC controllers also use X_EQ in the linearization, but it is fixed at 0 as defined in the quadrotor task. Can you confirm this is the option we are going with?

Federico-PizarroBejarano commented 1 year ago

Decided to make X_EQ all zeros every time.

Federico-PizarroBejarano commented 1 year ago

@siqizhou @adamhall I am now always computing the LQR gain around X_EQ and U_EQ, which means there is no need to recalculate the gain at each tracking iteration; it is calculated just once. Somehow this leads to the same behaviour as before (the gain does not change whether it is linearized about a goal point or the equilibrium). Not sure why this is the case.
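
Roughly, the gain computation now looks like the sketch below (a hedged approximation using scipy, with hypothetical Q, R and Jacobians; the actual compute_lqr_gain may differ): A and B are evaluated once at (X_EQ, U_EQ) and the resulting K is reused at every step.

```python
import numpy as np
import scipy.linalg

def compute_lqr_gain(A, B, Q, R):
    """One-shot continuous-time LQR gain: K = R^-1 B^T P, with P from the CARE."""
    P = scipy.linalg.solve_continuous_are(A, B, Q, R)
    return np.linalg.solve(R, B.T @ P)

# Toy double integrator standing in for the Jacobians at (X_EQ, U_EQ).
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.eye(1)
K = compute_lqr_gain(A, B, Q, R)  # computed once, reused for every tracking step
```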

adamhall commented 1 year ago

Changes look fine to me but I do have one question:

> @siqizhou @adamhall I am now always computing the LQR gain around X_EQ and U_EQ, which means there is no need to recalculate the gain at each tracking iteration; it is calculated just once. Somehow this leads to the same behaviour as before (the gain does not change whether it is linearized about a goal point or the equilibrium). Not sure why this is the case.

Is this true for both stabilization and trajectory tracking, or just stabilization? Previously, it looks like trajectory tracking was linearizing about each point of the trajectory, which should be much better than just linearizing once about U_EQ and X_EQ.

Federico-PizarroBejarano commented 1 year ago

@adamhall I thought in our discussion we decided that the linearization should always be about an equilibrium point (i.e. X_EQ). Additionally, I just checked and the linearization is identical regardless of the equilibrium point passed to compute_lqr_gain, as long as theta and theta_dot are 0 (which they are for the goal points). That is why switching the linearization from X_GOAL[step] to X_EQ changes nothing; see the sketch below. Unsure if this is expected or not.
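
A quick numerical check of this (using a generic textbook cartpole model in CasADi, not the repo's exact symbolic dynamics): the Jacobians evaluated at the origin and at a shifted goal point with theta = theta_dot = 0 come out identical, since the cart position never enters the dynamics.

```python
import casadi as cs
import numpy as np

# Generic frictionless cartpole (illustrative parameters, not the repo's model).
mc, mp, l, g = 1.0, 0.1, 0.5, 9.8
x = cs.MX.sym('x', 4)  # [cart_pos, cart_vel, theta, theta_dot]
u = cs.MX.sym('u', 1)  # horizontal force on the cart
pos, vel, theta, theta_dot = x[0], x[1], x[2], x[3]
temp = (u + mp * l * theta_dot**2 * cs.sin(theta)) / (mc + mp)
theta_ddot = (g * cs.sin(theta) - cs.cos(theta) * temp) / (
    l * (4.0 / 3.0 - mp * cs.cos(theta)**2 / (mc + mp)))
x_ddot = temp - mp * l * theta_ddot * cs.cos(theta) / (mc + mp)
f = cs.vertcat(vel, x_ddot, theta_dot, theta_ddot)

# Jacobians of the dynamics w.r.t. state and input.
A_fun = cs.Function('A', [x, u], [cs.jacobian(f, x)])
B_fun = cs.Function('B', [x, u], [cs.jacobian(f, u)])

# Origin vs. a shifted goal point, both with theta = theta_dot = 0.
x_eq = np.zeros(4)
x_goal = np.array([1.0, 0.0, 0.0, 0.0])
u_eq = np.zeros(1)
print(np.allclose(A_fun(x_eq, u_eq).full(), A_fun(x_goal, u_eq).full()))  # True
print(np.allclose(B_fun(x_eq, u_eq).full(), B_fun(x_goal, u_eq).full()))  # True
```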

Federico-PizarroBejarano commented 1 year ago

Tested everything, and other than the minor LQR stabilization bug, everything works the same as before but is a lot cleaner and more concise imo.

adamhall commented 1 year ago

Sorry! Yes! This makes sense because the linearization is only really a function of theta and the inputs. Since theta is just 0 in our reference, it doesn't make a difference. That checks out. I am satisfied now.