utiasDSL / safe-control-gym

PyBullet CartPole and Quadrotor environments—with CasADi symbolic a priori dynamics—for learning-based control and RL
https://www.dynsyslab.org/safe-robot-learning/
MIT License

Should controllers be allowed to use `U_GOAL` or only `U_EQ`? #102

Open · adamhall opened this issue 1 year ago

adamhall commented 1 year ago

Carrying on from #93.

Many controllers have been using `env.U_GOAL`, either as a linearization point or in the cost. `U_GOAL` is computed from the true system mass, not the prior mass; the prior-model quantity is `symbolic.U_EQ`. Should controllers have access to `U_GOAL`, or should they exclusively use `U_EQ`? @Justin-Yuan What are your thoughts on this for the RL cost and normalization?
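For context, a rough sketch of why the two values differ (the masses and variable names here are made up for illustration; the real environments compute these internally):

```python
import numpy as np

g = 9.8
true_mass = 0.027    # the environment's true mass  -> hover input env.U_GOAL
prior_mass = 0.030   # the a priori model's mass    -> hover input symbolic.U_EQ

U_GOAL = np.array([true_mass * g])   # equilibrium thrust from the true dynamics
U_EQ = np.array([prior_mass * g])    # equilibrium thrust from the prior dynamics

# A controller that linearizes around U_GOAL leaks the true mass into the
# algorithm, even though only the prior model is supposed to be available to it.
print(f"U_GOAL = {U_GOAL}, U_EQ = {U_EQ}")
```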

Justin-Yuan commented 1 year ago

But for the control methods this can be tricky, since the cost function is part of both the control algorithm and the environment. Ideally we would have a clear boundary between what is given as the environment/task (which is used in evaluation) and what is part of the control algorithm. I'd say the cost itself (the nonlinear quadratic) is still on the task side (since we need it in evaluation anyway), but anything that uses linearization (needed in the algorithms' optimizations) can use the prior.
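A minimal sketch of that boundary, written against a generic prior-dynamics function rather than the repo's CasADi model: the quadratic cost is defined entirely by task-side references, while the linearization only ever touches the prior model and its equilibrium.

```python
import numpy as np

def task_cost(x, u, x_goal, u_goal, Q, R):
    """Task-side quadratic cost: defined by the environment, used in evaluation."""
    dx, du = x - x_goal, u - u_goal
    return dx @ Q @ dx + du @ R @ du

def linearize_prior(f_prior, x_eq, u_eq, eps=1e-6):
    """Algorithm-side linearization: finite-difference Jacobians of the *prior*
    dynamics around the prior equilibrium (x_eq, u_eq)."""
    nx, nu = x_eq.size, u_eq.size
    f0 = f_prior(x_eq, u_eq)
    A = np.zeros((nx, nx))
    B = np.zeros((nx, nu))
    for i in range(nx):
        dx = np.zeros(nx); dx[i] = eps
        A[:, i] = (f_prior(x_eq + dx, u_eq) - f0) / eps
    for j in range(nu):
        du = np.zeros(nu); du[j] = eps
        B[:, j] = (f_prior(x_eq, u_eq + du) - f0) / eps
    return A, B
```

Only `f_prior`, `x_eq`, and `u_eq` come from the prior/symbolic side; `x_goal`, `u_goal`, `Q`, and `R` stay with the task.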

Justin-Yuan commented 1 year ago

@adamhall Do we currently have anywhere that needs to be fixed regarding this issue?

JacopoPan commented 1 year ago

@adamhall @Justin-Yuan status?

Justin-Yuan commented 1 year ago

I am leaning towards using symbolic.U_EQ for linearization and env.U_EQ for the cost function or reward. The current/updated symbolic model should already expose U_EQ, but I'm not sure whether the MPC controllers have been updated to use it as well? @adamhall
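For concreteness, a sketch of what that convention could look like inside a controller; the attribute names follow this thread, and whether the MPC controllers are actually wired this way is exactly the open question:

```python
class ExampleController:
    """Illustrative sketch only, not the repo's controller class."""

    def __init__(self, env, symbolic):
        # Algorithm side: linearization point taken from the prior (symbolic) model.
        self.u_lin = symbolic.U_EQ
        # Task side: the environment's equilibrium input parameterizes the cost/reward.
        self.u_cost_ref = env.U_EQ

    def stage_cost(self, x, u, x_goal, Q, R):
        # The cost is evaluated against task-side references only.
        dx, du = x - x_goal, u - self.u_cost_ref
        return dx @ Q @ dx + du @ R @ du
```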