utiasDSL / safe-control-gym

PyBullet CartPole and Quadrotor environments—with CasADi symbolic a priori dynamics—for learning-based control and RL
https://www.dynsyslab.org/safe-robot-learning/
MIT License
595 stars 124 forks source link

MSE counter should be incremented by one? #119

Closed adamhall closed 1 year ago

adamhall commented 1 year ago

I noticed that the MSE computation is off by one step. In step, the state gets updated, but self.ctrl_step_counter doesn't get updated until after _get_info and _get_rewardare called. Thus, thestate_error` is subtracting the previous time step's reference state from the current state.

Justin-Yuan commented 1 year ago

@adamhall I don't recall how we define the MSE, but I thought at any given step, aren't we supposed to produce an action/input that tracks a reference state from the given current state? so the error should be the updated state after digesting the action with the reference state?

another supporting observation is the reference trajectory has shape (T,3), but our state sequence (position) has shape (T+1,3) including the initial and last state (s_0 and s_T). so i think when s_0 tries to track ref_0, we should compute MSE based on the resulting s_1 and ref_0?

adamhall commented 1 year ago

Yes this can make sense, but the reference we are actually using starts at t=0. So that means the first point in the reference has generally been close to the initial state of all 0s. So it would be more appropriate to maybe start the reference (as @JacopoPan mentioned on slack) to start the reference at 1/self.CTRL_FREQ? However, I think its nice having the desired initial state in the reference, for debugging, plotting and for general understanding (like when you want to start on or near the reference).

Justin-Yuan commented 1 year ago

@adamhall i think both points (start the 1st ref to t != 0 and starting near the initial ref) make sense, but it might require modifying the ref traj generation instead of adding 1 to ctrl_step_counter? since using ctrl_step_counter+1 in get_reward() and get_info() will change the convention, meaning get_obs() will need to fetch the next reference with ctrl_step_counter+2 (in which case at the last timestep it will try to fetch ref[T+1,3], out of bound).

adamhall commented 1 year ago

kk I'll think on it a bit to see the best way to do that.

adamhall commented 1 year ago

@Justin-Yuan do the changes look good to you?