Closed adamhall closed 1 year ago
@adamhall I don't recall how we define the MSE, but I thought at any given step, aren't we supposed to produce an action/input that tracks a reference state from the given current state? so the error should be the updated state after digesting the action with the reference state?
another supporting observation is the reference trajectory has shape (T,3), but our state sequence (position) has shape (T+1,3) including the initial and last state (s_0
and s_T
). so i think when s_0
tries to track ref_0
, we should compute MSE based on the resulting s_1
and ref_0
?
Yes this can make sense, but the reference we are actually using starts at t=0. So that means the first point in the reference has generally been close to the initial state of all 0s. So it would be more appropriate to maybe start the reference (as @JacopoPan mentioned on slack) to start the reference at 1/self.CTRL_FREQ
? However, I think its nice having the desired initial state in the reference, for debugging, plotting and for general understanding (like when you want to start on or near the reference).
@adamhall i think both points (start the 1st ref to t != 0 and starting near the initial ref) make sense, but it might require modifying the ref traj generation instead of adding 1 to ctrl_step_counter
? since using ctrl_step_counter+1
in get_reward()
and get_info()
will change the convention, meaning get_obs()
will need to fetch the next reference with ctrl_step_counter+2
(in which case at the last timestep it will try to fetch ref[T+1,3], out of bound).
kk I'll think on it a bit to see the best way to do that.
@Justin-Yuan do the changes look good to you?
I noticed that the MSE computation is off by one step. In
step
, the state gets updated, butself.ctrl_step_counter
doesn't get updated until after_get_info
and _get_rewardare called. Thus, the
state_error` is subtracting the previous time step's reference state from the current state.