Closed: Andy-Zhou2 closed this issue 2 months ago.
Hi, thank you for bringing up this issue!
The bug was due to an incorrect order of operations in the refactored release version of the code. Specifically, the `self._compute_hoi_observations()` function generates a new tensor composed of references. Because of this, it's important to first compute the reward from `self._curr_obs` and `self._hist_obs`, and only then update `self._hist_obs`.
I've corrected the order so that the reward is calculated first and then the history is updated. Sorry for the confusion, and thank you again for your feedback!
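To make the intended order concrete, here is a minimal, self-contained sketch (the `EnvSketch` class, the `step_reward` method, and the exponential reward form are illustrative stand-ins, not the repo's actual code; only `_compute_hoi_observations`, `_curr_obs`, and `_hist_obs` come from the discussion above):

```python
import torch

class EnvSketch:
    """Toy stand-in for the environment; only the update order matters here."""

    def __init__(self):
        self._curr_obs = torch.zeros(3)
        self._hist_obs = torch.zeros(3)

    def _compute_hoi_observations(self):
        # Stand-in for the real method, which builds a tensor of
        # references; downstream code must not alias it into the history.
        return torch.rand(3)

    def step_reward(self):
        self._curr_obs = self._compute_hoi_observations()
        # 1) Compute the reward first, while _hist_obs still holds the
        #    previous step's observation.
        reward = torch.exp(-torch.sum((self._curr_obs - self._hist_obs) ** 2))
        # 2) Only then advance the history; clone() keeps _hist_obs from
        #    sharing storage with the reference-backed observation tensor.
        self._hist_obs = self._curr_obs.clone()
        return reward
```

The `clone()` matters here: with a plain assignment the history would alias the current observation, and any later in-place update would reproduce the bug.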
Please let me know if this fix works for you.
Hi, and thanks for the amazing work!
I am trying to test the code for training the skill policy, and it seems that `r_reg` is always 1. In skillmimic.py Line 485, `dof_pos_vel` is the same as `dof_pos_vel_hist`: the reward calculation is called before the current observation is updated, so the current observation is identical to the past observation. I'd like to see if anyone is able to reproduce the same issue. Thank you in advance!
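For anyone trying to reproduce this, here is a minimal demonstration of the symptom (the exponential form and the scale `k` in `r_reg_sketch` are assumptions for illustration; only the `dof_pos_vel` / `dof_pos_vel_hist` names come from skillmimic.py):

```python
import torch

def r_reg_sketch(dof_pos_vel, dof_pos_vel_hist, k=2.0):
    # Hypothetical regularization term of this general shape: it is
    # exactly 1 whenever the two velocity tensors coincide.
    return torch.exp(-k * torch.sum((dof_pos_vel - dof_pos_vel_hist) ** 2))

v = torch.tensor([0.1, -0.2, 0.3])

# With an aliased/stale history (the reported bug): exp(0) == 1, always.
print(r_reg_sketch(v, v).item())              # 1.0

# With a properly lagged history, the term actually responds to change.
print(r_reg_sketch(v, torch.zeros(3)).item()) # ~0.76
```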