wyhuai / SkillMimic

Official code release for the paper "SkillMimic: Learning Reusable Basketball Skills from Demonstrations"
Apache License 2.0

r_reg always 1 #3

Closed Andy-Zhou2 closed 2 months ago

Andy-Zhou2 commented 2 months ago

Hi, and thanks for the amazing work!

I am trying to test the code for training the skill policy, and it seems that r_reg is always 1.

In skillmimic.py, line 485, dof_pos_vel is identical to dof_pos_vel_hist: the reward calculation is called before the current observation is updated, so the current observation is the same as the past observation.

Would like to see if anyone is able to reproduce the same issue. Thank you in advance!
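
For anyone trying to reproduce this, here is a minimal sketch of why identical observations pin the reward at 1. The exponential form of the reward and the names `reg_reward`, `curr_obs`, and `hist_obs` are assumptions for illustration only; this is not the code in skillmimic.py.

```python
import math


def reg_reward(curr, hist, weight=1.0):
    """Hypothetical smoothness reward: exp(-weight * squared change).

    The functional form is an assumption; the repository's r_reg may differ.
    """
    sq_change = sum((c - h) ** 2 for c, h in zip(curr, hist))
    return math.exp(-weight * sq_change)


# Made-up joint values for two consecutive steps.
hist_obs = [0.10, -0.25, 0.40]
curr_obs = [0.15, -0.20, 0.30]

# Expected case: the two snapshots differ, so the reward drops below 1.
r_ok = reg_reward(curr_obs, hist_obs)

# Reported symptom: both arguments hold the same values, so the squared
# change is exactly 0 and the reward is exp(0) = 1.
r_bug = reg_reward(curr_obs, list(curr_obs))

print(r_ok, r_bug)
```

Any reward that depends only on the difference between the two observations behaves the same way, so a constant value of 1 suggests the two inputs are the same data.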

QihanZhao commented 2 months ago

Hi, thank you for bringing up this issue!

The bug was caused by an incorrect order of operations in the refactored release version of the code. Specifically, the self._compute_hoi_observations() function generates a new tensor composed of references. Because of this, the reward must be calculated from self._curr_obs and self._hist_obs before self._hist_obs is updated.

I've corrected the order so that the reward is calculated first and then the history is updated. Sorry for the confusion, and thank you again for your feedback!
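
As a rough illustration of the ordering issue, the toy sketch below contrasts the two orders. `ToyEnv`, `step_buggy`, `step_fixed`, and the exponential reward are hypothetical and are not the repository's actual classes or reward; only the `_curr_obs` and `_hist_obs` attribute names mirror the ones mentioned above.

```python
import math


class ToyEnv:
    """Hypothetical one-dimensional environment used only to show ordering."""

    def __init__(self):
        self.state = 0.0
        self._curr_obs = [0.0]
        self._hist_obs = [0.0]

    def _compute_obs(self):
        # Stand-in for an observation function; returns the current state.
        return [self.state]

    def _reward(self):
        # Assumed smoothness-style reward based on the observation change.
        diff = sum((c - h) ** 2 for c, h in zip(self._curr_obs, self._hist_obs))
        return math.exp(-diff)

    def step_buggy(self, action):
        self.state += action
        self._curr_obs = self._compute_obs()
        # History is overwritten (and aliased) before the reward is computed.
        self._hist_obs = self._curr_obs
        return self._reward()

    def step_fixed(self, action):
        self.state += action
        self._curr_obs = self._compute_obs()
        # Reward first, while _hist_obs still holds the previous step.
        reward = self._reward()
        # Then store a copy so later changes cannot alias the history.
        self._hist_obs = list(self._curr_obs)
        return reward


r_buggy = ToyEnv().step_buggy(0.5)
r_fixed = ToyEnv().step_fixed(0.5)
print(r_buggy, r_fixed)
```

In this toy version the buggy order always returns 1, while the fixed order returns a value below 1 whenever the observation changes.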

Please let me know if this fix works for you.