Hi, thanks for the wonderful job! I integrated this code into the Metaworld benchmark and trained an agent to solve the drawer-close task. The trained agent can successfully solve the task, but the rewards calculated by the recovered reward function are very small. The following code is how I get rewards from disc:
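(The snippet itself isn't shown; for context, a minimal sketch of one common way a reward is read off a GAIL/AIRL-style discriminator, assuming `d` is the discriminator output D(s, a) in (0, 1). The function name and the exact formula are illustrative, not necessarily what this repo does:)

```python
import numpy as np

def reward_from_disc(d):
    """Illustrative AIRL-style recovered reward from a discriminator
    output d = D(s, a) in (0, 1): log D - log(1 - D), i.e. the logit.
    Clipping avoids log(0) at saturated discriminator outputs."""
    d = np.clip(d, 1e-8, 1.0 - 1e-8)
    return np.log(d) - np.log(1.0 - d)
```

A reward of this form is bounded only by the clipping, but in practice sits in a narrow band when the discriminator output hovers near 0.5, which is consistent with the small `pre_reward` values reported below.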
The pre_reward is much smaller than the ground-truth reward; how can I get rewards close to the ground-truth reward? Thanks.
0 < pre_reward < 2
-1 < ground truth reward < 4000
I normalised the values of the recovered reward function into the range of 2 to 4000, and the recovered reward function then shows asymptotic performance matching the ground-truth reward function.
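The normalisation described above can be sketched as a simple linear rescaling from the observed recovered-reward range into the target range. The function name and the exact range endpoints here are illustrative assumptions taken from the numbers quoted in this thread:

```python
def rescale_reward(pre_reward,
                   src_lo=0.0, src_hi=2.0,      # observed pre_reward range
                   dst_lo=-1.0, dst_hi=4000.0):  # ground-truth reward range
    """Linearly map a recovered reward from [src_lo, src_hi]
    into the ground-truth range [dst_lo, dst_hi]."""
    frac = (pre_reward - src_lo) / (src_hi - src_lo)
    return dst_lo + frac * (dst_hi - dst_lo)
```

Note this only matches the scale of the two reward functions; the shape (relative ordering of states) of the recovered reward is what actually matters for policy optimisation.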