Closed: vitiennam closed this issue 5 years ago
Correct. We replace the "ground truth" reward from the environment with the latent reward. In simulation the ground-truth reward is easy to obtain, but for (e.g.) real-world robot tasks it isn't, and in that setting the reward returned by the wrapped_env is meaningless. Also, computing the ground-truth reward is usually cheap, so the overhead of calculating it alongside the latent reward should be small.
In the future, it would help to link to the code rather than taking a screenshot.
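Concretely, here is a minimal sketch of the idea, assuming a classic gym-style wrapper API; `reward_model` and its `predict` method are hypothetical placeholders, not this repo's actual interface:

```python
import gym


class LatentRewardWrapper(gym.Wrapper):
    """Replaces the environment's ground-truth reward with a learned (latent) reward."""

    def __init__(self, env, reward_model):
        super().__init__(env)
        self.reward_model = reward_model  # hypothetical learned reward predictor

    def step(self, action):
        obs, env_reward, done, info = self.env.step(action)
        # Keep the ground-truth reward for logging/evaluation only;
        # it is cheap to compute in simulation.
        info["env_reward"] = env_reward
        # The agent trains on the latent (predicted) reward instead.
        latent_reward = self.reward_model.predict(obs, action)
        return obs, latent_reward, done, info
```

Stashing the environment reward in `info` is one common pattern for monitoring true task performance without letting the agent train on it.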
Hi,
I am a little confused by your distance calculation: you compute the reward twice? What is the benefit of the first computation?
Thanks