rail-berkeley / rlkit

Collection of reinforcement learning algorithms
MIT License

distance in code? #76

Closed vitiennam closed 5 years ago

vitiennam commented 5 years ago

Hi,

I am a little bit confused by your distance computation: [screenshot of code] You calculate the reward twice? What is the benefit of the first calculation?

Thanks

vitchyr commented 5 years ago

Correct. We replace the "ground truth" reward from the environment with the latent reward. In simulation it's easy to compute the ground-truth reward, but not for (e.g.) real-world robot tasks, where the reward returned by the wrapped_env is meaningless. Also, computing the ground-truth reward is usually cheap, so the overhead shouldn't be large.
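
For readers without the screenshot, here is a minimal sketch of the pattern being discussed, assuming a gym-style wrapper around a trained image encoder. The names `LatentRewardWrapper`, `encode`, and `get_goal_image` are illustrative placeholders, not rlkit's actual API:

```python
# Minimal sketch: a wrapper that discards the inner env's reward and
# recomputes it as a distance in a learned latent space.
# LatentRewardWrapper, encoder.encode, and get_goal_image are
# hypothetical names used for illustration only.
import gym
import numpy as np


class LatentRewardWrapper(gym.Wrapper):
    def __init__(self, env, encoder):
        super().__init__(env)
        self.encoder = encoder  # e.g. a trained VAE (assumed interface)
        self.latent_goal = None

    def reset(self, **kwargs):
        obs = self.env.reset(**kwargs)
        # Encode the goal image into latent space once per episode.
        self.latent_goal = self.encoder.encode(self.env.get_goal_image())
        return obs

    def step(self, action):
        # The wrapped env computes its own ("ground truth") reward
        # as a side effect of stepping...
        obs, env_reward, done, info = self.env.step(action)
        # ...but we discard it and recompute the reward as a distance
        # in the encoder's latent space.
        latent_obs = self.encoder.encode(obs)
        reward = -np.linalg.norm(latent_obs - self.latent_goal)
        return obs, reward, done, info
```

The key point is in `step`: the first (ground-truth) reward comes for free when stepping the inner environment, but the wrapper overrides it with the latent-space distance, mirroring what a real robot without access to ground-truth state would have to use.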

In the future, it would help to link to the code rather than taking a screenshot.