ghost closed this issue 4 years ago
Thanks @keiohta! I see it now. A reward function is usually defined as r(s, a), but yours in run_mpc.py is defined as r(s, s', a). The paper "Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning" (https://arxiv.org/pdf/1708.02596.pdf) also uses r(s, a). How did you derive the formula for r(s, s', a)? Thanks!
This seems to be a mistake in the implementation! I'll check the theory and fix it if needed. Thanks!
Hi @keiohta, your reward_fn_pendulum(obses, next_obses, acts) does not actually use next_obses. You used the exact formula for Pendulum-v0 from gym.
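For reference, here is a minimal sketch of what a Pendulum-v0 reward written as r(s, a) could look like. It is not the code from run_mpc.py: the function name `pendulum_reward_sa` and the batched array layout are assumptions made for illustration. It assumes observations are `[cos(theta), sin(theta), theta_dot]` and that the torque is clipped to ±2, as in gym's Pendulum-v0.

```python
import numpy as np

MAX_TORQUE = 2.0  # torque limit of gym's Pendulum-v0


def pendulum_reward_sa(obses, acts):
    """Hypothetical batched reward r(s, a) for Pendulum-v0.

    obses: array of shape (batch, 3) holding [cos(theta), sin(theta), theta_dot]
    acts:  array of shape (batch, 1) holding the applied torque
    Returns an array of shape (batch,).
    """
    obses = np.asarray(obses, dtype=np.float64)
    acts = np.asarray(acts, dtype=np.float64)

    cos_th, sin_th, th_dot = obses[:, 0], obses[:, 1], obses[:, 2]
    # arctan2 recovers the angle in [-pi, pi], so no extra normalization is needed.
    theta = np.arctan2(sin_th, cos_th)
    torque = np.clip(acts[:, 0], -MAX_TORQUE, MAX_TORQUE)

    # Pendulum-v0 cost: theta^2 + 0.1 * theta_dot^2 + 0.001 * torque^2
    cost = theta ** 2 + 0.1 * th_dot ** 2 + 0.001 * torque ** 2
    return -cost
```

Since the next observation never appears in the cost, the signature only needs the current observation and the action.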
Yeah, I think you're right. I've applied your suggestion to the latest master in the commit above. Thanks @cubicgate!
Hi @cubicgate, it is used in examples/run_mpc.py. You can use it as: