keiohta / tf2rl

TensorFlow2 Reinforcement Learning
MIT License

Where is mpc_trainer.py used in tf2rl? #89

Closed: ghost closed this issue 4 years ago

keiohta commented 4 years ago

Hi @cubicgate, it is used in examples/run_mpc.py. You can run it as follows:

$ python examples/run_mpc.py

ghost commented 4 years ago

Thanks @keiohta! I see it now. A reward function is usually defined as r(s, a), but the one in run_mpc.py is defined as r(s, s', a). The paper "Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning" (https://arxiv.org/pdf/1708.02596.pdf) also uses r(s, a). How did you derive the formula for r(s, s', a)? Thanks!

keiohta commented 4 years ago

This looks like an implementation mistake! I'll check the theory and fix it if needed. Thanks!

ghost commented 4 years ago

Hi @keiohta, your reward_fn_pendulum(obses, next_obses, acts) does not actually use next_obses. It uses the exact reward formula for Pendulum-v0 from gym.
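
For reference, the Pendulum-v0 reward depends only on the current observation and action, so it can be written as r(s, a). The sketch below is an illustration, not the code in tf2rl: the name pendulum_reward_sa and the batched NumPy signature are assumptions, while the formula and the torque limit of 2.0 follow gym's Pendulum-v0.

```python
import numpy as np

MAX_TORQUE = 2.0  # torque limit used by gym's Pendulum-v0


def pendulum_reward_sa(obses, acts):
    """Batched r(s, a) for Pendulum-v0 (hypothetical helper, not tf2rl's API).

    obses: array of shape (N, 3) with columns [cos(theta), sin(theta), theta_dot]
    acts:  array of shape (N,) or (N, 1) holding the applied torque
    """
    obses = np.asarray(obses, dtype=np.float64).reshape(-1, 3)
    acts = np.asarray(acts, dtype=np.float64).reshape(-1)

    # Recover the angle from its cosine and sine; arctan2 returns a value in [-pi, pi].
    thetas = np.arctan2(obses[:, 1], obses[:, 0])
    theta_dots = obses[:, 2]
    # The environment clips the torque before computing the cost.
    torques = np.clip(acts, -MAX_TORQUE, MAX_TORQUE)

    costs = thetas ** 2 + 0.1 * theta_dots ** 2 + 0.001 * torques ** 2
    return -costs


# Upright and at rest with zero torque gives the maximum reward of zero.
rewards = pendulum_reward_sa([[1.0, 0.0, 0.0]], [[0.0]])
```

Because next_obses never appears in the formula, dropping that argument does not change any reward value.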

keiohta commented 4 years ago

Yeah, I think you're right. I've applied your suggestion to the latest master in the commit above. Thanks @cubicgate!