katerakelly / oyster

Implementation of Efficient Off-policy Meta-learning via Probabilistic Context Variables (PEARL)
MIT License

Reward Design in Ant-Dir Tasks #17

Closed: ylfzr closed this issue 3 years ago

ylfzr commented 4 years ago

Very insightful paper! Still, I have a question about the ant-direction task. In the PEARL code, survive_reward is set to 1.0 in ant_dir.py, but in the original MAML implementation the value is set to 0.05 (https://github.com/cbfinn/maml_rl/blob/master/rllab/envs/mujoco/ant_env_rand_direc.py). Since survive_reward is added at every environment step, the larger value substantially inflates the cumulative reward. Will this be a problem?
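For concreteness, the per-step reward in these ant-dir style envs is composed roughly as follows. This is a minimal sketch: the function name and coefficient values are illustrative, not copied from ant_dir.py, but the structure (forward progress minus costs plus a constant survival bonus) follows the standard rllab/gym Ant reward.

```python
import numpy as np

def step_reward(velocity, direction, action,
                ctrl_cost_coeff=0.5, contact_cost=0.0,
                survive_reward=1.0):
    """Sketch of the per-step reward: progress along the goal
    direction, minus control cost, plus a constant survival bonus.
    (Names and coefficients are illustrative, not exact values.)"""
    forward_reward = np.dot(velocity, direction)        # progress toward goal
    ctrl_cost = ctrl_cost_coeff * np.square(action).sum()
    # The survival bonus is added at *every* step, so over a
    # 200-step episode survive_reward=1.0 contributes 200 to the
    # return, versus only 10 for survive_reward=0.05.
    return forward_reward - ctrl_cost - contact_cost + survive_reward
```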

katerakelly commented 3 years ago

Hi, thanks for your interest in our work!

We used the environments from the ProMP paper (https://github.com/jonasrothfuss/ProMP/blob/master/meta_policy_search/envs/mujoco_envs/ant_rand_direc.py). I forget the history, but someone figured out that the ant env works better with the slightly higher survive reward. All algorithms were run on this same environment, so the comparison is fair.