Closed ylfzr closed 3 years ago
Very insightful paper! Still, I have a question about the ant-direction task. In the PEARL code, survive_reward is set to 1.0 in ant_dir.py, but in the original MAML implementation the value is 0.05 (https://github.com/cbfinn/maml_rl/blob/master/rllab/envs/mujoco/ant_env_rand_direc.py). Since survive_reward is added at every environment step, this results in a larger return. Could this be a problem?

Hi, thanks for your interest in our work!
We used the environments from the ProMP paper (https://github.com/jonasrothfuss/ProMP/blob/master/meta_policy_search/envs/mujoco_envs/ant_rand_direc.py). I forget the history, but someone figured out that the ant env works better with the slightly higher survive reward. All the algorithms were run on this same environment, so the comparison is fair.
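To illustrate the point being discussed: because the survive bonus is added once per step, it inflates the episode return linearly with episode length. A minimal sketch with hypothetical numbers (the episode length and per-step task reward below are made up for illustration, not taken from either env):

```python
def episode_return(n_steps, task_reward_per_step, survive_reward):
    # The survive (alive) bonus is added once per environment step,
    # so its contribution to the return scales linearly with episode length.
    return n_steps * (task_reward_per_step + survive_reward)

# Hypothetical 200-step episode with 1.0 task reward per step:
pearl_style = episode_return(200, 1.0, 1.0)   # survive_reward = 1.0  -> 400.0
maml_style = episode_return(200, 1.0, 0.05)   # survive_reward = 0.05 -> 210.0

# The bonus alone accounts for a 190.0 gap in raw return,
# which is why returns are only comparable across methods
# when all of them use the same env settings.
print(pearl_style - maml_style)
```

This is exactly why running all baselines on the same environment (as the reply notes) keeps the comparison fair even if the absolute reward scale differs from other papers.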