watakandai / hiro_pytorch

Implementation of HIRO (Data-Efficient Hierarchical Reinforcement Learning)

What's the performance on AntPush and AntFall #2


hzaskywalker commented 3 years ago

Thanks for the implementation. I only see results on AntMaze. Can you reproduce the results on more challenging environments, like AntPush and AntFall?

watakandai commented 3 years ago

Thanks for reaching out. We re-implemented HIRO in a fork of PFRL:

https://github.com/watakandai/pfrl

It's a private repo for now, since we are building on top of HIRO for our paper.

We found that the implementation of off_policy_correction in this repo is wrong (https://github.com/watakandai/hiro_pytorch/blob/b2b4e5cd0933bc042f674a9ba5c99351a8ac20ed/hiro/models.py#L262), so we fixed it. However, it still didn't reach the performance reported in the paper: AntMaze learns at around 2M steps, and AntPush and AntFall do work, but only start learning at around 8M steps.
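For reference, the off-policy correction in the HIRO paper relabels each stored high-level goal with the candidate goal that best explains the logged low-level actions under the current low-level policy. Below is a minimal sketch of that idea; it assumes a deterministic low-level actor callable `low_policy(state, goal)`, and the function names, noise scale, and candidate sampling here are illustrative, not this repo's actual API:

```python
import numpy as np


def goal_transition(goal, state, next_state):
    # Fixed goal transition from the paper: h(s_t, g_t, s_{t+1}) = s_t + g_t - s_{t+1}
    return state + goal - next_state


def off_policy_correct(low_policy, goal, states, actions, n_samples=8, scale=0.5):
    """Relabel a high-level goal so it best explains the stored low-level actions.

    states:  array of shape (c + 1, state_dim) -- s_t .. s_{t+c}
    actions: array of shape (c, action_dim)    -- low-level actions actually taken
    """
    goal_dim = goal.shape[0]
    delta = states[-1][:goal_dim] - states[0][:goal_dim]

    # Candidate goals: the original goal, the observed state change,
    # and Gaussian samples centered on that change.
    candidates = [goal, delta]
    candidates += list(
        np.random.normal(delta, scale * np.abs(delta) + 1e-6, size=(n_samples, goal_dim))
    )

    best_goal, best_score = goal, -np.inf
    for g in candidates:
        g_t, score = g.copy(), 0.0
        for s, s_next, a in zip(states[:-1], states[1:], actions):
            # Surrogate log-likelihood: negative squared error between the logged
            # action and what the current low-level policy would do for this goal.
            score -= np.sum((a - low_policy(s, g_t)) ** 2)
            g_t = goal_transition(g_t, s[:goal_dim], s_next[:goal_dim])
        if score > best_score:
            best_goal, best_score = g, score
    return best_goal
```

The relabeled goal is then used in place of the originally stored goal when training the high-level critic, which is the step the linked line in models.py was getting wrong.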

peasant98 commented 3 years ago

@hzaskywalker you can check out the HIRO training example here: https://github.com/watakandai/pfrl/blob/master/examples/ant/train_hiro_ant.py