hzaskywalker opened this issue 3 years ago
Thanks for reaching out. We re-implemented HIRO in a fork of PFRL: https://github.com/watakandai/pfrl. It's a private repo since we are building on top of HIRO for our paper.
We found out that our implementation of off_policy_correction in this repo is wrong:
https://github.com/watakandai/hiro_pytorch/blob/b2b4e5cd0933bc042f674a9ba5c99351a8ac20ed/hiro/models.py#L262
We fixed it; however, it still didn't reach the performance shown in the paper.
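For context, HIRO's off-policy correction relabels the high-level action (the goal) so that stale high-level transitions stay consistent with the current low-level policy: it samples candidate goals and keeps the one that maximizes a surrogate log-likelihood of the observed low-level actions (Nachum et al., 2018). Below is a minimal NumPy sketch of that scheme, assuming a deterministic `low_level_policy(state, goal)` and the standard goal transition g' = s + g - s'; the names and defaults are illustrative, not the repo's actual API:

```python
import numpy as np

def off_policy_correct(low_level_policy, states, actions, orig_goal,
                       n_candidates=10, scale=0.5):
    # Candidate goals: the original goal, the observed state delta
    # s_T - s_0, and Gaussian samples centered on that delta.
    delta = states[-1] - states[0]
    candidates = [orig_goal, delta]
    candidates += [np.random.normal(delta, scale) for _ in range(n_candidates - 2)]

    def log_prob(g):
        # Surrogate log-likelihood of the observed low-level actions:
        # sum of -0.5 * ||a_t - pi_lo(s_t, g_t)||^2, with the goal
        # propagated along the trajectory by g' = s + g - s'.
        logp, goal = 0.0, g
        for s, a, s_next in zip(states[:-1], actions, states[1:]):
            logp += -0.5 * np.sum((a - low_level_policy(s, goal)) ** 2)
            goal = s + goal - s_next
        return logp

    # Relabel the stored high-level action with the best candidate.
    return max(candidates, key=log_prob)
```

The squared-error surrogate corresponds to the log-density of a fixed-variance Gaussian policy, which is the approximation the paper uses for deterministic low-level policies.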
AntMaze starts learning at 2M steps; AntPush and AntFall do work and start learning at 8M steps.
@hzaskywalker you can check out the HIRO example here: https://github.com/watakandai/pfrl/blob/master/examples/ant/train_hiro_ant.py
Thanks for the implementation. I only see results on AntMaze. Can you reproduce the results on more challenging environments, like AntPush and AntFall?