sisl / ngsim_env

Learning human driver models from NGSIM data with imitation learning.
https://arxiv.org/abs/1803.01044
MIT License
173 stars 79 forks

PPO instead of TRPO #24

Open Kailiangdong opened 5 years ago

Kailiangdong commented 5 years ago

Hello, thank you for sharing your code. May I ask a question about the paper? Since PPO is an upgrade of TRPO, have you considered using PPO instead of TRPO? I am facing this question in my thesis, and I wonder why the latest GAIL papers still use TRPO.

Thank you very much.

raunakbh92 commented 5 years ago

Great question! The main reason for using TRPO was that the original GAIL paper used TRPO (see Algorithm 1 in that paper). As you mention, PPO builds on TRPO, replacing its constrained trust-region update with a simpler clipped importance-sampling objective. I think it would be a great idea to investigate the impact of replacing TRPO with PPO in the GAIL learning loop.
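For context on the difference under discussion: TRPO enforces a KL-divergence trust region via constrained optimization, while PPO approximates that constraint by clipping the importance-sampling ratio in the surrogate objective. A minimal numpy sketch of PPO's clipped objective (illustrative only, not this repo's implementation; the function name and `epsilon` default are assumptions):

```python
import numpy as np

def ppo_clipped_objective(log_probs_new, log_probs_old, advantages, epsilon=0.2):
    """PPO's clipped surrogate objective (to be maximized).

    ratio = pi_new(a|s) / pi_old(a|s) is the importance-sampling ratio.
    Clipping it to [1 - epsilon, 1 + epsilon] removes the incentive to move
    the policy far from the data-collecting policy, standing in for TRPO's
    explicit KL trust-region constraint.
    """
    ratio = np.exp(log_probs_new - log_probs_old)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - epsilon, 1.0 + epsilon) * advantages
    # Elementwise minimum gives a pessimistic (lower) bound on the
    # unclipped surrogate, averaged over the batch of transitions.
    return np.mean(np.minimum(unclipped, clipped))
```

In a GAIL loop, the `advantages` here would come from the discriminator-based reward rather than an environment reward; the policy-update step is otherwise the same swap of TRPO's constrained step for this clipped gradient ascent.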