Open Kailiangdong opened 5 years ago
Great question! The main reason for using TRPO is that the original GAIL paper used TRPO (see Algorithm 1 in the paper). As you mention, PPO is a refinement of TRPO: both rely on importance sampling, but PPO replaces TRPO's KL-constrained update with a simpler clipped surrogate objective. I think it would be a great idea to investigate the impact of replacing TRPO with PPO in the GAIL learning loop.
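For anyone who wants to try the swap, here is a minimal sketch of the clipped surrogate objective that PPO maximizes in place of TRPO's constrained step. It is not code from this repository: the function name `ppo_clipped_surrogate`, its arguments, and the default `clip_eps=0.2` are assumptions for illustration, and the advantages are assumed to have been estimated already from the discriminator-based reward.

```python
import math


def ppo_clipped_surrogate(log_probs_new, log_probs_old, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective (to be maximized).

    All names here are hypothetical. `advantages` is assumed to be computed
    elsewhere, e.g. from rewards derived from the GAIL discriminator.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(log_probs_new, log_probs_old, advantages):
        # Importance-sampling ratio pi_new(a|s) / pi_old(a|s).
        ratio = math.exp(lp_new - lp_old)
        # Keep the ratio inside [1 - eps, 1 + eps].
        clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the two estimates for this sample.
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)
```

In a GAIL loop, the policy step would then do a few epochs of gradient ascent on this quantity instead of the TRPO update, while the discriminator update stays unchanged. Whether that helps or hurts in practice is exactly what would need to be tested.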
Hello, thank you for sharing your code. May I ask a question about the paper? Since PPO is an upgrade of TRPO, have you considered using PPO instead of TRPO? I am facing this question in my thesis, and I wonder why all the latest GAIL papers still use TRPO.
Thank you very much.