Open francisduan opened 4 years ago
PPO is a reinforcement learning method, whereas app, maxent, and gail are all inverse reinforcement learning methods. Since policy-based inverse reinforcement learning algorithms have emerged, you can use PPO as the policy optimizer with any inverse reinforcement learning algorithm to complete the training.
References:
Ng A Y, Russell S J. Algorithms for inverse reinforcement learning[C]//ICML. 2000, 1: 2.
Ho J, Gupta J, Ermon S. Model-free imitation learning with policy optimization[C]//International Conference on Machine Learning. PMLR, 2016: 2760-2769.
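To make the point about pairing PPO with an inverse reinforcement learning method a bit more concrete, here is a rough sketch, not this repo's code. The idea it illustrates is that the policy optimizer can stay the same and only the reward source changes: rewards from the environment are swapped for rewards from a learned reward function, and the resulting discounted returns are what a PPO-style update would consume. All names here (`relabel_rewards`, `discounted_returns`, `make_linear_reward`) are hypothetical, and the linear reward is just a stand-in for whatever the IRL algorithm learns.

```python
from typing import Callable, List, Sequence, Tuple

State = Sequence[float]
Action = int
RewardFn = Callable[[State, Action], float]


def make_linear_reward(weights: Sequence[float]) -> RewardFn:
    """Hypothetical stand-in for a learned IRL reward: linear in the state features."""
    def reward_fn(state: State, action: Action) -> float:
        # The action is ignored here; a learned reward may or may not depend on it.
        return sum(w * x for w, x in zip(weights, state))
    return reward_fn


def relabel_rewards(trajectory: List[Tuple[State, Action]], reward_fn: RewardFn) -> List[float]:
    """Score each (state, action) pair with the learned reward instead of the env reward."""
    return [reward_fn(state, action) for state, action in trajectory]


def discounted_returns(rewards: Sequence[float], gamma: float = 0.99) -> List[float]:
    """Compute the discounted return at every step, working backwards through the rollout."""
    returns: List[float] = []
    running = 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


# A toy rollout of (state, action) pairs collected by the current policy.
trajectory = [([1.0, 0.0], 0), ([0.0, 1.0], 1), ([1.0, 1.0], 0)]
reward_fn = make_linear_reward([0.5, -1.0])

rewards = relabel_rewards(trajectory, reward_fn)
returns = discounted_returns(rewards, gamma=0.5)
print(rewards)
print(returns)
```

In a full training loop, these returns (or advantages derived from them) would go into the PPO update, while the reward function itself would be refit from the expert demonstrations by the chosen inverse reinforcement learning method.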
Hi, how am I supposed to save the expert demo in ppo main?