PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).
I have been trying to train an agent for Pendulum-v0 with PPO, but I have been having a hard time training it to convergence (i.e. the pendulum wouldn't stay up). The parameters I was using were:
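To judge whether the pendulum actually "stays up", it helps to know what Pendulum-v0 rewards. The sketch below re-implements the per-step reward based on our reading of Gym's Pendulum source. This is an assumption, so check it against your installed gym version. The function names are our own and are for illustration only. The cost grows with the squared angle from upright (θ = 0), 0.1 × the squared angular velocity, and 0.001 × the squared (clipped) torque. A step is therefore worth 0 only when the pole is upright and still, while a pendulum hanging straight down costs about π² ≈ 9.87 per step.

```python
import math


def angle_normalize(x: float) -> float:
    """Wrap an angle to the interval [-pi, pi)."""
    return ((x + math.pi) % (2 * math.pi)) - math.pi


def pendulum_reward(theta: float, theta_dot: float, torque: float,
                    max_torque: float = 2.0) -> float:
    """Per-step reward of Gym's Pendulum task (assumed from the gym source).

    theta = 0 means upright. The reward is never positive; it reaches 0
    only when the pendulum is upright, motionless, and no torque is applied.
    """
    # The environment clips the action to [-max_torque, max_torque].
    u = max(-max_torque, min(max_torque, torque))
    cost = angle_normalize(theta) ** 2 + 0.1 * theta_dot ** 2 + 0.001 * u ** 2
    return -cost


if __name__ == "__main__":
    print("upright, still:", pendulum_reward(0.0, 0.0, 0.0))
    print("hanging down:  ", pendulum_reward(math.pi, 0.0, 0.0))
```

Because every step is penalized, episode returns that keep drifting toward 0 are a better convergence signal than any single episode, and returns that plateau far below 0 usually mean the agent never learned to swing up and balance.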
I'm not sure whether I made a mistake or used improper parameters. Could anyone help? Thanks!