ikostrikov / pytorch-a2c-ppo-acktr-gail

PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).
MIT License

PPO Not Converging for Pendulum-v0 #260

Open ZhizhenQin opened 3 years ago

ZhizhenQin commented 3 years ago

I have been trying to train an agent for Pendulum-v0 with PPO, but I have had a hard time getting it to converge (i.e. the pendulum won't stay up). The parameters I was using were:

python main.py \
    --env-name "Pendulum-v0" \
    --algo ppo \
    --use-gae \
    --lr 4e-4 \
    --clip-param 0.2 \
    --value-loss-coef 0.5 \
    --num-steps 128 \
    --num-mini-batch 32 \
    --log-interval 1 \
    --use-linear-lr-decay \
    --entropy-coef 0
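
For reference, this is the rough script I use to eyeball whether training converged: it rolls out a policy on Pendulum-v0, renders it, and prints the per-episode return. The random action is only a placeholder for however the trained actor-critic is restored (e.g. whatever enjoy.py loads), and it assumes the classic Gym reset/step API that Pendulum-v0 uses:

import gym

# Quick sanity check: roll out a policy on Pendulum-v0, render it, and print
# the per-episode return. The random action below is only a placeholder for
# the trained actor-critic; a converged policy keeps the pendulum upright and
# gets a much less negative return than random actions do.
env = gym.make("Pendulum-v0")

for episode in range(5):
    obs = env.reset()
    done = False
    episode_return = 0.0
    while not done:
        env.render()
        action = env.action_space.sample()  # placeholder for the learned policy
        obs, reward, done, _ = env.step(action)
        episode_return += reward
    print("episode {}: return = {:.1f}".format(episode, episode_return))

env.close()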

I'm not sure whether I made a mistake somewhere or used improper parameters. Could anyone help? Thanks!