nikhilbarhate99 / PPO-PyTorch

Minimal implementation of clipped objective Proximal Policy Optimization (PPO) in PyTorch
MIT License

PPO-PyTorch

UPDATE [April 2021] :

Open PPO_colab.ipynb in Google Colab

Introduction

This repository provides a minimal PyTorch implementation of Proximal Policy Optimization (PPO) with the clipped objective for OpenAI gym environments. It is primarily intended to help beginners in reinforcement learning understand the PPO algorithm. It can still be used for complex environments, but may require some hyperparameter tuning or changes to the code. A concise explanation of the PPO algorithm can be found here, and a thorough explanation of the details needed to implement the best-performing PPO can be found here (not all of them are implemented in this repo yet).
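For readers new to the clipped objective mentioned above, the following is a minimal, framework-free sketch of the arithmetic in plain Python. It is not the code from this repository: the function names `clipped_surrogate` and `ppo_clip_loss`, the parameter name `eps_clip`, and its default of 0.2 are assumptions chosen for illustration. A PyTorch version would typically apply the same operations to tensors.

```python
import math


def clipped_surrogate(logp_new, logp_old, advantage, eps_clip=0.2):
    """Per-sample clipped surrogate objective (a quantity to maximize).

    logp_new / logp_old: log-probability of the taken action under the
    current and the old policy. advantage: advantage estimate for it.
    eps_clip: clipping range (hypothetical default, for illustration).
    """
    # Probability ratio pi_new(a|s) / pi_old(a|s), computed from log-probs.
    ratio = math.exp(logp_new - logp_old)
    # Restrict the ratio to the interval [1 - eps_clip, 1 + eps_clip].
    clipped_ratio = max(1.0 - eps_clip, min(ratio, 1.0 + eps_clip))
    # Take the more pessimistic of the unclipped and clipped terms.
    return min(ratio * advantage, clipped_ratio * advantage)


def ppo_clip_loss(logps_new, logps_old, advantages, eps_clip=0.2):
    """Mean negative surrogate over a batch, suitable for minimization."""
    terms = [
        clipped_surrogate(new, old, adv, eps_clip)
        for new, old, adv in zip(logps_new, logps_old, advantages)
    ]
    return -sum(terms) / len(terms)
```

With a positive advantage, the objective stops rewarding the policy once the ratio exceeds `1 + eps_clip`. With a negative advantage, the unclipped term is kept whenever it is the lower of the two, so a harmful update is still fully penalized.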

To keep the training procedure simple :

Usage

Note :

Citing

Please use this BibTeX entry if you want to cite this repository in your publications :

@misc{pytorch_minimal_ppo,
    author = {Barhate, Nikhil},
    title = {Minimal PyTorch Implementation of Proximal Policy Optimization},
    year = {2021},
    publisher = {GitHub},
    journal = {GitHub repository},
    howpublished = {\url{https://github.com/nikhilbarhate99/PPO-PyTorch}},
}

Results

PPO Continuous RoboschoolHalfCheetah-v1
PPO Continuous RoboschoolHopper-v1
PPO Continuous RoboschoolWalker2d-v1
PPO Continuous BipedalWalker-v2
PPO Discrete CartPole-v1
PPO Discrete LunarLander-v2

Dependencies

Trained and Tested on:

Python 3
PyTorch
NumPy
gym

Training Environments

Box-2d
Roboschool
pybullet

Graphs and gifs

pandas
matplotlib
Pillow

References