Competitive Policy Gradient (CoPG) Algorithm

This repository contains all code and experiments for the competitive policy gradient (CoPG) algorithm. The paper on competitive policy gradient can be found here. The code for the trust region competitive policy optimization (TRCPO) algorithm can be found here.

Experiment videos are available here.

Dependencies

  1. The code is tested on Python 3.5.2.
  2. Only the Markov soccer experiment requires the OpenSpiel library; the other five experiments can be run directly.
  3. Requires torch.utils.tensorboard (a quick import check is sketched below).
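
A minimal environment check for the dependencies listed above; the file name and placement are illustrative, not part of the repository.

```python
# Quick dependency check for the experiments in this repository.
import sys

print("Python:", sys.version.split()[0])   # code is tested on 3.5.2

try:
    from torch.utils.tensorboard import SummaryWriter  # used by the training scripts for logging
    print("torch.utils.tensorboard: OK")
except ImportError as e:
    print("torch.utils.tensorboard missing:", e)

try:
    import pyspiel  # OpenSpiel; only the Markov soccer experiment needs it
    print("OpenSpiel: OK")
except ImportError:
    print("OpenSpiel missing: only Markov soccer is affected, the other five experiments run without it")
```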

Repository structure

.
├── notebooks
│   ├── LQ_game.ipynb
│   ├── bilinear_game.ipynb
│   ├── RockPaperScissors.ipynb
│   ├── matching_pennies.ipynb
│   ├── MarkovSoccer.ipynb
│   ├── CarRacing.ipynb
├── game                            # Each game has a separate folder with this structure
│   ├── game.py                     
│   ├── copg_game.py                
│   ├── gda_game.py
│   ├── network.py
│   ├── pretrained_models.py       (if applicable)
│   ├── results.py                 (if applicable)
├── copg_optim
│   ├── copg.py 
│   ├── critic_functions.py 
│   ├── utils.py 
├── car_racing_simulator
└── ...
  1. The Jupyter notebooks are the best place to start; they contain demonstrations and results.
  2. The copg_optim folder contains the optimization code (a usage sketch follows this list).
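
To show how the per-game files fit together, below is a self-contained sketch of the training pattern on matching pennies using simultaneous gradient descent-ascent, i.e. what a gda_game.py baseline does; the copg_game.py scripts replace this update rule with the competitive policy gradient step from copg_optim. The hyperparameters and the split of responsibilities across game.py / network.py are simplified here and are not the repository's exact settings.

```python
import torch

# Matching-pennies payoff matrix for player 1 (zero-sum: player 2 gets the negative).
A = torch.tensor([[1., -1.], [-1., 1.]])

# Logits of each player's mixed strategy (started slightly off the equilibrium).
theta1 = torch.tensor([0.5, 0.0], requires_grad=True)
theta2 = torch.tensor([0.0, 0.3], requires_grad=True)
lr = 0.1

for step in range(200):
    p = torch.softmax(theta1, dim=0)
    q = torch.softmax(theta2, dim=0)
    payoff = p @ A @ q                       # expected payoff of player 1
    g1, g2 = torch.autograd.grad(payoff, [theta1, theta2])
    with torch.no_grad():
        theta1 += lr * g1                    # player 1 ascends its payoff
        theta2 -= lr * g2                    # player 2 descends (minimizes player 1's payoff)

# GDA typically cycles or diverges around the [0.5, 0.5] equilibrium here,
# which is the failure mode the CoPG update is designed to fix.
print(torch.softmax(theta1, dim=0), torch.softmax(theta2, dim=0))
```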

How to start?

Open a Jupyter notebook and run it to see the results,

or

git clone "address"
cd copg
cd RockPaperScissors
python3 copg_rps.py
cd ..
cd tensorboard
tensorboard --logdir .

You can check the results in TensorBoard.
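
The logs end up in the tensorboard/ folder because the training scripts write scalars with torch.utils.tensorboard, roughly as sketched below; the tag names and log directory here are illustrative, each copg_<game>.py defines its own.

```python
from torch.utils.tensorboard import SummaryWriter

# Logs written here are viewed with: tensorboard --logdir tensorboard
writer = SummaryWriter("tensorboard/rps_copg")
for i in range(100):
    writer.add_scalar("player1/probability_rock", 0.33, i)   # dummy value for illustration
writer.close()
```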

Experiment Demonstration

The videos (linked above) show GDA vs GDA play alongside CoPG vs CoPG play for each game:

  1. ORCA Car Racing
  2. Rock Paper Scissors
  3. Markov Soccer
  4. Matching Pennies