Atari-GAIL

This repository contains the PyTorch code for Generative Adversarial Imitation Learning (GAIL) with visual inputs, i.e. Atari games and visual dm-control.

Requirements

Experiments were run with Python 3.6 and these packages:

torch == 1.10.2
gym == 0.19.0
atari-py == 0.2.9

Collect Expert Demonstrations

Train an Expert Policy with PPO

python train_ppo.py --env-name "PongNoFrameskip-v4" --algo ppo --use-gae --lr 2.5e-4 --clip-param 0.1 --value-loss-coef 0.5 --num-processes 8 --num-steps 128 --num-mini-batch 4 --log-interval 1 --use-linear-lr-decay --entropy-coef 0.01

Collect Expert Demonstrations

python collect.py --env-name "PongNoFrameskip-v4"

We provide the collected expert demonstrations at the following link. 'Level 2' demonstrations are optimal and 'Level 1' demonstrations are sub-optimal. [Google Drive]

Train GAIL

Train GAIL with optimal demonstrations (without BC pre-training)

python gail.py --gail --env-name "PongNoFrameskip-v4" --name pong --algo ppo --use-gae --lr 2.5e-4 --clip-param 0.1 --value-loss-coef 0.5 --num-processes 8 --num-steps 128 --num-mini-batch 4 --log-interval 1 --use-linear-lr-decay --entropy-coef 0.01
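
The command above does not show how the imitation reward is obtained. As a rough illustration only, the sketch below computes a commonly used GAIL surrogate reward, -log(1 - D(s, a)), from a raw discriminator output. It assumes D(s, a) is the probability that a state-action pair came from the expert; the function names `sigmoid` and `gail_reward` are hypothetical and are not taken from gail.py, which may use a different reward form.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gail_reward(logit: float, eps: float = 1e-8) -> float:
    """Surrogate reward -log(1 - D), with D = sigmoid(logit).

    Assumption: D is the discriminator's probability that the pair
    (s, a) is expert data. `eps` keeps the log finite when D is 1.
    """
    d = sigmoid(logit)
    return -math.log(max(1.0 - d, eps))


# Pairs that look more expert-like (larger logit) earn a larger reward.
for logit in (-2.0, 0.0, 2.0):
    print(logit, round(gail_reward(logit), 4))
```

Under this convention the policy is rewarded for producing pairs the discriminator cannot tell apart from the demonstrations.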

Train GAIL with optimal demonstrations (with BC pre-training)

python gail.py --bc --gail --env-name "PongNoFrameskip-v4" --name pong --algo ppo --use-gae --lr 2.5e-4 --clip-param 0.1 --value-loss-coef 0.5 --num-processes 8 --num-steps 128 --num-mini-batch 4 --log-interval 1 --use-linear-lr-decay --entropy-coef 0.01
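
For readers unfamiliar with the BC pre-training step enabled by --bc, behavioral cloning is usually framed as supervised learning on expert actions. The sketch below computes the mean negative log-likelihood of expert actions under a discrete policy, in plain Python. The names `log_softmax` and `bc_loss` and the list-based inputs are illustrative assumptions, not the interface used in this repository.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Log-probabilities from raw action scores (shifted for stability)."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def bc_loss(batch_logits: Sequence[Sequence[float]],
            expert_actions: Sequence[int]) -> float:
    """Mean negative log-likelihood of the expert's discrete actions.

    Hypothetical helper: one row of logits per state, one action index
    per state.
    """
    total = 0.0
    for logits, action in zip(batch_logits, expert_actions):
        total -= log_softmax(logits)[action]
    return total / len(expert_actions)


# A policy that is uniform over 6 actions has loss log(6).
print(bc_loss([[0.0] * 6], [3]))
```

Minimizing this loss before adversarial training gives the policy a starting point close to the demonstrations.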

Train GAIL with imperfect demonstrations

python gail.py --imperfect --bc --gail --env-name "PongNoFrameskip-v4" --name pong --algo ppo --use-gae --lr 2.5e-4 --clip-param 0.1 --value-loss-coef 0.5 --num-processes 8 --num-steps 128 --num-mini-batch 4 --log-interval 1 --use-linear-lr-decay --entropy-coef 0.01

Results

We train GAIL with 2000 optimal demonstrations. The results are as follows.

Method	Pong	Seaquest	BeamRider	Hero	Qbert
BC	-20.7(0.46)	200.0(83.43)	1028.4(396.37)	7782.5(50.56)	11420.0(3420.0)
GAIL	-1.73(18.1)	1474.0(201.6)	1087.6(559.09)	13942.5(67.13)	8027.27(24.9)
GAIL+BC	21.0(0.0)	1662.0(161.85)	2306.4(1527.23)	20020(22.91)	13225.0(1347.22)
PPO(Best)	21.0(0.0)	1840(0.0)	2637.45(1378.23)	27814.09(46.01)	15268.18(127.07)

In our experiments, we find that using BC as a pre-training step can significantly improve the performance of GAIL in some Atari games.
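
To make the comparison concrete, the short script below computes the per-game score difference between GAIL+BC and GAIL, using the first number of each cell in the table above. The variable and function names are illustrative, and the values in parentheses are not used here.

```python
# First number of each table cell, copied from the results above.
games = ["Pong", "Seaquest", "BeamRider", "Hero", "Qbert"]
scores = {
    "BC": [-20.7, 200.0, 1028.4, 7782.5, 11420.0],
    "GAIL": [-1.73, 1474.0, 1087.6, 13942.5, 8027.27],
    "GAIL+BC": [21.0, 1662.0, 2306.4, 20020.0, 13225.0],
    "PPO(Best)": [21.0, 1840.0, 2637.45, 27814.09, 15268.18],
}


def gain(method_a: str, method_b: str) -> dict:
    """Per-game score of method_a minus score of method_b."""
    return {g: a - b
            for g, a, b in zip(games, scores[method_a], scores[method_b])}


bc_gain = gain("GAIL+BC", "GAIL")
for game in games:
    print(f"{game}: {bc_gain[game]:+.2f}")
```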

Citations

If you are using the code/data in this repo, please consider citing:

   @inproceedings{wang2021learning,
     title={Learning to Weight Imperfect Demonstrations},
     author={Wang, Yunke and Xu, Chang and Du, Bo and Lee, Honglak},
     booktitle={International Conference on Machine Learning},
     pages={10961--10970},
     year={2021},
     organization={PMLR}
   }

Acknowledgement

Our code structure is largely based on Kostrikov's implementation.
