How far should we step? Policy Improvement with Trust Region Optimisation

Students: Lucas Fijen, Pieter de Marez Oyens, Jonathan Mitnik, Guido Visser.

Experiment

To read more about our experiment comparing TRPO with NPG, see our blog entry here.

Setup

This codebase requires Python 3.6 or higher. We recommend using Anaconda or Miniconda to set up the virtual environment. Here is a walkthrough of the installation and project setup:

git clone https://github.com/lucasfijen/ReinforcementLearning
cd ReinforcementLearning/src
conda create -n rl_final_report python=3.6
conda activate rl_final_report
pip install -r requirements.txt

Assuming you are in src/, you can then run the code with python agent.py. The available arguments are defined in arguments.py; the Deep Bayesian Quadrature Policy Optimization code repository provides additional information.

Credits

Our codebase is heavily based on Akella17's implementation of TRPO and NPG from their code repository, Deep Bayesian Quadrature Policy Optimization.
