lcswillems / rl-starter-files

RL starter files in order to immediately train, visualize and evaluate an agent without writing any line of code
MIT License
a2c a3c minigrid multi-process ppo preprocessed-observations pytorch reward-shaping

RL Starter Files

RL starter files to immediately train, visualize and evaluate an agent without writing a single line of code.

These files are suited for minigrid environments and torch-ac RL algorithms. They are easy to adapt to other environments and RL algorithms.

Features

Installation

  1. Clone this repository.

  2. Install minigrid environments and torch-ac RL algorithms:

pip3 install -r requirements.txt

Note: If you want to modify the torch-ac algorithms, you will need to install a cloned version instead:

git clone https://github.com/lcswillems/torch-ac.git
cd torch-ac
pip3 install -e .

Example of use

Train, visualize and evaluate an agent on the MiniGrid-DoorKey-5x5-v0 environment:

  1. Train the agent on the MiniGrid-DoorKey-5x5-v0 environment with the PPO algorithm:
python3 -m scripts.train --algo ppo --env MiniGrid-DoorKey-5x5-v0 --model DoorKey --save-interval 10 --frames 80000

  2. Visualize the agent's behavior:
python3 -m scripts.visualize --env MiniGrid-DoorKey-5x5-v0 --model DoorKey

  3. Evaluate the agent's performance:
python3 -m scripts.evaluate --env MiniGrid-DoorKey-5x5-v0 --model DoorKey

Note: More details on the commands are given below.

Other examples

Handle textual instructions

In the GoToDoor environment, the agent receives an image along with a textual instruction. To handle the latter, add --text to the command:

python3 -m scripts.train --algo ppo --env MiniGrid-GoToDoor-5x5-v0 --model GoToDoor --text --save-interval 10 --frames 1000000
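A common way to feed a textual instruction to a network is to map each word to an integer id. The sketch below illustrates that idea only; the Vocabulary class and the encode_instructions helper are hypothetical names chosen for this example, and the tokenization rule and zero padding are assumptions rather than a description of what --text does internally.

```python
import re


class Vocabulary:
    """Maps words to integer ids, assigning a new id the first time a word is seen."""

    def __init__(self):
        self.word_to_id = {}

    def __getitem__(self, word):
        # Id 0 is reserved for padding, so real words start at 1.
        if word not in self.word_to_id:
            self.word_to_id[word] = len(self.word_to_id) + 1
        return self.word_to_id[word]


def encode_instructions(instructions, vocab):
    """Tokenize each instruction and pad the id lists to a common length."""
    tokenized = [re.findall(r"[a-z]+", text.lower()) for text in instructions]
    max_len = max((len(tokens) for tokens in tokenized), default=0)
    encoded = []
    for tokens in tokenized:
        ids = [vocab[token] for token in tokens]
        encoded.append(ids + [0] * (max_len - len(ids)))
    return encoded


vocab = Vocabulary()
batch = encode_instructions(["go to the red door", "go to the blue door"], vocab)
print(batch)
```

Words shared between instructions receive the same id, so the two example instructions differ only at the position of the color word.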

Add memory

In the RedBlueDoors environment, the agent has to open the red door and then the blue one. To solve it efficiently, the agent has to remember that it has already opened the red door. To add memory to the agent, add --recurrence X to the command:

python3 -m scripts.train --algo ppo --env MiniGrid-RedBlueDoors-6x6-v0 --model RedBlueDoors --recurrence 4 --save-interval 10 --frames 1000000

Files

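This package contains, among other files, the scripts to train an agent (scripts/train.py), visualize its behavior (scripts/visualize.py) and evaluate its performance (scripts/evaluate.py), as well as a default agent model (model.py). Each is described in its own section further down.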

These files are suited for minigrid environments and torch-ac RL algorithms. They are easy to adapt to other environments and RL algorithms by modifying:

scripts/train.py

An example of use:

python3 -m scripts.train --algo ppo --env MiniGrid-DoorKey-5x5-v0 --model DoorKey --save-interval 10 --frames 80000

The script loads the model in storage/DoorKey or creates it if it doesn't exist, then trains it with the PPO algorithm on the MiniGrid DoorKey environment, and saves it every 10 updates in storage/DoorKey. It stops after 80 000 frames.
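To make the interplay between --save-interval and --frames concrete, here is a minimal sketch of the schedule described above. The checkpoint_updates function is a hypothetical helper, and the number of frames consumed per update (2048 here) is an assumed value chosen for illustration; the real value depends on the training configuration.

```python
def checkpoint_updates(total_frames, frames_per_update, save_interval):
    """Return the update numbers at which a checkpoint would be written."""
    saved_at = []
    num_frames = 0
    update = 0
    while num_frames < total_frames:
        # One update consumes a fixed batch of frames (assumed constant here).
        num_frames += frames_per_update
        update += 1
        if save_interval > 0 and update % save_interval == 0:
            saved_at.append(update)
    return saved_at


# 2048 frames per update is a hypothetical value chosen for illustration.
schedule = checkpoint_updates(total_frames=80000, frames_per_update=2048, save_interval=10)
print(schedule)  # [10, 20, 30, 40]
```

Under these assumptions, training runs for 40 updates, because the frame budget is only checked between updates, and the model is saved four times.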

Note: You can define a different storage location in the environment variable PROJECT_STORAGE.
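As an illustration of how such an override is typically resolved, the sketch below reads PROJECT_STORAGE and falls back to a local storage directory. The helper names get_storage_dir and get_model_dir are assumptions made for this example, not a claim about the repository's internals.

```python
import os


def get_storage_dir(default="storage"):
    """Return the storage root, preferring the PROJECT_STORAGE variable when set."""
    return os.environ.get("PROJECT_STORAGE", default)


def get_model_dir(model_name):
    """Return the directory for a named model, e.g. storage/DoorKey."""
    return os.path.join(get_storage_dir(), model_name)


print(get_model_dir("DoorKey"))
```

With the variable unset, this prints the default location under storage; exporting PROJECT_STORAGE before launching a script would redirect it.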

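More generally, the script has 2 required arguments: --algo, the name of the RL algorithm used for training, and --env, the name of the environment to train on.

It also accepts a number of optional arguments, among which --model, --save-interval, --frames, --text and --recurrence, all of which appear in the examples above.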

During training, logs are printed in your terminal (and saved in text and CSV format):

Note: U gives the update number, F the total number of frames, FPS the number of frames per second, D the total duration, rR:μσmM the mean, std, min and max reshaped return per episode, F:μσmM the mean, std, min and max number of frames per episode, H the entropy, V the value, pL the policy loss, vL the value loss and ∇ the gradient norm.
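The four-value summaries labeled μσmM can be reproduced from a list of per-episode values. The sketch below is one way to compute them; the summarize function is a hypothetical helper, the input numbers are made up, and the choice of population rather than sample standard deviation is an assumption.

```python
import statistics


def summarize(values):
    """Return the mean, standard deviation, min and max of a list of numbers."""
    return {
        "mean": statistics.fmean(values),
        # Population standard deviation; whether the logs use this or the
        # sample standard deviation is an assumption.
        "std": statistics.pstdev(values),
        "min": min(values),
        "max": max(values),
    }


# Made-up per-episode returns, for illustration only.
returns_per_episode = [0.5, 1.0, 0.0, 0.5]
summary = summarize(returns_per_episode)
print(summary)
```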

During training, logs are also plotted in Tensorboard:

scripts/visualize.py

An example of use:

python3 -m scripts.visualize --env MiniGrid-DoorKey-5x5-v0 --model DoorKey

In this use case, the script displays how the model in storage/DoorKey behaves on the MiniGrid DoorKey environment.

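More generally, the script has 2 required arguments: --env, the name of the environment to act on, and --model, the name of the trained model.

It also accepts a number of optional arguments, which can be listed with --help.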

scripts/evaluate.py

An example of use:

python3 -m scripts.evaluate --env MiniGrid-DoorKey-5x5-v0 --model DoorKey

In this use case, the script prints in the terminal the performance of the model in storage/DoorKey over 100 episodes.
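Conceptually, such an evaluation runs a fixed number of episodes and averages the results. The sketch below shows the idea on a toy environment; CoinFlipEnv, its simplified reset/step interface, and the evaluate function are all hypothetical stand-ins written for this example, and they do not reproduce the MiniGrid API or the script's actual output.

```python
import random


class CoinFlipEnv:
    """Toy stand-in for an environment: each episode lasts `length` steps."""

    def __init__(self, length=5, seed=0):
        self.length = length
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        # Reward 1 when the action matches a hidden coin flip, else 0.
        self.t += 1
        reward = 1.0 if action == self.rng.randint(0, 1) else 0.0
        done = self.t >= self.length
        return self.t, reward, done


def evaluate(env, policy, episodes=100):
    """Run `episodes` episodes and report the average return and length."""
    returns, lengths = [], []
    for _ in range(episodes):
        obs = env.reset()
        done, total, steps = False, 0.0, 0
        while not done:
            obs, reward, done = env.step(policy(obs))
            total += reward
            steps += 1
        returns.append(total)
        lengths.append(steps)
    return {
        "episodes": episodes,
        "mean_return": sum(returns) / episodes,
        "mean_frames": sum(lengths) / episodes,
    }


# A trivial policy that always picks action 0.
result = evaluate(CoinFlipEnv(), policy=lambda obs: 0)
print(result)
```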

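More generally, the script has 2 required arguments: --env, the name of the environment to act on, and --model, the name of the trained model.

It also accepts a number of optional arguments, which can be listed with --help.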

model.py

The default model is described by the following schema:

By default, the memory part (in red) and the language part (in blue) are disabled. They can be enabled by setting the use_memory and use_text parameters of the model constructor to True.
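The flag pattern can be pictured with a small torch-free stand-in. ToyModel below is a hypothetical class written for this example: it only records which parts would be built, and treating the image part as always present is an assumption. It is not the model defined in model.py.

```python
class ToyModel:
    """Torch-free stand-in showing how boolean flags could toggle model parts."""

    def __init__(self, use_memory=False, use_text=False):
        self.use_memory = use_memory
        self.use_text = use_text
        # Assumption: the image part is always built; the others are opt-in.
        self.parts = ["image"]
        if use_text:
            self.parts.append("language")
        if use_memory:
            self.parts.append("memory")


default_parts = ToyModel().parts
full_parts = ToyModel(use_memory=True, use_text=True).parts
print(default_parts)
print(full_parts)
```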

This model can be easily adapted to your needs.