unity-ml-reacher
This repository contains an implementation of deep reinforcement learning based on:
- Multi Agent Deep Deterministic Policy Gradients (MADDPG)
- Multi Agent Proximal Policy Optimization (MAPPO)
The environment to be solved has two agents playing tennis. Each agent controls a racket to bounce a ball over a net.
If an agent hits the ball over the net, it receives a reward of +0.1. If an agent lets the ball hit the ground or hits the ball out of bounds, it receives a reward of -0.01. Thus, the goal of each agent is to keep the ball in play.
This environment is similar to the Tennis environment of Unity ML-Agents.
The action space is continuous in [-1.0, +1.0] and consists of 2 values: one for the horizontal move and one for jumping.
The environment is considered solved when the average score of one agent is >= 0.5 over 100 consecutive episodes.
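For illustration, the loop below computes an episode score, assuming the unityagents package from the python/ folder; the file name is a placeholder and the random policy is only a stand-in for a trained agent:

```python
import numpy as np
from unityagents import UnityEnvironment

# Placeholder path: point this at the Tennis build you downloaded.
env = UnityEnvironment(file_name="path/to/Tennis")
brain_name = env.brain_names[0]

env_info = env.reset(train_mode=False)[brain_name]
num_agents = len(env_info.agents)
scores = np.zeros(num_agents)  # one running score per agent

while True:
    # Each agent outputs 2 values (horizontal move, jump) clipped to [-1, +1].
    actions = np.clip(np.random.randn(num_agents, 2), -1.0, 1.0)
    env_info = env.step(actions)[brain_name]
    scores += env_info.rewards
    if np.any(env_info.local_done):
        break

# The episode score is that of the best agent.
print("Episode score:", np.max(scores))
env.close()
```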
Videos of the trained agents can be found below:
- MADDPG
- MAPPO
Content of this repository
- analysis.xlsx: results of several experiments
- report.pdf: a document that describes the implementation of the MADDPG and MAPPO, along with ideas for future work
- __run_tensorboard.bat__: to run TensorBoard and visualize the loss during training
- folder agents: contains the implementation of
  - a Multi Agent DDPG
  - a Proximal Policy Optimization
  - an Actor-Critic network model using tanh as activation
  - a Gaussian-based Actor-Critic network model using tanh as activation
  - a ReplayBuffer
  - noise generators:
    - an ActionNoise that disturbs the output of the actor network to promote exploration
    - a ParameterNoise that disturbs the weights of the actor network to promote exploration
    - an Ornstein-Uhlenbeck noise generator (a sketch follows this list)
    - a simple noise generator based on the numpy random generator
- folder started_to_converge: weights of a network that started to converge, but slowly
- folder __final_weights__:
  - __final_maddpg_local_2.pth__: weights of a local network trained with MADDPG that solved this environment
  - __final_maddpg_target_2.pth__: weights of a target network trained with MADDPG that solved this environment
  - final_maddpg_local.pth: weights of a local network trained with MADDPG that reached 0.5 during training but is not stable during visual validation
  - final_maddpg_target.pth: weights of a target network trained with MADDPG that reached 0.5 during training but is not stable during visual validation
  - __final_ppo.pth__: weights of the Gaussian Actor-Critic network that solved this environment with Multi Agent PPO
  - __final_maddpg.png__: chart of the 1st phase of training using MADDPG
  - final_maddpg_2.png: chart of the 2nd phase of training using MADDPG
  - __final_ppo.png__: chart of the result of training using MAPPO
- Jupyter Notebooks
  - Multi Agent Deep Deterministic Policy Gradient.ipynb: run this notebook to train the agents using MADDPG and to view their performance
  - Multi Agent Proximal Policy Optimization.ipynb: run this notebook to train the agents using MAPPO and to view their performance
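For reference, here is a minimal sketch of an Ornstein-Uhlenbeck noise generator in the spirit of the one in the agents folder; parameter names and default values are illustrative, not the repository's exact ones:

```python
import numpy as np

class OUNoise:
    """Ornstein-Uhlenbeck process: temporally correlated noise for exploration."""

    def __init__(self, size, mu=0.0, theta=0.15, sigma=0.2, seed=0):
        self.mu = mu * np.ones(size)
        self.theta = theta  # pull strength back toward the mean
        self.sigma = sigma  # scale of the random perturbation
        self.rng = np.random.default_rng(seed)
        self.reset()

    def reset(self):
        """Restart the process at the mean (call at the start of each episode)."""
        self.state = self.mu.copy()

    def sample(self):
        """Advance the process one step and return the new noise value."""
        dx = self.theta * (self.mu - self.state) + self.sigma * self.rng.standard_normal(self.mu.shape)
        self.state = self.state + dx
        return self.state
```

The noise is typically added to the actor's output and clipped together with the action into [-1.0, +1.0] before being sent to the environment.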
Requirements
To run the code, follow these steps:
- Create a new environment:
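For example, with conda (the Python version below is an assumption; the environment is named ddpg because that is the kernel name used later):

conda create --name ddpg python=3.6
conda activate ddpg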
- Perform a minimal install of OpenAI gym
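The minimal install is typically just the base package:

pip install gym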
- Install TensorFlow and TensorBoard (TensorBoard is installed together with TensorFlow)
pip install tensorflow-gpu
or, for a CPU-only setup,
pip install tensorflow
- Install PyTorch
pip install torch
- Install the dependencies under the folder python/
cd python
pip install .
- Install jupyter notebook
pip install jupyter notebook
- Fix an issue in PyTorch 0.4.1 to allow backpropagation through torch.distributions.Normal up to its standard deviation parameter
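As a sanity check that the fix works, the snippet below (illustrative, not the repository's code) verifies that gradients reach the standard deviation of a Normal distribution:

```python
import torch

# A Gaussian policy head with a learnable log standard deviation.
mu = torch.zeros(2, requires_grad=True)
log_std = torch.zeros(2, requires_grad=True)

dist = torch.distributions.Normal(mu, log_std.exp())
action = dist.sample()               # sampling itself is not differentiated
loss = -dist.log_prob(action).sum()  # policy-gradient style surrogate loss
loss.backward()

# After the fix, log_std.grad must be a real tensor, not None.
print(log_std.grad)
```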
- Create an IPython kernel for the ddpg environment
pip install ipykernel
python -m ipykernel install --user --name ddpg --display-name "ddpg"
- If you cannot start any notebook, run the following command to reinstall nbconvert
pip3 install --upgrade --user nbconvert
- Download the Unity Environment (thanks to Udacity) which matches your operating system
- Start Jupyter Notebook from the root of this repository
jupyter notebook
- Once started, change the kernel through the menu Kernel > Change kernel > ddpg
- If necessary, change the path to the Unity environment inside the ipynb files
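For example (the file name is a placeholder for the build you downloaded):

```python
from unityagents import UnityEnvironment
env = UnityEnvironment(file_name="path/to/Tennis")
```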