wenqing-2021 / On_Ramp_Merge_Safe_RL

We combine safe reinforcement learning with MPC to enhance safety in the on-ramp merging scenario.
BSD 2-Clause "Simplified" License

Human-aligned Safe Reinforcement Learning for Highway On-ramp Merging in Dense Traffic

This is the official implementation of the paper "Human-aligned Safe Reinforcement Learning for Highway On-ramp Merging in Dense Traffic". The code is based on highway-env.

1. Setup Environment

We use conda to manage the environment. To create and activate it, run:

conda create -n on_ramp_merge python=3.8
conda activate on_ramp_merge

We also highly recommend installing Open MPI for parallel training:

cd ~/Downloads
wget https://download.open-mpi.org/release/open-mpi/v4.1/openmpi-4.1.2.tar.gz
tar -xzvf openmpi-4.1.2.tar.gz
cd openmpi-4.1.2
./configure
make && make install

Then clone the repository and install the requirements:

git clone https://github.com/wenqing-2021/On_Ramp_Merge_Safe_RL.git
cd On_Ramp_Merge_Safe_RL
pip install -r requirement.txt

Note: we use wandb to log the training process, so you need to create a wandb account and log in with it before training. See the wandb quickstart tutorial for details.
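Before training, it can help to confirm that the setup steps above took effect. The following is a minimal sketch, not part of this repository: the function name `preflight` is hypothetical, and it assumes that Open MPI exposes an `mpirun` launcher on `PATH` after `make install` and that wandb is among the installed requirements. It uses only the Python standard library.

```python
import importlib.util
import os
import shutil
import sys


def preflight(required_python=(3, 8)):
    """Return simple pass/fail checks for the setup steps in this README.

    Hypothetical helper for illustration; it is not shipped with the repository.
    """
    return {
        # The conda environment above is created with python=3.8.
        "python_3_8": sys.version_info[:2] == tuple(required_python),
        # ASSUMPTION: the Open MPI install put an `mpirun` launcher on PATH.
        "mpirun_on_path": shutil.which("mpirun") is not None,
        # ASSUMPTION: wandb is installed by the requirements file.
        "wandb_installed": importlib.util.find_spec("wandb") is not None,
        # WANDB_API_KEY is one way to supply credentials; `wandb login` is
        # another, so False here does not necessarily mean you are logged out.
        "wandb_api_key_set": bool(os.environ.get("WANDB_API_KEY")),
    }


if __name__ == "__main__":
    for name, ok in preflight().items():
        print(f"{name}: {'ok' if ok else 'not detected'}")
```

Run it inside the activated `on_ramp_merge` environment. A "not detected" line is only a hint about which step to revisit, not proof that the step failed.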

2. Train agents

The environment is built on top of highway-env. We implemented the Model Predictive Controller (MPC) and the Safe Reinforcement Learning (SRL) algorithms, which take cost constraints into account for the on-ramp merging task. Run the following scripts to train the agents:

2.1 Choose the Agent:

3. Evaluate agents

Run the following script to evaluate a trained agent.

NOTE: the suggested format for --exp_name is eval_in_${density}, where density is one of low, high, or mixed. After the script finishes, the evaluation results are stored under the root folder, for example ./eval_result/baseline/eval_in_low_Baseline_SACD_2/ for the command below.

python3 src/evaluate/evaluate_agents.py --exp_name eval_in_low --env merge_eval_low_density-v0 --safe_protect --data_file baseline --agents Baseline_SACD_2
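The naming convention above can be sketched as a small helper that assembles the command and the expected result folder for a given density. This is an illustration only: `build_eval_command` is a hypothetical name, the flags are exactly those shown in the command above, and both the environment id pattern `merge_eval_${density}_density-v0` and the result folder layout are assumptions extrapolated from the single low-density example. Check the environment ids registered in the source before relying on the high and mixed variants.

```python
import shlex

# Density choices listed in this README.
DENSITIES = ("low", "high", "mixed")


def build_eval_command(density, agent="Baseline_SACD_2", data_file="baseline",
                       safe_protect=True):
    """Return (command string, expected result folder) for one evaluation run.

    Hypothetical helper for illustration; it is not shipped with the repository.
    """
    if density not in DENSITIES:
        raise ValueError(f"density must be one of {DENSITIES}, got {density!r}")

    # Suggested experiment name format: eval_in_${density}.
    exp_name = f"eval_in_{density}"
    # ASSUMPTION: env ids follow the pattern of the documented low-density example.
    env_id = f"merge_eval_{density}_density-v0"

    args = ["python3", "src/evaluate/evaluate_agents.py",
            "--exp_name", exp_name, "--env", env_id]
    if safe_protect:
        args.append("--safe_protect")
    args += ["--data_file", data_file, "--agents", agent]

    # ASSUMPTION: folder layout inferred from the one example path in this README.
    result_dir = f"./eval_result/{data_file}/{exp_name}_{agent}/"
    return shlex.join(args), result_dir


if __name__ == "__main__":
    command, result_dir = build_eval_command("low")
    print(command)
    print(result_dir)
```

With the default arguments and density `low`, the helper reproduces the command shown above, which is a quick way to confirm that the pattern matches before trying other densities.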

3.1 Parameters:

3.2 Render the evaluation process

4. Plot tools

The plot tools are implemented in the folder tools/. We suggest reading the source code for more information. The main training results are shown in the following graphs:

[Figure: train_results_a]

[Figure: train_results_b]