This is the official code release of the paper *End-to-End Urban Driving by Imitating a Reinforcement Learning Coach* by Zhejun Zhang, Alexander Liniger, Dengxin Dai, Fisher Yu and Luc van Gool, published at ICCV 2021.
It contains the code for benchmarking, off-policy data collection, on-policy data collection, RL training, and IL training with DAGGER. It also contains trained models of the RL experts and IL agents. The supplementary videos can be found on the paper's homepage.
The "Leaderboard" we evaluated on is an offline version of the CARLA Leaderboard. As further detailed in the paper, the two setups compare as follows.

The online CARLA Leaderboard:

- (+) All methods are evaluated under exactly the same conditions.
- (+) No need to re-evaluate other methods.
- (-) No restriction on how the method is trained and how the training data is collected.

Our offline Leaderboard:

- (+) Strictly prescribes both the training and testing environment.
- (+) Full control and observation over the benchmark.
- (-) You will have to re-evaluate other methods if any part of the benchmark setup changes, for example the CARLA version.

Use the offline Leaderboard if a thorough study of the method's generalization ability is desired.
Please refer to INSTALL.md for installation. We use AWS EC2, but you can also install and run all experiments on your computer or cluster.
Roach is an end-to-end trained agent that drives better and more naturally than hand-crafted CARLA experts. To collect a dataset from Roach, use `run/data_collect_bc.sh` and modify the following arguments (an invocation sketch follows the list):

- `save_to_wandb`: set to `False` if you don't want to upload the dataset to W&B.
- `dataset_root`: local directory for saving the dataset.
- `test_suites`: default is `eu_data`, which collects data in Town01 for the NoCrash-dense benchmark. Available configurations are found here; you can also create your own.
- `n_episodes`: how many episodes to collect; each episode is saved to a separate h5 file.
- `agent/cilrs/obs_configs`: observation (i.e. sensor) configuration; default is `central_rgb_wide`. Available configurations are found here; you can also create your own.
- `inject_noise`: default is `True`. As introduced in CILRS, triangular noise is injected into steering and throttle so that the ego-vehicle does not always follow the lane center. Very useful for imitation learning.
- `actors.hero.terminal.kwargs.max_time`: maximum duration of an episode, in seconds.
- `actors.hero.terminal.kwargs.no_collision`: default is `True`.
- `actors.hero.terminal.kwargs.no_run_rl`: default is `False`.
- `actors.hero.terminal.kwargs.no_run_stop`: default is `False`.
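For instance, here is a minimal sketch of what such an invocation could look like. It assumes `run/data_collect_bc.sh` forwards Hydra-style overrides to `data_collect.py` (mentioned in the DAGGER section below); the entry point, the dataset path, and the concrete values are illustrative assumptions, not the script's actual contents:

```bash
# Hypothetical sketch only: in practice, edit the overrides in run/data_collect_bc.sh.
# The dataset path and episode count below are example values.
python data_collect.py \
  save_to_wandb=False \
  dataset_root=/home/ubuntu/datasets/roach_nocrash \
  test_suites=eu_data \
  n_episodes=80 \
  agent/cilrs/obs_configs=central_rgb_wide \
  inject_noise=True
```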
To benchmark checkpoints, use `run/benchmark.sh` and modify the arguments to select different settings.
We recommend `g4dn.xlarge` with 50 GB of free disk space for video recording. Use `screen` if you want to run the benchmark in the background:

```bash
screen -L -Logfile ~/screen.log -d -m run/benchmark.sh
```
The trained models are hosted here on W&B. Given the corresponding W&B run path, our code will automatically download and load the checkpoint together with its configuration yaml file.
The following checkpoints are used to produce the results reported in our paper.
- To benchmark the Autopilot (roaming agent): `benchmark()` with `agent="roaming"`.
- To benchmark the RL experts: `benchmark()` with `agent="ppo"`, and set `agent.ppo.wb_run_path` to one of the following.
  - `iccv21-roach/trained-models/1929isj0`: Roach
  - `iccv21-roach/trained-models/1ch63m76`: PPO+beta
  - `iccv21-roach/trained-models/10pscpih`: PPO+exp
- To benchmark the IL agents: `benchmark()` with `agent="cilrs"`, and set `agent.cilrs.wb_run_path` to one of the following. Each loss configuration appears twice: the first eight checkpoints are for the NoCrash benchmark, the last eight for the LeaderBoard benchmark.
  - `iccv21-roach/trained-models/39o1h862`: L_A(AP)
  - `iccv21-roach/trained-models/v5kqxe3i`: L_A
  - `iccv21-roach/trained-models/t3x557tv`: L_K
  - `iccv21-roach/trained-models/1w888p5d`: L_K+L_V
  - `iccv21-roach/trained-models/2tfhqohp`: L_K+L_F
  - `iccv21-roach/trained-models/3vudxj38`: L_K+L_V+L_F
  - `iccv21-roach/trained-models/31u9tki7`: L_K+L_F(c)
  - `iccv21-roach/trained-models/aovrm1fs`: L_K+L_V+L_F(c)
  - `iccv21-roach/trained-models/1myvm4mw`: L_A(AP)
  - `iccv21-roach/trained-models/nw226h5h`: L_A
  - `iccv21-roach/trained-models/12uzu2lu`: L_K
  - `iccv21-roach/trained-models/3ar2gyqw`: L_K+L_V
  - `iccv21-roach/trained-models/9rcwt5fh`: L_K+L_F
  - `iccv21-roach/trained-models/2qq2rmr1`: L_K+L_V+L_F
  - `iccv21-roach/trained-models/zwadqx9z`: L_K+L_F(c)
  - `iccv21-roach/trained-models/21trg553`: L_K+L_V+L_F(c)

Set the argument `test_suites` to one of the following (a combined invocation sketch follows the list).
- `eu_test_tt`: NoCrash, busy traffic, train town & train weather
- `eu_test_tn`: NoCrash, busy traffic, train town & new weather
- `eu_test_nt`: NoCrash, busy traffic, new town & train weather
- `eu_test_nn`: NoCrash, busy traffic, new town & new weather
- `eu_test`: eu_test_tt/tn/nt/nn, all 4 conditions in one file
- `nocrash_dense`: NoCrash, dense traffic, all 4 conditions
- `lb_test_tt`: LeaderBoard, busy traffic, train town & train weather
- `lb_test_tn`: LeaderBoard, busy traffic, train town & new weather
- `lb_test_nt`: LeaderBoard, busy traffic, new town & train weather
- `lb_test_nn`: LeaderBoard, busy traffic, new town & new weather
- `lb_test`: lb_test_tt/tn/nt/nn, all 4 conditions in one file
- `cc_test`: LeaderBoard, busy traffic, all 76 routes, dynamic weather
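Putting it together, a sketch of how a benchmark run could be configured, assuming `run/benchmark.sh` forwards Hydra-style overrides to a `benchmark.py` entry point (the entry-point name is an assumption; the run path and test suite come from the lists above):

```bash
# Hypothetical sketch: benchmark the Roach checkpoint on all four NoCrash
# conditions. In practice, edit the overrides in run/benchmark.sh.
python benchmark.py \
  agent=ppo \
  agent.ppo.wb_run_path=iccv21-roach/trained-models/1929isj0 \
  test_suites=eu_test
```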
We recommend `g4dn.xlarge` for dataset collection. Make sure you have enough disk space attached to the instance.
To collect off-policy datasets, use `run/data_collect_bc.sh` and modify the arguments to select different settings. You can use Roach (given a checkpoint) or the Autopilot to collect off-policy datasets. In our paper, the IL agents are initialized via behavior cloning (BC) on an off-policy dataset collected this way before the DAGGER training starts. Some arguments you may want to modify (a sketch follows the list):
- `save_to_wandb=False` if you don't want to upload the dataset to W&B.
- `test_suites`: one of the following.
  - `eu_data`: NoCrash, train town & train weather. We collect `n_episodes=80` for the BC dataset on NoCrash; that is around 75 GB and 6 hours of data.
  - `lb_data`: LeaderBoard, train town & train weather. We collect `n_episodes=160` for the BC dataset on LeaderBoard; that is around 150 GB and 12 hours of data.
  - `cc_data`: CARLA Challenge, all six maps (Town1-6), dynamic weather. We collect `n_episodes=240` for the BC dataset on CARLA Challenge; that is around 150 GB and 18 hours of data.
- `agent.ppo.wb_run_path` and `agent.ppo.wb_ckpt_step`: `agent.ppo.wb_run_path` is the W&B run path where the RL training is logged and the checkpoints are saved; `agent.ppo.wb_ckpt_step` is the step of the checkpoint you want to use. If it is an integer, the script will find the checkpoint closest to that step; if it is `null`, the latest checkpoint will be used.
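As a sketch, collecting the LeaderBoard BC dataset with Roach as the expert could look as follows (again assuming `run/data_collect_bc.sh` wraps `data_collect.py`; the entry point is an assumption):

```bash
# Hypothetical sketch: Roach labels an off-policy LeaderBoard dataset.
# wb_ckpt_step=null loads the latest checkpoint, as described above.
python data_collect.py \
  test_suites=lb_data \
  n_episodes=160 \
  agent.ppo.wb_run_path=iccv21-roach/trained-models/1929isj0 \
  agent.ppo.wb_ckpt_step=null
```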
To collect on-policy datasets, use `run/data_collect_dagger.sh` and modify the arguments to select different settings.
You can use Roach or the Autopilot to label on-policy (DAGGER) datasets generated by an IL agent (given a checkpoint). This is done by running `data_collect.py` with the IL agent as the driver and Roach or the Autopilot as the coach, so the expert supervision is generated and recorded on the fly. Most things are the same as for collecting off-policy BC datasets. Here are the changes (a sketch follows the list):
- Set `agent.cilrs.wb_run_path` to the W&B run path where the IL training is logged and the checkpoints are saved.
- Adjust `n_episodes` so that the size of the DAGGER dataset at each iteration is around 20% of the BC dataset size; depending on the setup, this means an `n_episodes` that is either half of or the same as the `n_episodes` of the BC dataset.
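A sketch of such a DAGGER labeling run, with the IL agent driving and Roach coaching; the `data_collect.py` entry point is assumed as before, and the IL run path is a placeholder:

```bash
# Hypothetical sketch: the IL agent drives, Roach generates and records
# the expert supervision on the fly. Replace the IL run path with your own.
python data_collect.py \
  test_suites=eu_data \
  agent.cilrs.wb_run_path=YOUR_ENTITY/YOUR_PROJECT/YOUR_IL_RUN \
  agent.ppo.wb_run_path=iccv21-roach/trained-models/1929isj0 \
  agent.ppo.wb_ckpt_step=null
```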
To train RL experts, use `run/train_rl.sh` and modify the arguments to select different settings.
We recommend `g4dn.4xlarge` for training the RL experts; you will need around 50 GB of free disk space for videos and checkpoints. We train RL experts on CARLA 0.9.10.1 because 0.9.11 crashes more often, for unknown reasons.
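Since RL training is a long-running job, you may want to launch it in the background with `screen`, just like the benchmark (the log file name is only an example):

```bash
# Run RL training detached under screen and keep a session log.
screen -L -Logfile ~/train_rl.log -d -m run/train_rl.sh
```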
To train IL agents, use `run/train_il.sh` and modify the arguments to select different settings.
Training IL agents does not require CARLA; it is a GPU-heavy task. We therefore recommend running the IL training on AWS p-instances or on your cluster.
Our implementation follows DA-RB (paper, repo), which trains a CILRS (paper, repo) agent using DAGGER.
The training starts by training a base CILRS agent via behavior cloning on an off-policy dataset. The DAGGER steps, collecting an on-policy dataset with the current agent and re-training it, are then repeated until the model achieves decent results.
For the BC training, the following arguments have to be set (a combined sketch follows the list):
- `dagger_datasets`: a vector of strings; for BC training it should contain only the path (local or W&B) to the BC dataset.
- `agent.cilrs.env_wrapper.kwargs.input_states`: can be a subset of `[speed,vec,cmd]`.
  - `speed`: scalar ego-vehicle speed.
  - `vec`: 2D vector pointing to the next GNSS waypoint.
  - `cmd`: one-hot vector of the high-level command.
- Branched or single-branch policy:
  - Branched (one branch per high-level command):
    - `agent.cilrs.policy.kwargs.number_of_branches=6`
    - `agent.cilrs.training.kwargs.branch_weights=[1.0,1.0,1.0,1.0,1.0,1.0]`
  - Single branch:
    - `agent.cilrs.policy.kwargs.number_of_branches=1`
    - `agent.cilrs.training.kwargs.branch_weights=[1.0]`
- Action loss:
  - Deterministic action, no action-distribution KL loss:
    - `agent.cilrs.env_wrapper.kwargs.action_distribution=null`
    - `agent.cilrs.training.kwargs.action_kl=false`
  - Action distribution with KL loss:
    - `agent.cilrs.env_wrapper.kwargs.action_distribution="beta_shared"`
    - `agent.cilrs.training.kwargs.action_kl=true`
- Value supervision:
  - Disabled:
    - `agent.cilrs.env_wrapper.kwargs.value_as_supervision=false`
    - `agent.cilrs.training.kwargs.value_weight=0.0`
  - Enabled:
    - `agent.cilrs.env_wrapper.kwargs.value_as_supervision=true`
    - `agent.cilrs.training.kwargs.value_weight=0.001`
- `agent.cilrs.rl_run_path` and `agent.cilrs.rl_ckpt_step`: used to initialize the IL agent's action/value heads with Roach's action/value heads.
- Feature supervision:
  - Disabled:
    - `agent.cilrs.env_wrapper.kwargs.dim_features_supervision=0`
    - `agent.cilrs.training.kwargs.features_weight=0.0`
  - Enabled:
    - `agent.cilrs.env_wrapper.kwargs.dim_features_supervision=256`
    - `agent.cilrs.training.kwargs.features_weight=0.05`
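For example, a sketch combining the options above into a single BC run (branched policy, Beta action distribution with KL loss, value and feature supervision, heads initialized from Roach). The `train_il.py` entry point is an assumption and `BC_DATA` is a placeholder; in practice you would edit `run/train_il.sh`:

```bash
# Hypothetical sketch only: entry point, run path, and dataset path are
# assumptions/placeholders; the override names are the ones listed above.
python train_il.py \
  "dagger_datasets=[BC_DATA]" \
  "agent.cilrs.env_wrapper.kwargs.input_states=[speed,vec,cmd]" \
  agent.cilrs.policy.kwargs.number_of_branches=6 \
  "agent.cilrs.training.kwargs.branch_weights=[1.0,1.0,1.0,1.0,1.0,1.0]" \
  agent.cilrs.env_wrapper.kwargs.action_distribution=beta_shared \
  agent.cilrs.training.kwargs.action_kl=true \
  agent.cilrs.env_wrapper.kwargs.value_as_supervision=true \
  agent.cilrs.training.kwargs.value_weight=0.001 \
  agent.cilrs.env_wrapper.kwargs.dim_features_supervision=256 \
  agent.cilrs.training.kwargs.features_weight=0.05 \
  agent.cilrs.rl_run_path=iccv21-roach/trained-models/1929isj0 \
  agent.cilrs.rl_ckpt_step=null
```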
During the DAGGER training, a trained IL agent is loaded and the configuration can no longer be changed. You will have to set the following; a sketch follows the list.

- `agent.cilrs.wb_run_path`: the W&B run path where the previous IL training was logged and the checkpoints are saved.
- `agent.cilrs.wb_ckpt_step`: the step of the checkpoint you want to use; leaving it as `null` loads the latest checkpoint.
- `dagger_datasets`: a vector of strings, the W&B run paths or local paths to the DAGGER datasets and the BC dataset in time-reversed order, for example `[PATH_DAGGER_DATA_2, PATH_DAGGER_DATA_1, PATH_DAGGER_DATA_0, BC_DATA]`.
- `train_epochs`: optionally, change it if you want to train for more epochs.
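Finally, a sketch of the first DAGGER iteration, again assuming a `train_il.py` entry point behind `run/train_il.sh`; the run path and dataset paths are placeholders:

```bash
# Hypothetical sketch: resume the BC-trained IL agent and prepend the first
# on-policy DAGGER dataset to the BC dataset (time-reversed order).
python train_il.py \
  agent.cilrs.wb_run_path=YOUR_ENTITY/YOUR_PROJECT/YOUR_BC_RUN \
  agent.cilrs.wb_ckpt_step=null \
  "dagger_datasets=[PATH_DAGGER_DATA_0,BC_DATA]"
```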
Please cite our work if you find it useful:
```bibtex
@inproceedings{zhang2021roach,
  title     = {End-to-End Urban Driving by Imitating a Reinforcement Learning Coach},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  author    = {Zhang, Zhejun and Liniger, Alexander and Dai, Dengxin and Yu, Fisher and Van Gool, Luc},
  year      = {2021},
}
```
This software is released under a CC-BY-NC 4.0 license, which allows personal and research use only. For a commercial license, please contact the authors. You can view a license summary here.
Portions of source code taken from external sources are annotated with links to original files and their corresponding licenses.
This work was supported by Toyota Motor Europe and was carried out at the TRACE Lab at ETH Zurich (Toyota Research on Automated Cars in Europe - Zurich).