naokishibuya / car-behavioral-cloning

Built and trained a convolutional network for end-to-end driving in a simulator using TensorFlow and Keras
MIT License
Demo videos: Lake Track (YouTube link), Jungle Track (YouTube link)

Project Description

In this project, I use a neural network to clone car driving behavior. It is a supervised regression problem that maps road images taken from the front of the car to steering angles.

Those images were taken from three different camera angles (from the center, the left and the right of the car).

The network is based on the NVIDIA model, which has been proven to work in this problem domain.

As image processing is involved, the model uses convolutional layers for automated feature engineering.

Files included

Note: is originally from the Udacity Behavioral Cloning project GitHub but it has been modified to control the throttle.

Quick Start

Install the required Python libraries:

You need Anaconda or Miniconda to use the environment files.

# Use TensorFlow without GPU
conda env create -f environment.yml 

# Use TensorFlow with GPU
conda env create -f environment-gpu.yml

Or you can manually install the required libraries (see the contents of the environment*.yml files) using pip.

Run the pretrained model

Start up the Udacity self-driving simulator, choose a scene and press the Autonomous Mode button. Then, run the model as follows:

python model.h5

To train the model

You'll need the data folder which contains the training images.


This will generate a file model-<epoch>.h5 whenever the performance in the epoch is better than the previous best. For example, the first epoch will generate a file called model-000.h5.
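The save-on-improvement behavior can be sketched in plain Python. This is a minimal illustration, not the project's training code: `checkpoint_name`, `epochs_to_save` and the loss values are hypothetical, the zero-padded three-digit epoch is inferred from the `model-000.h5` example, and "performance" is assumed to mean validation loss.

```python
def checkpoint_name(epoch):
    """Return the checkpoint file name for a zero-based epoch index."""
    return "model-{:03d}.h5".format(epoch)


def epochs_to_save(val_losses, save_best_only=True):
    """List the checkpoint files written for a sequence of validation losses."""
    saved = []
    best = float("inf")
    for epoch, loss in enumerate(val_losses):
        improved = loss < best
        if improved:
            best = loss
        # With save_best_only disabled, every epoch is written to disk.
        if improved or not save_best_only:
            saved.append(checkpoint_name(epoch))
    return saved


# Made-up validation losses for five epochs.
losses = [0.030, 0.025, 0.027, 0.021, 0.022]
saved_files = epochs_to_save(losses)
print(saved_files)
```

With these made-up losses, epochs 2 and 4 do not beat the previous best, so no file is written for them.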

Model Architecture Design

The design of the network is based on the NVIDIA model, which NVIDIA has used for its end-to-end self-driving test. As such, it is well suited for the project.

It is a deep convolutional network, which works well with supervised image classification and regression problems. As the NVIDIA model is well documented, I was able to focus on how to adjust the training images to produce the best result, with some adjustments to the model to avoid overfitting and to add non-linearity to improve the prediction.

I've added the following adjustments to the model.

In the end, the model looks as follows:

As per the NVIDIA model, the convolution layers are meant to handle feature engineering and the fully connected layers to predict the steering angle. However, as stated in the NVIDIA document, it is not clear where to draw such a distinction. Overall, the model works well for cloning the given steering behavior.

Below is the model structure output from Keras, which gives more detail on the layer shapes and the number of parameters.

| Layer (type) | Output Shape | Params | Connected to |
|---|---|---|---|
| lambda_1 (Lambda) | (None, 66, 200, 3) | 0 | lambda_input_1 |
| convolution2d_1 (Convolution2D) | (None, 31, 98, 24) | 1824 | lambda_1 |
| convolution2d_2 (Convolution2D) | (None, 14, 47, 36) | 21636 | convolution2d_1 |
| convolution2d_3 (Convolution2D) | (None, 5, 22, 48) | 43248 | convolution2d_2 |
| convolution2d_4 (Convolution2D) | (None, 3, 20, 64) | 27712 | convolution2d_3 |
| convolution2d_5 (Convolution2D) | (None, 1, 18, 64) | 36928 | convolution2d_4 |
| dropout_1 (Dropout) | (None, 1, 18, 64) | 0 | convolution2d_5 |
| flatten_1 (Flatten) | (None, 1152) | 0 | dropout_1 |
| dense_1 (Dense) | (None, 100) | 115300 | flatten_1 |
| dense_2 (Dense) | (None, 50) | 5050 | dense_1 |
| dense_3 (Dense) | (None, 10) | 510 | dense_2 |
| dense_4 (Dense) | (None, 1) | 11 | dense_3 |
| Total params | | 252219 | |
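The shapes and parameter counts in the summary can be reproduced with a little arithmetic. The sketch below assumes unpadded ("valid") convolutions with 5x5 kernels and stride 2 for the first three convolution layers and 3x3 kernels with stride 1 for the last two. These kernel sizes and strides are not listed in the summary; they are assumptions that happen to match every row. The helper names are hypothetical.

```python
def conv_output(size, kernel, stride):
    """Output length of an unpadded convolution along one axis."""
    return (size - kernel) // stride + 1


def conv_params(kernel, in_channels, filters):
    """Weights plus one bias per filter for a square kernel."""
    return kernel * kernel * in_channels * filters + filters


def dense_params(inputs, units):
    """Weights plus one bias per unit."""
    return inputs * units + units


# (kernel, stride, filters) per convolution layer.
# Filter counts come from the summary; kernel and stride are assumed.
conv_layers = [(5, 2, 24), (5, 2, 36), (5, 2, 48), (3, 1, 64), (3, 1, 64)]

height, width, channels = 66, 200, 3  # input shape from the Lambda layer
total = 0
for kernel, stride, filters in conv_layers:
    total += conv_params(kernel, channels, filters)
    height = conv_output(height, kernel, stride)
    width = conv_output(width, kernel, stride)
    channels = filters
    print((height, width, channels))

flattened = height * width * channels
inputs = flattened
for units in (100, 50, 10, 1):
    total += dense_params(inputs, units)
    inputs = units

print(flattened, total)
```

The Lambda, Dropout and Flatten layers have no trainable parameters, so they do not contribute to the total.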

Data Preprocessing

Image Sizing

Model Training

Image Augmentation

For training, I used the following augmentation techniques along with a Python generator to generate an unlimited number of images:

Using the left/right images is useful for training the recovery driving scenario. The horizontal translation is useful for handling difficult curves (for example, the one after the bridge).
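The idea behind these augmentations can be sketched without any image library. The code below is a minimal illustration under stated assumptions, not the project's implementation: images are plain nested lists rather than arrays, the function names are hypothetical, the steering correction of 0.2 for the side cameras and 0.002 per shifted pixel are illustrative values, and positive steering is assumed to mean steering right.

```python
import random


def choose_camera(center, left, right, steering, offset=0.2, rng=random):
    """Pick one camera image at random and correct the steering label.

    The left camera sees the road as if the car sat further left, so the
    label is nudged to the right (assumed positive), and vice versa.
    """
    choice = rng.choice(["center", "left", "right"])
    if choice == "left":
        return left, steering + offset
    if choice == "right":
        return right, steering - offset
    return center, steering


def flip(image, steering):
    """Mirror the image left to right and negate the steering angle."""
    return [row[::-1] for row in image], -steering


def translate_x(image, steering, shift, gain=0.002, fill=0):
    """Shift the image right by `shift` pixels (negative shifts left)
    and adjust the steering angle in proportion to the shift."""
    width = len(image[0])
    shift = max(-width, min(width, shift))  # keep the shift within the image
    shifted = []
    for row in image:
        if shift >= 0:
            shifted.append([fill] * shift + row[:width - shift])
        else:
            shifted.append(row[-shift:] + [fill] * (-shift))
    return shifted, steering + shift * gain


image = [[1, 2, 3],
         [4, 5, 6]]
flipped, flipped_angle = flip(image, 0.1)
shifted, shifted_angle = translate_x(image, 0.1, shift=1)
print(flipped, flipped_angle)
print(shifted, shifted_angle)
```

Each function returns a new image together with an adjusted label, so the transformations can be chained in any order inside a generator.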

Examples of Augmented Images

The following are example transformations:

Center Image

Left Image

Right Image

Flipped Image

Translated Image

Training, Validation and Test

I split the images into training and validation sets in order to measure the performance at every epoch. Testing was done using the simulator.
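A train/validation split of this kind can be sketched as follows. This is a minimal illustration: the function name, the 20% validation fraction, the fixed seed and the sample file names are all assumptions made for the example, not values taken from the project.

```python
import random


def train_valid_split(samples, valid_fraction=0.2, seed=0):
    """Shuffle the samples and split them into (train, valid) lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_valid = int(round(len(shuffled) * valid_fraction))
    return shuffled[n_valid:], shuffled[:n_valid]


# Hypothetical samples: (image path, steering angle) pairs.
samples = [("IMG/frame_{:04d}.jpg".format(i), 0.0) for i in range(100)]
train, valid = train_valid_split(samples)
print(len(train), len(valid))
```

Shuffling before splitting keeps consecutive, nearly identical frames from all landing in the same set.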

As for training,

The Lake Side Track

As an unlimited number of augmented images can be generated, I set the samples per epoch to 20,000. I tried from 1 to 200 epochs, but I found that 5 to 10 epochs are enough to produce a well-trained model for the lake side track. I chose a batch size of 40 because that is the largest size that does not cause an out-of-memory error on my Mac with an NVIDIA GeForce GT 650M (1024 MB).
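With 20,000 samples per epoch and a batch size of 40, one epoch draws 500 batches from the generator. The sketch below shows how an endless Python generator and a fixed epoch size fit together. It is an illustration only: `batch_generator`, `identity` and the sample data are hypothetical names and values.

```python
import itertools
import random


def batch_generator(samples, batch_size, augment, rng=random):
    """Yield batches forever, drawing and augmenting random samples."""
    while True:
        batch = []
        for _ in range(batch_size):
            image, steering = rng.choice(samples)
            batch.append(augment(image, steering))
        yield batch


def identity(image, steering):
    """Placeholder augmentation that returns the sample unchanged."""
    return image, steering


samples_per_epoch = 20000
batch_size = 40
steps_per_epoch = samples_per_epoch // batch_size

# Hypothetical data, for illustration only.
samples = [("frame_{}".format(i), 0.0) for i in range(10)]

# Take exactly one epoch's worth of batches from the endless generator.
one_epoch = itertools.islice(
    batch_generator(samples, batch_size, identity), steps_per_epoch)
seen = sum(len(batch) for batch in one_epoch)
print(steps_per_epoch, seen)
```

Because the generator never ends, the epoch length is a free choice rather than a property of the dataset.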

The Jungle Track

This track was later released in the new simulator by Udacity and replaced the old mountain track. It's much more difficult than the lake side track and the old mountain track.

I used the simulator to generate training data by driving 3 to 4 rounds. I also added several recovery scenarios to handle tricky curves and slopes.

I felt that the validation loss is not a great indication of how well the model drives, so I tried the last several models to see which one drives best. For this, I set save_best_only to False (use -o false) and trained for 50 epochs (use -n 50).


The model can drive the course without bumping into the sides of the road.
