MCV-M5 : Scene Understanding for Autonomous Vehicles

This is the PreDeeptor (Team 8) repository for the M5 project. Here you can find the source code, the documents, the deliverables, the instructions to run the code for each week, and some references used in the project.

Abstract

Convolutional Neural Networks are currently a very active research topic, and autonomous driving is a growing concern for society. This project focuses on the implementation and evaluation of deep Convolutional Neural Networks for object recognition, object detection and semantic segmentation on traffic images.

Contributors

We are PreDeeptor:

Ignasi Mas (ignasi.masm@e-campus.uab.cat, Github user: MrLeylo)
Hugo Prol (hugo.prol@e-campus.uab.cat, Github user: hprop)
Jordi Puyoles (jordi.puyoles@e-campus.uab.cat, Github user: jordi-bird)

Documents

Development

Week 1. Project presentation

Instructions to run the code

There's no implemented code this week.

Week 2. Object recognition

Code explained

From the original repository, we only modified the config files and added two models: ResNet and DenseNet.

Resnet

We followed the original paper.

DenseNet

We followed the original paper, and the implementations by tdeboissiere, robertomest and titu1994 guided ours. We also added the bottleneck and compression techniques introduced in the original paper (DenseNet-BC).
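As a rough illustration of why bottleneck and compression shrink the model, the sketch below tracks the number of feature maps and convolution weights through a small DenseNet. It assumes the DenseNet-BC conventions from the paper (a 1x1 bottleneck producing 4k maps before each 3x3 convolution, and transition layers that keep floor(θm) of their m input maps). The function names are hypothetical, biases and batch-normalization parameters are ignored, and this is not the code in our models folder.

```python
import math

def dense_layer_weights(in_channels, growth_rate, bottleneck):
    """Convolution weights (no biases) of one DenseNet composite layer."""
    if bottleneck:
        inter = 4 * growth_rate  # 1x1 conv produces 4k maps (DenseNet-B)
        return in_channels * inter + 3 * 3 * inter * growth_rate
    return 3 * 3 * in_channels * growth_rate  # plain 3x3 conv producing k maps

def densenet_summary(initial_channels, growth_rate, layers_per_block,
                     compression=1.0, bottleneck=False):
    """Return (final channel count, total conv weights) for the given config."""
    channels, weights = initial_channels, 0
    for i, n_layers in enumerate(layers_per_block):
        for _ in range(n_layers):
            weights += dense_layer_weights(channels, growth_rate, bottleneck)
            channels += growth_rate  # concatenation adds k feature maps
        if i < len(layers_per_block) - 1:  # transition after all but the last block
            out = int(math.floor(compression * channels))  # compression factor theta
            weights += channels * out  # 1x1 transition convolution
            channels = out
    return channels, weights
```

For example, densenet_summary(16, 12, [2, 2], compression=0.5) returns (44, 11168). The bottleneck pays off once a layer's input is wide: with 256 input maps and k = 12, one layer needs 17472 weights with the bottleneck versus 27648 without it.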

Achievements

Finetune and test the VGG16 model over the TT100K dataset: with cropped images.
Finetune and test the VGG16 model over the TT100K dataset: with entire images.
Repeat those experiments training from scratch.
Train the VGG16 model (from scratch, with entire images) with transfer learning over the BTS dataset.
Train and test the VGG16 model over the KITTI dataset.
Accelerate the previous training: downsample the images.
Train from scratch a ResNet model over the TT100K dataset.
Finetune a ResNet model over the TT100K dataset.
Train from scratch a DenseNet model over the TT100K dataset.
Handle the number of parameters in DenseNet: reduce the number of layers and filters and the growth rate.
Accelerate the previous training and its learning process: use bottleneck and compression in DenseNet and increase the learning rate.
Perform the previous test with dropout.

Instructions to run the code

To run a test of the experiment defined by the config file experimentX in the code/config folder of the repository, saving the results in /home/master/folderX, with the datasets stored in /home/master/datasets_folder:

If you don't have the repository, clone it.
Download the weights from the Weights section of this README (file weights.hdf5 in folder experimentX) and store the file in /home/master/folderX.
Go to mcv-m5/code and run:

python train.py -c config/experimentX.py -e ~/folderX -s /data/module5 -l ~/datasets_folder/

Weights

The link below points to the folder that stores the weights of each model.

Mirror

Week 3 & 4. Object detection

Achievements

Train the given YOLO network with its default configuration.
Inspect the TT100K dataset limitations: differences between the train and test sets.
Confirm the (expected) effect of those limitations: a gap between the train set and the rest.
Evaluate the training results: f-score.
Train the Tiny-YOLO network: almost half the time per frame, but YOLO performs better.
Inspect the Udacity dataset: differences in conditions between train and test.
Train YOLO on the Udacity dataset: the limitation above has a strong effect.
Boost YOLO over the TT100K dataset: preprocessing techniques (samplewise normalization, global contrast normalization).
Boost the previous training: increase the initial learning rate but with an early decay.
Read papers on alternative architectures and pick one: SSD.
Implement this network, train, test and evaluate results.

Code explained

YOLO

We modified the global contrast normalization (GCN) provided in the framework, since it appeared to be broken by the introduction of a mask array to handle void labels (for semantic segmentation). GCN was one of the preprocessing stages used in our experiments with the YOLO architecture.

We also contributed to the eval_detection_fscore script, adding the preprocessing stages we used (samplewise center and std normalization, GCN).
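To make the three preprocessing stages concrete, here is a minimal NumPy sketch of per-image centering, per-image std normalization and GCN with its statistics computed over the whole image (all pixels and channels). The function names and the scale/lam/eps parameters are assumptions for illustration, not the framework's data_loader API, and the sketch does not model the void-label mask.

```python
import numpy as np

# Hypothetical helpers, not the framework's data_loader functions.

def samplewise_center(img):
    """Subtract the mean of this single image (all pixels and channels)."""
    return img - img.mean()

def samplewise_std_normalization(img, eps=1e-7):
    """Divide by the standard deviation of this single image."""
    return img / (img.std() + eps)

def global_contrast_normalization(img, scale=1.0, lam=0.0, eps=1e-8):
    """GCN with mean and contrast computed over the whole image."""
    img = np.asarray(img, dtype=np.float64)
    centered = img - img.mean()
    contrast = np.sqrt(lam + np.mean(centered ** 2))
    # max(eps, ...) avoids dividing by zero on constant images
    return scale * centered / max(eps, contrast)
```

With scale=1 and lam=0, the output has zero mean and unit root-mean-square value, and a constant image maps to all zeros instead of producing NaNs.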

SSD

Our implementation is based on the code from the rykov8's repository.

Beyond some modifications to adapt the input and output bounding box formats to those used in our framework, our major contribution was to decouple the base model from the prior box declaration and the construction of the prediction layers. This lets us easily build new SSD topologies with the build_ssd() function (see models/ssd.py).
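One reason this decoupling is possible is that SSD's prior (default) boxes depend only on the feature-map sizes and box scales, not on the base network. The sketch below generates them following the scheme in the SSD paper (linearly spaced scales between s_min = 0.2 and s_max = 0.9, widths s·sqrt(a) and heights s/sqrt(a), plus an extra square box of scale sqrt(s_k·s_{k+1})). The function names, the default aspect ratios and the scale of 1.0 assumed past the last map are illustrative choices, not the signature of our build_ssd().

```python
import math

def ssd_scales(num_maps, s_min=0.2, s_max=0.9):
    """Linearly spaced box scales s_k, one per feature map."""
    if num_maps == 1:
        return [s_min]
    step = (s_max - s_min) / (num_maps - 1)
    return [s_min + step * k for k in range(num_maps)]

def default_boxes(fmap_size, scale, next_scale, aspect_ratios=(1.0, 2.0, 0.5)):
    """(cx, cy, w, h) priors, normalized to [0, 1], for a square feature map."""
    shapes = [(scale * math.sqrt(ar), scale / math.sqrt(ar)) for ar in aspect_ratios]
    extra = math.sqrt(scale * next_scale)  # extra square box for aspect ratio 1
    shapes.append((extra, extra))
    boxes = []
    for i in range(fmap_size):
        for j in range(fmap_size):
            cx, cy = (j + 0.5) / fmap_size, (i + 0.5) / fmap_size
            boxes.extend((cx, cy, w, h) for w, h in shapes)
    return boxes

def build_priors(fmap_sizes, aspect_ratios=(1.0, 2.0, 0.5)):
    """Priors for every feature map; only the map sizes are needed."""
    scales = ssd_scales(len(fmap_sizes)) + [1.0]  # assumed scale past the last map
    priors = []
    for k, size in enumerate(fmap_sizes):
        priors.extend(default_boxes(size, scales[k], scales[k + 1], aspect_ratios))
    return priors
```

Swapping the base model then only changes the list of feature-map sizes passed to build_priors; for 38x38 and 19x19 maps with three aspect ratios this yields (1444 + 361) x 4 = 7220 priors.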

In further contributions (outside the assignment), we plan to add an SSD architecture with a ResNet base model.

Modifications on the framework

Global contrast normalization in code/tools/data_loader.py is now computed over the whole image.

Instructions to run the code

If you don't have the repository, clone it.
Download the weights from the Weights section of this README (file weights.hdf5 in folder experimentX) and store the file in /home/master/folderX.
Go to mcv-m5/code and run:

python train.py -c config/experimentX.py -e ~/folderX -s /data/module5 -l ~/datasets_folder/

To evaluate the f-score of the model generated by the previous experiment:

Go to mcv-m5/code and run:

python eval_detection_fscore.py ~/folderX/weights.hdf5 ~/datasets_folder
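For readers unfamiliar with how a detection f-score is usually obtained, here is a minimal sketch that greedily matches predicted boxes to ground-truth boxes by intersection over union (IoU) and derives precision, recall and F-beta. The 0.5 IoU threshold, the greedy matching and the function names are assumptions; the sketch ignores class labels and confidence scores, and eval_detection_fscore.py may compute the score differently.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def detection_fscore(predictions, ground_truth, iou_threshold=0.5, beta=1.0):
    """Return (precision, recall, F-beta) using greedy one-to-one IoU matching."""
    matched, tp = set(), 0
    for p in predictions:
        best, best_iou = None, iou_threshold
        for idx, g in enumerate(ground_truth):
            if idx in matched:
                continue  # each ground-truth box can be matched only once
            v = iou(p, g)
            if v >= best_iou:
                best, best_iou = idx, v
        if best is not None:
            matched.add(best)
            tp += 1
    fp = len(predictions) - tp   # unmatched predictions
    fn = len(ground_truth) - tp  # missed ground-truth boxes
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    b2 = beta ** 2
    return precision, recall, (1 + b2) * precision * recall / (b2 * precision + recall)
```

With one correct detection, one false positive and one missed box, precision, recall and F1 are all 0.5.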

Weights

The link below points to the folder that stores the weights of each model.

Mirror

Week 5 & 6. Object segmentation

Code explained

From the original repository, we made some modifications to the framework, worked with the config files and added one model: Tiramisu.

Tiramisu

We followed the original paper and also based our model on SimJeg's implementation, which was written in Lasagne; we implemented it in Keras.

To solve some shape mismatches, we apply zero padding after the deconvolutional layers so that their outputs can be concatenated with the skip connections. Bottleneck and compression are also implemented. We also implemented eval_dataset.py.
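The mismatch appears when pooling rounds down odd sizes: a 360x480 CamVid image gives a 45x60 map, pooling it gives 22x30, and a 2x deconvolution brings it back to only 44x60. A minimal NumPy sketch of the padding-then-concatenation step follows; it assumes channels-last (H, W, C) arrays and splits any padding between both sides, and the function names are hypothetical, so our Keras layer may pad differently.

```python
import numpy as np

def pad_to_match(upsampled, skip):
    """Zero-pad an (H, W, C) upsampled map to the spatial size of skip."""
    dh = skip.shape[0] - upsampled.shape[0]
    dw = skip.shape[1] - upsampled.shape[1]
    if dh < 0 or dw < 0:
        raise ValueError("upsampled map is larger than the skip connection")
    # split the padding between both sides; the extra row/column goes last
    pad = ((dh // 2, dh - dh // 2), (dw // 2, dw - dw // 2), (0, 0))
    return np.pad(upsampled, pad, mode="constant")

def concat_with_skip(upsampled, skip):
    """Pad, then concatenate along the channel axis as in a Tiramisu up path."""
    return np.concatenate([pad_to_match(upsampled, skip), skip], axis=-1)
```

For a 44x60x16 upsampled map and a 45x60x48 skip connection, the result is 45x60x64, with the new bottom row of the first 16 channels filled with zeros.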

Modifications on the framework

Custom Cropping2D layer in layers/outlayers.py (it handles symbolic input shapes as in Keras version 2).
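A cropping layer that reads its target size from another tensor is useful, for instance, when aligning upsampled FCN score maps with the input. The NumPy sketch below shows only the underlying operation, cropping an (H, W, C) array to the spatial size of a reference array at a given offset. It is not the layer in layers/outlayers.py, and crop_like and its offset argument are hypothetical.

```python
import numpy as np

def crop_like(x, reference, offset=(0, 0)):
    """Crop the spatial dims of an (H, W, C) array to those of reference."""
    th, tw = reference.shape[:2]
    oy, ox = offset
    if oy < 0 or ox < 0 or oy + th > x.shape[0] or ox + tw > x.shape[1]:
        raise ValueError("crop window exceeds the input")
    return x[oy:oy + th, ox:ox + tw, :]
```

In a symbolic setting, th and tw are only known at run time, which is why the layer has to read them from the reference tensor instead of a fixed cropping argument.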

Achievements

Train and test the FCN8 model over the CamVid dataset.
Boost FCN8 over the CamVid dataset: finetuning.
Boost FCN8 over the CamVid dataset: finetuning with data augmentation.
Evaluate other datasets considering their class distribution, image properties, dataset size and other factors, and pick one for further experiments with FCN8: Synthia.
Boost FCN8 over the Synthia dataset: finetuning.
Read papers and select another segmentation architecture to train on CamVid: Tiramisu.
Boost Tiramisu over the CamVid dataset: finetuning with data augmentation and bilinear initialization of the deconvolutional layers.
Handle Tiramisu's high dimensionality during training: limit the batch size.
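The bilinear initialization listed above follows the common FCN practice of initializing a transposed convolution so that it starts out performing bilinear upsampling. Below is a NumPy sketch of that kernel, adapted from the widely used FCN helper; the function names are hypothetical. For an upsampling factor f the kernel size is usually 2f - f % 2. The (size, size, channels, channels) layout assumes channels-last weights; because the weights are diagonal across channels the order of the last two axes does not matter, but it should still be checked against the Keras version in use before loading them into a layer.

```python
import numpy as np

def bilinear_kernel(size):
    """2-D bilinear interpolation kernel of shape (size, size)."""
    factor = (size + 1) // 2
    center = factor - 1 if size % 2 == 1 else factor - 0.5
    og = np.ogrid[:size, :size]  # row and column index grids
    return (1 - np.abs(og[0] - center) / factor) * (1 - np.abs(og[1] - center) / factor)

def bilinear_weights(size, channels):
    """Deconvolution weights where each channel is upsampled independently."""
    weights = np.zeros((size, size, channels, channels))
    kernel = bilinear_kernel(size)
    for c in range(channels):
        weights[:, :, c, c] = kernel  # no mixing across channels
    return weights
```

For 2x upsampling (size 4) the 1-D profile is [0.25, 0.75, 0.75, 0.25], so the first kernel row is [0.0625, 0.1875, 0.1875, 0.0625].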

Instructions to run the code

If you don't have the repository, clone it.
Download the weights from the Weights section of this README (file weights.hdf5 in folder experimentX) and store the file in /home/master/folderX.
Go to mcv-m5/code and run:

python train.py -c config/experimentX.py -e ~/folderX -s /data/module5 -l ~/datasets_folder/

Weights

The link below points to the folder that stores the weights of each model.

Mirror

References

Simonyan, K., Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. CoRR 2014.
He, K., Zhang, X., Ren, S. & Sun, J. Deep Residual Learning for Image Recognition. CoRR 2015.
Gao Huang, Zhuang Liu, Kilian Q. Weinberger, Laurens van der Maaten. Densely Connected Convolutional Networks, 2016
liuzhuang13, Code for Densely Connected Convolutional Networks (DenseNets)
TSingHua-TenCent 100K dataset
KITTI Object Detection Dataset
KUL Belgium Traffic Signs dataset
Udacity Dataset
Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Fei-Fei Li. ImageNet: A large-scale hierarchical image database. In CVPR, pages 248–255. IEEE Computer Society, 2009
Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In Computer Vision and Pattern Recognition
W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg. SSD: Single Shot MultiBox Detector
J. Redmon, S. Divvala, R. Girshick, and A. Farhadi. You Only Look Once: Unified, Real-Time Object Detection
J. Redmon and A. Farhadi. YOLO9000: Better, Faster, Stronger
Ross Girshick (Microsoft Research). Fast R-CNN
Long, Jonathan, Evan Shelhamer, and Trevor Darrell. Fully convolutional networks for semantic segmentation
Jégou, Simon, et al. The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation
