MCV-M5 : Scene Understanding for Autonomous Vehicles
This is the PreDeeptor (Team 8) repository for the M5 project. Here you can find
the source code, the documents, the deliverables and the instructions
to run the code for each week, and some references that we use for the
project.
Abstract
Convolutional Neural Networks are a hot topic at this moment. On the other hand, autonomous driving is currently a worry for the society. The current project focuses on implementation and evaluation of deep Convolutional Neural Networks in Object Recognition, Object Detection and Semantic Segmentatation on traffic images.
Contributors
We are PreDeeptor:
- Ignasi Mas (ignasi.masm@e-campus.uab.cat, Github user: MrLeylo)
- Hugo Prol (hugo.prol@e-campus.uab.cat, Github user: hprop)
- Jordi Puyoles (jordi.puyoles@e-campus.uab.cat, Github user: jordi-bird)
Documents
Development
Week 1. Project presentation
Instructions to run the code
There's no implemented code this week.
Week 2. Object recognition
Code explained
From the original repository we just worked with the config file and we added 2 models, Resnet and DenseNet.
Resnet
We followed the original paper.
DenseNet
We followed the original paper. The implementations of tdeboissiere, robertomest and titu1994 guided ours. We also added bottleneck and compression algorithms, introduced in the papers.
Achievements
- Finetune and test the VGG16 model over TT100K dataset: with cropped images.
- Finetune and test the VGG16 model over TT100K dataset: with entire images.
- Repeat those experiments training from scratch.
- Train the VGG16 model (from scratch with entire images) with transfer learning over the BTS dataset.
- Train and test the VGG16 model over Kitti dataset.
- Accelerate the previous training: downsample the images.
- Train from scratch a ResNet model over TT100K dataset.
- Finetune a ResNet model over TT100K dataset.
- Train from scratch a DenseNet model over TT100K dataset.
- Handle with the amount of parameters in DenseNet: reduce the number of layers and filters and the growth rate.
- Accelerate the previous training and its learning process: use bottleneck and compression in DenseNet and increase the learning rate.
- Perform the previous test wih dropout.
Instructions to run the code
To make a test of the experiment corresponding to the config file experimentX in the code/config folder on the repository and save the results in /home/master/folderX, if you have the datasets in /home/master/datasets_folder:
- If you don't have the repository, clone it
- Download its weights on the Weights section (file weights.hdf5 on folder experimentX) of this Readme and store the file on /home/master/folderX
- Go to mcv-m5/code and run:
python train.py -c config/experimentX.py -e ~/folderX -s /data/module5 -l ~/datasets_folder/
Weights
On the folder below, you can access to the folder which stores the weights of each model.
Mirror
Week 3 & 4. Object detection
Achievements
- Train the given YOLO network with its default configuration.
- Inspect the TT100k dataset limitations: differences in train and test sets.
- Confirm the (expected) effect of those limitations: gap between train set and the rest.
- Evaluate the train results: f-score.
- Train the Tiny-YOLO network: less time per frame (almost the half) but performance better in YOLO.
- Inspect the Udacity: differences in conditions in train and test.
- Train YOLO in Udacity dataset: high effect of the limitation above.
- Boost YOLO over TT100k dataset: preprocessing techniques (samplewise normalization, global contrast normalization).
- Boost previous training: increasing the initial learning rate but with an early decay.
- Read papers for dalternative architectures and pick: SSD.
- Implement this network, train, test and evaluate results.
Code explained
YOLO
We modified the global contrast normalization (GCN) provided in the
framework since it appears broken due to the introduction of a mask
array to handle void labels (for semantic segmentation). GCN was one
of the preprocessing stages used in our experiments with the YOLO
architecture.
Contributions were also done in the eval_detection_fscore script to
add the preprocessing stages used (samplewise center, std
normalization, GCN).
SSD
Our implementation is based on the code from
the rykov8's repository.
Beyond some modifications to adapt the input and output bounding box
formats to those used in our framework, our major contribution was to
decouple the base model from the priors declaration and the
construction of the prediction layers. Thus we are able to build easily
new SSD topologies with the build_ssd()
function (see models/ssd.py).
We plan to add in further contributions (out of assignment) a SSD
architecture with a resnet base model.
Modifications on the framework
- Global contrast normalization in code/tools/data_loader.py to be computed over all the image.
Instructions to run the code
To make a test of the experiment corresponding to the config file experimentX in the code/config folder on the repository and save the results in /home/master/folderX, if you have the datasets in /home/master/datasets_folder:
- If you don't have the repository, clone it
- Download its weights on the Weights section (file weights.hdf5 on folder experimentX) of this Readme and store the file on /home/master/folderX
- Go to mcv-m5/code and run:
python train.py -c config/experimentX.py -e ~/folderX -s /data/module5 -l ~/datasets_folder/
To evaluate the f-score of the model generated by the previous experiment:
- Go to mcv-m5/code and run:
python eval_detection_fscore.py ~/folderX/weights.hdf ~/datasets_folder
Weights
On the folder below, you can access to the folder which stores the weights of each model.
Mirror
Week 5 & 6. Object segmentation
Code explained
From the original repository we made some modifications on the framework, worked with the config files and added one model, Tiramisu.
Tiramisu
We followed the original paper. We also based our model on SimJeg's implementation. This was implemented in Lasagne, we implemented it in Keras.
To solve some missmatches we do Zero Padding after deconvolutional layers (to concatenate with the skip connections). Bottleneck and compression algorithms are implemented. We also implemented eval_dataset.py
Modifications on the framework
- Custom Cropping2D layer in layers/outlayers.py (it handles symbolic input shapes as in keras version 2).
Achievements
- Train and test the FCN8 model over Camvid dataset.
- Boost FCN8 over Camvid dataset: finetuning.
- Boost FCN8 over Camvid dataset: finetuning with data augmentation.
- Evaluate other datasets with their class distribution, image properties, dataset size or other factors. Pick one for further experiments with FCN8: Synthia.
- Boost FCN8 over Synthia dataset: finetuning.
- Read papers and select another segmentation architecture to train in Camvid: Tiramisu.
- Boost Tiramisu over Camvid dataset: finetuning with data augmentation and bilinear initialization on deconvolutional layers.
- Handle with Tiramisu high dimensionality on training data: batch size limit.
Instructions to run the code
To make a test of the experiment corresponding to the config file experimentX in the code/config folder on the repository and save the results in /home/master/folderX, if you have the datasets in /home/master/datasets_folder:
- If you don't have the repository, clone it
- Download its weights on the Weights section (file weights.hdf5 on folder experimentX) of this Readme and store the file on /home/master/folderX
- Go to mcv-m5/code and run:
python train.py -c config/experimentX.py -e ~/folderX -s /data/module5 -l ~/datasets_folder/
Weights
On the folder below, you can access to the folder which stores the weights of each model.
Mirror
References
- Simonyan, K., Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. CoRR 2014.
- He, K., Zhang, X., Ren, S. & Sun, J. Deep Residual Learning for Image Recognition. CoRR 2015.
- Gao Huang, Zhuang Liu, Kilian Q. Weinberger, Laurens van der MaatenDensely Connected Convolutional Networks, 2016
- liuzhuang13, Code for Densely Connected Convolutional Networks (DenseNets)
- TSingHua-TenCent 100K dataset
- KITTI Object Detection Dataset
- KUL Belgium Traffic Signs dataset
- Udacity Dataset
- [ia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Fei-Fei Li. Imagenet: A large-scale hierarchical image database. In CVPR, pages 248–255. IEEE Computer Society, 2009
- Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In Computer Vision and Pattern Recognition
- W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg. SSD: Single Shot MultiBox Detector
- J. Redmon, S. Divvala, R. Girshick, and A. Farhadi.You Only Look Once: Unified, Real-Time Object Detection
- J. Redmon and A. Farhadi.YOLO9000: Better, Faster, Stronger
- Ross Girshick (Microsoft Research)Fast R-CNN
- Long, Jonathan, Evan Shelhamer, and Trevor Darrell.Fully convolutional networks for semantic segmentation
- Jégou, Simon, et al.The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation