This repo contains the source code for our work on designing efficient networks for different computer vision tasks: (1) image classification, (2) object detection, and (3) semantic segmentation.
Real-time semantic segmentation using ESPNetv2 on an iPhone 7. See here for the iOS application source code using Core ML.
Real-time object detection using ESPNetv2.
This repo supports the following networks:
The figure below compares the performance of DiCENet with other efficient networks on the ImageNet dataset. DiCENet outperforms all existing efficient networks, including MobileNetv2 and ShuffleNetv2. More details here.
The table below compares the performance of our architecture with other detection networks on the MS-COCO dataset. Our network delivers 24.54 mAP at 35 FPS with only 3.2 B FLOPs. More details here.

| Network | Image Size | FLOPs | mAP | FPS |
| --- | --- | --- | --- | --- |
| SSD-VGG | 512x512 | 100 B | 26.8 | 19 |
| YOLOv2 | 544x544 | 17.5 B | 21.6 | 40 |
| ESPNetv2-SSD (Ours) | 512x512 | 3.2 B | 24.54 | 35 |
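
To make the efficiency gap in the MS-COCO comparison above concrete, the short sketch below computes how many times more FLOPs each baseline needs than ESPNetv2-SSD. The FLOP counts are copied from the comparison; the dictionary layout and the helper name `flops_ratio` are illustrative assumptions and are not part of this repo.

```python
# FLOPs in billions, copied from the MS-COCO comparison above.
DETECTOR_FLOPS_B = {
    "SSD-VGG": 100.0,
    "YOLOv2": 17.5,
    "ESPNetv2-SSD": 3.2,
}


def flops_ratio(baseline: str, ours: str = "ESPNetv2-SSD") -> float:
    """Return how many times more FLOPs `baseline` needs than `ours`."""
    return DETECTOR_FLOPS_B[baseline] / DETECTOR_FLOPS_B[ours]


for name in ("SSD-VGG", "YOLOv2"):
    print(f"{name}: {flops_ratio(name):.1f}x the FLOPs of ESPNetv2-SSD")
```

By this measure, SSD-VGG needs roughly 31 times and YOLOv2 roughly 5.5 times the FLOPs of ESPNetv2-SSD.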
The table below compares the performance of ESPNet and ESPNetv2 on two datasets. Note that ESPNets are among the first efficient networks to deliver performance competitive with existing networks on the PASCAL VOC dataset, even with low-resolution images (e.g., 256x256). See here for more details.
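
| Network | Cityscapes: Image Size | Cityscapes: FLOPs | Cityscapes: mIOU | PASCAL VOC 2012: Image Size | PASCAL VOC 2012: FLOPs | PASCAL VOC 2012: mIOU |
| --- | --- | --- | --- | --- | --- | --- |
| ESPNet | 1024x512 | 4.5 B | 60.3 | 512x512 | 2.2 B | 63 |
| ESPNetv2 | 1024x512 | 2.7 B | 66.2 | 384x384 | 0.76 B | 68 |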
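
The FLOP counts for the two datasets differ mainly because of input resolution. As a rough sanity check, the sketch below rescales the Cityscapes FLOPs to the PASCAL VOC input sizes, assuming that the cost of a fully convolutional network grows linearly with the number of input pixels. This is an approximation that ignores, for example, the dataset-specific classification layer. The helper `scale_flops` is hypothetical and not part of this repo.

```python
def scale_flops(flops_b, from_size, to_size):
    """Rescale a FLOP count (in billions) from one input size to another.

    Assumes cost is proportional to width * height, which is only an
    approximation for real networks.
    """
    from_w, from_h = from_size
    to_w, to_h = to_size
    return flops_b * (to_w * to_h) / (from_w * from_h)


# Cityscapes numbers from the table, rescaled to the PASCAL VOC input sizes.
espnet_voc = scale_flops(4.5, (1024, 512), (512, 512))
espnetv2_voc = scale_flops(2.7, (1024, 512), (384, 384))

print(f"ESPNet   @ 512x512: ~{espnet_voc:.2f} B FLOPs")
print(f"ESPNetv2 @ 384x384: ~{espnetv2_voc:.2f} B FLOPs")
```

The estimates (about 2.25 B and 0.76 B) land close to the 2.2 B and 0.76 B reported for PASCAL VOC, which suggests the difference between the two datasets is explained almost entirely by image size.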
Details about training and testing are provided here.
Details about performance of different models are provided here.
To run the segmentation demo, just type:
python segmentation_demo.py
To run the detection demo, run the following command:
python detection_demo.py
OR
python detection_demo.py --live
For other supported arguments, please see the corresponding files.
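
If you would rather launch the demos from Python, for example from a test harness, the commands above can be assembled programmatically. The sketch below is a minimal, hypothetical helper (`demo_command` is not part of this repo) that uses only the script names and the `--live` flag shown above. It prints the commands rather than executing them.

```python
import shlex
import sys

# Script names as given in this README.
DEMO_SCRIPTS = {
    "segmentation": "segmentation_demo.py",
    "detection": "detection_demo.py",
}


def demo_command(task, live=False):
    """Build the argv list for one of the demos (hypothetical helper)."""
    if task not in DEMO_SCRIPTS:
        raise ValueError(f"unknown demo: {task!r}")
    cmd = [sys.executable, DEMO_SCRIPTS[task]]
    if live:
        # This README only shows --live for the detection demo, so the
        # sketch does not assume the segmentation demo accepts it.
        if task != "detection":
            raise ValueError("--live is only documented for the detection demo")
        cmd.append("--live")
    return cmd


# Print the commands instead of running them.
for task, live in (("segmentation", False), ("detection", False), ("detection", True)):
    print(shlex.join(demo_command(task, live)))
```

To actually start a demo, pass the returned list to `subprocess.run(cmd, check=True)` from the repository root.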
If you find this repository helpful, please feel free to cite our work:
@article{mehta2019dicenet,
Author = {Sachin Mehta and Hannaneh Hajishirzi and Mohammad Rastegari},
Title = {DiCENet: Dimension-wise Convolutions for Efficient Networks},
Year = {2020},
journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
}
@inproceedings{mehta2018espnetv2,
title={ESPNetv2: A Light-weight, Power Efficient, and General Purpose Convolutional Neural Network},
author={Mehta, Sachin and Rastegari, Mohammad and Shapiro, Linda and Hajishirzi, Hannaneh},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
year={2019}
}
@inproceedings{mehta2018espnet,
title={Espnet: Efficient spatial pyramid of dilated convolutions for semantic segmentation},
author={Mehta, Sachin and Rastegari, Mohammad and Caspi, Anat and Shapiro, Linda and Hajishirzi, Hannaneh},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
pages={552--568},
year={2018}
}
By downloading this software, you acknowledge that you agree to the terms and conditions given here.
Most of our object detection code is adapted from SSD in pytorch. We thank the authors for such amazing work.
Thanks for your interest in our work :).
Open tasks that are interesting:
Other thoughts are also welcome :).
This repository contains DiCENet's source code in PyTorch only, and you should be able to reproduce the results of v1/v2 of our arXiv paper. To reproduce the results of our T-PAMI paper, you need to incorporate the MobileNet tricks described in Section 5.3 of that paper, which are currently not part of this repository.