kaigelee / SANet

Real-Time Semantic Segmentation of Street Scenes
MIT License

Exploring Scale-Aware Features for Real-Time Semantic Segmentation of Street Scenes

By Kaige Li, Qichuan Geng, Zhong Zhou. This repository is the official implementation of the paper "Exploring Scale-Aware Features for Real-Time Semantic Segmentation of Street Scenes", which is currently under review. The full code will be released after the review.

We have open-sourced the original network files. The overall project still needs to be refactored, which we plan to do in future updates.

Highlights

Comparison of inference speed and accuracy for real-time models on test set of Cityscapes.

🔥 Updates

🎉 News

Demos

A demo of the segmentation performance of our proposed SANets: predictions of SANet-100 (left) and SANet-50 (right).

Cityscapes Stuttgart demo video #1

Cityscapes Stuttgart demo video #2

Overview

An overview of the basic architecture of our proposed Scale-Aware Network (SANet).

:smiley_cat: SCE and SFF blocks are responsible for selective context encoding and feature fusion, respectively.

Metrics

:bell: We plan to embed our method into the robot designed by our research group to improve its scene understanding. To that end, we will migrate SANet to TensorRT and test its speed on the NVIDIA Jetson AGX Xavier embedded system.

:bell: We append 50, 75 and 100 after the network name to represent the input sizes of 512 × 1024, 768 × 1536 and 1024 × 2048, respectively.
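The naming rule above can be written as a small lookup. This is only an illustrative sketch: `INPUT_SIZES` and `input_size_for` are hypothetical names and are not part of the repository.

```python
from typing import Dict, Tuple

# Hypothetical helper, not part of the SANet code base: maps the numeric
# suffix of a model name to the (height, width) input size listed above.
INPUT_SIZES: Dict[int, Tuple[int, int]] = {
    50: (512, 1024),
    75: (768, 1536),
    100: (1024, 2048),
}


def input_size_for(model_name: str) -> Tuple[int, int]:
    """Return (height, width) for names such as 'SANet-75'."""
    suffix = int(model_name.rsplit("-", 1)[1])
    return INPUT_SIZES[suffix]


if __name__ == "__main__":
    for name in ("SANet-50", "SANet-75", "SANet-100"):
        height, width = input_size_for(name)
        print(f"{name}: {height} x {width}")
```

Read this way, the suffix matches the input scale in percent relative to the full 1024 × 2048 resolution.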

| Model (Cityscapes) | Val (% mIoU) | Test (% mIoU) | FPS (RTX 3090) | FPS (RTX 2080 Super Max-Q) | FPS (NVIDIA Jetson AGX Xavier (32GB)) |
|---|---|---|---|---|---|
| SANet-50 | 73.7 | 72.7 | 309.7 | 115.1 | 34.3 |
| SANet-75 | 77.6 | 76.6 | 167.3 | 61.9 | 16.4 |
| SANet-100 | 79.1 | 78.1 | 109.0 | 36.3 | 9.7 |

| Model (CamVid) | Val (% mIoU) | Test (% mIoU) | FPS (RTX 3090) | FPS (RTX 2080 Super Max-Q) | FPS (NVIDIA Jetson AGX Xavier (32GB)) |
|---|---|---|---|---|---|
| SANet | - | 77.2 | 250.4 | 98.8 | 26.8 |

:smiley_cat: Even on the lower-powered RTX 2080 Super Max-Q, our method maintains real-time performance (36.3 to 115.1 FPS on Cityscapes, 98.8 FPS on CamVid).
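To relate the FPS figures to per-frame latency, the short sketch below converts the RTX 2080 Super Max-Q numbers from the Cityscapes table into milliseconds per frame. `latency_ms` is a hypothetical helper, and the 30 FPS threshold is an assumption (a commonly used definition of real time), not a figure taken from this repository.

```python
# FPS values copied from the Cityscapes table above
# (RTX 2080 Super Max-Q column).
FPS_RTX_2080_MAXQ = {
    "SANet-50": 115.1,
    "SANet-75": 61.9,
    "SANet-100": 36.3,
}

# Assumed real-time threshold; not stated in the repository.
REAL_TIME_FPS = 30.0


def latency_ms(fps: float) -> float:
    """Convert frames per second into average milliseconds per frame."""
    return 1000.0 / fps


if __name__ == "__main__":
    budget = latency_ms(REAL_TIME_FPS)
    for name, fps in FPS_RTX_2080_MAXQ.items():
        ms = latency_ms(fps)
        status = "within" if ms <= budget else "over"
        print(f"{name}: {fps:.1f} FPS = {ms:.1f} ms per frame "
              f"({status} the {budget:.1f} ms budget)")
```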

Visualization

Our SANet produces higher-quality segmentation results on both large and small objects.

Qualitative visual comparison against different methods on the Cityscapes Val set, where notably improved regions are marked with yellow dashed boxes.

Setup Environment

For this project, we used Python 3.8.5. We recommend setting up a new virtual environment:

python -m venv ~/venv/sanet
source ~/venv/sanet/bin/activate

In that environment, the requirements can be installed with:

pip install -r requirements.txt -f https://download.pytorch.org/whl/torch_stable.html

Usage

0. Prepare the dataset

1. Training

2. Evaluation

3. Speed Measurement

3.0 Latency measurement tools

3.1 Measure the speed of the SANet
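As a generic illustration of how an FPS figure can be obtained, the sketch below times repeated calls after a warm-up phase. It is not the repository's measurement script: `measure_fps` and `run_inference` are hypothetical names, and the workload is a CPU stand-in rather than SANet itself.

```python
import time
from typing import Callable


def measure_fps(run_inference: Callable[[], object],
                warmup: int = 10,
                iterations: int = 100) -> float:
    """Return the average number of calls per second for `run_inference`.

    `run_inference` is a hypothetical zero-argument callable standing in
    for one forward pass of the network on a fixed input.
    """
    # Warm-up calls are excluded from timing so that one-off costs
    # (lazy initialization, caches) do not distort the result.
    for _ in range(warmup):
        run_inference()

    # NOTE: for a GPU model, the device must be synchronized before each
    # clock read (in PyTorch: torch.cuda.synchronize()), because kernels
    # are launched asynchronously. This CPU-only sketch omits that step.
    start = time.perf_counter()
    for _ in range(iterations):
        run_inference()
    elapsed = time.perf_counter() - start

    return iterations / elapsed


if __name__ == "__main__":
    # Stand-in workload; replace with a real forward pass.
    def dummy_inference() -> int:
        return sum(i * i for i in range(10_000))

    print(f"{measure_fps(dummy_inference):.1f} FPS")
```

Averaging over many iterations after a warm-up reduces the influence of start-up costs and timer jitter on the reported number.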

4. Custom Inputs

You should end up seeing images that look like the following:

Custom Output.

TODO

Acknowledgements

This project is based on the following open-source projects. We thank their authors for making the source code publicly available.