By Kaige Li, Qichuan Geng, Zhong Zhou. This repository is an official implementation of the paper "Exploring Scale-Aware Features for Real-Time Semantic Segmentation of Street Scenes", which is under review. The full code will be released after review.
We have open-sourced the original network files, but the overall project still needs to be refactored, which we plan to do in the future.
Comparison of inference speed and accuracy of real-time models on the Cityscapes test set.
A demo of the segmentation performance of our proposed SANets: Predictions of SANet-100 (left) and SANet-50 (right).
Cityscapes Stuttgart demo video #1
Cityscapes Stuttgart demo video #2
An overview of the basic architecture of our proposed Scale-Aware Network (SAFCN).
:smiley_cat: The SCE and SFF blocks are responsible for selective context encoding and feature fusion, respectively.
:bell: We plan to embed our method into the robot designed by our research group to improve its scene understanding. We will therefore migrate SANet to TensorRT and test its speed on the NVIDIA Jetson AGX Xavier embedded system.
:bell: We append 50, 75 and 100 after the network name to represent the input sizes of 512 × 1024, 768 × 1536 and 1024 × 2048, respectively.
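The three input sizes are 50%, 75% and 100% of the full 1024 × 2048 Cityscapes resolution, so the suffix can be read as a percentage. The sketch below shows that mapping. The helper `input_size` is hypothetical and is not part of this repository.

```python
from typing import Tuple

# Full-resolution Cityscapes frames are 1024 x 2048 (H x W).
FULL_H, FULL_W = 1024, 2048


def input_size(suffix: int) -> Tuple[int, int]:
    """Map a SANet name suffix (50, 75 or 100) to its input size (H, W).

    The suffix is read as a percentage of the full Cityscapes resolution.
    """
    if suffix not in (50, 75, 100):
        raise ValueError(f"unknown suffix: {suffix}")
    return FULL_H * suffix // 100, FULL_W * suffix // 100


for s in (50, 75, 100):
    h, w = input_size(s)
    print(f"SANet-{s}: {h} x {w}")
```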
Model (Cityscapes) | Val (% mIoU) | Test (% mIoU) | FPS (RTX 3090) | FPS (RTX 2080 Super Max-Q) | FPS (NVIDIA Jetson AGX Xavier (32GB)) |
---|---|---|---|---|---|
SANet-50 | 73.7 | 72.7 | 309.7 | 115.1 | 34.3 |
SANet-75 | 77.6 | 76.6 | 167.3 | 61.9 | 16.4 |
SANet-100 | 79.1 | 78.1 | 109.0 | 36.3 | 9.7 |
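Throughput in FPS converts to per-frame latency as 1000 / FPS milliseconds. The sketch below applies this conversion to the Cityscapes numbers in the table. It only restates the reported FPS in another unit and is not an independent measurement.

```python
# Reported Cityscapes FPS from the table: (RTX 3090, RTX 2080 Super Max-Q, Jetson AGX Xavier).
FPS = {
    "SANet-50": (309.7, 115.1, 34.3),
    "SANet-75": (167.3, 61.9, 16.4),
    "SANet-100": (109.0, 36.3, 9.7),
}
DEVICES = ("RTX 3090", "RTX 2080 Super Max-Q", "Jetson AGX Xavier")


def latency_ms(fps: float) -> float:
    """Per-frame latency in milliseconds for a throughput given in frames per second."""
    return 1000.0 / fps


for model, values in FPS.items():
    cells = ", ".join(f"{dev}: {latency_ms(v):.1f} ms" for dev, v in zip(DEVICES, values))
    print(f"{model}: {cells}")
```

For example, SANet-50 on the RTX 3090 takes about 3.2 ms per frame, while SANet-100 on the Jetson takes about 103.1 ms.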
Model (CamVid) | Val (% mIoU) | Test (% mIoU) | FPS (RTX 3090) | FPS (RTX 2080 Super Max-Q) | FPS (NVIDIA Jetson AGX Xavier (32GB)) |
---|---|---|---|---|---|
SANet | - | 77.2 | 250.4 | 98.8 | 26.8 |
:smiley_cat: Our method still maintains real-time performance on the RTX 2080 Super Max-Q, e.g., 115.1 FPS for SANet-50 and 61.9 FPS for SANet-75 on Cityscapes.
Our SANet produces higher-quality segmentation results on both large and small objects.
Qualitative visual comparison against different methods on the Cityscapes Val set, where notably improved regions are marked with yellow dashed boxes.
For this project, we used Python 3.8.5. We recommend setting up a new virtual environment:

```
python -m venv ~/venv/sanet
source ~/venv/sanet/bin/activate
```

In that environment, the requirements can be installed with:

```
pip install -r requirements.txt -f https://download.pytorch.org/whl/torch_stable.html
```
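After installation, a quick sanity check can be useful. The sketch below only reports the interpreter version, compared against the Python 3.8.5 used for this project, and whether `torch` can be found. It does not import `torch`. It assumes PyTorch is among the requirements. The PyTorch wheel index passed to `pip` suggests this, but this README does not state it. `check_environment` is a hypothetical helper.

```python
import importlib.util
import sys


def check_environment(expected=(3, 8)):
    """Report the Python version, whether it matches `expected` (major, minor),
    and whether the `torch` package is discoverable (without importing it)."""
    return {
        "python": ".".join(map(str, sys.version_info[:3])),
        "python_matches": tuple(sys.version_info[:2]) == tuple(expected),
        # Assumption: PyTorch is installed by requirements.txt.
        "torch_installed": importlib.util.find_spec("torch") is not None,
    }


if __name__ == "__main__":
    print(check_environment())
```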
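Place the Cityscapes and CamVid datasets in the `data/cityscapes` and `data/camvid` dirs, and the ImageNet-pretrained models in the `pretrained_models/imagenet/` dir.

To train SANet on Cityscapes:

```
python train.py --cfg configs/SANet_cityscapes.yaml
```

To train on the Cityscapes train and val sets together:

```
python trainval.py --cfg configs/SANet_cityscapes_trainval.yaml
```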
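Before launching training, you can check that the data and ImageNet-pretrained model dirs named in this README exist and are not empty. The sketch below is a hypothetical helper and not part of the repository. It checks only for the presence of files, not for the expected dataset structure inside each dir.

```python
from pathlib import Path

# Directories this README asks you to populate before training.
EXPECTED_DIRS = (
    "data/cityscapes",
    "data/camvid",
    "pretrained_models/imagenet",
)


def missing_dirs(root="."):
    """Return the expected directories that are absent or empty under `root`."""
    missing = []
    for rel in EXPECTED_DIRS:
        path = Path(root) / rel
        # iterdir() is only called when the path is a directory (short-circuit).
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(rel)
    return missing


if __name__ == "__main__":
    gaps = missing_dirs()
    print("All expected dirs are populated." if not gaps else f"Missing or empty: {gaps}")
```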
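Place the trained Cityscapes and CamVid models in the `pretrained_models/cityscapes/` and `pretrained_models/camvid/` dirs, respectively.

To evaluate SANet on Cityscapes:

```
python tools/eval.py --cfg configs/SANet_cityscapes.yaml \
    TEST.MODEL_FILE pretrained_models/cityscapes/SANet_best_model.pt
```

To evaluate SANet on CamVid:

```
python tools/eval.py --cfg configs/SANet_camvid.yaml \
    TEST.MODEL_FILE pretrained_models/camvid/SANet_camvid_best_model.pt
```

To generate Cityscapes test-set predictions for submission with the model trained on train+val:

```
python tools/submit.py --cfg configs/SANet_cityscapes_trainval.yaml \
    TEST.MODEL_FILE pretrained_models/cityscapes/SANet_trainval_best_model.pt
```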
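The evaluation and submission commands pass extra `KEY VALUE` pairs, such as `TEST.MODEL_FILE <path>`, after `--cfg`. These may be dotted overrides merged into the YAML config, as in yacs-style config systems. This README does not say which library the tools use. Under that assumption, the merge can be sketched with plain dicts. `merge_overrides` and the toy defaults are hypothetical.

```python
from typing import Any, Dict, List


def merge_overrides(cfg: Dict[str, Any], opts: List[str]) -> Dict[str, Any]:
    """Apply dotted KEY VALUE pairs (e.g. ["TEST.MODEL_FILE", "a.pt"]) to a nested dict in place.

    Values are kept as strings here; a real config system would typically cast
    them to the type of the existing default.
    """
    if len(opts) % 2 != 0:
        raise ValueError("overrides must come in KEY VALUE pairs")
    for key, value in zip(opts[0::2], opts[1::2]):
        *parents, leaf = key.split(".")
        node = cfg
        for part in parents:  # walk down to the section holding the leaf key
            if part not in node or not isinstance(node[part], dict):
                raise KeyError(f"unknown config section: {key}")
            node = node[part]
        if leaf not in node:  # only existing keys may be overridden
            raise KeyError(f"unknown config key: {key}")
        node[leaf] = value
    return cfg


# Hypothetical defaults, standing in for values loaded from configs/SANet_cityscapes.yaml.
cfg = {"TEST": {"MODEL_FILE": ""}}
merge_overrides(cfg, ["TEST.MODEL_FILE", "pretrained_models/cityscapes/SANet_best_model.pt"])
print(cfg["TEST"]["MODEL_FILE"])
```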
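To measure inference speed on Cityscapes (19 classes, 1024 × 2048 input) and CamVid (11 classes, 720 × 960 input):

```
python models/speed/sanet_speed.py --c 19 --r 1024 2048
python models/speed/sanet_speed.py --c 11 --r 720 960
```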
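This README does not describe how `sanet_speed.py` measures speed. The sketch below shows a common pattern instead: warm-up iterations, then a timed loop bracketed by synchronization. It is not the script's actual implementation, and `measure_fps` is a hypothetical name. On a GPU, `sync` would be something like `torch.cuda.synchronize`, because kernel launches return before the work finishes.

```python
import time
from typing import Callable


def measure_fps(step: Callable[[], object], warmup: int = 10, iters: int = 100,
                sync: Callable[[], object] = lambda: None) -> float:
    """Estimate throughput (frames per second) of `step`, counting one call as one frame.

    `sync` must block until all queued work is done (e.g. torch.cuda.synchronize on GPU).
    """
    for _ in range(warmup):  # exclude one-off costs such as allocation and autotuning
        step()
    sync()
    start = time.perf_counter()
    for _ in range(iters):
        step()
    sync()  # make sure the timed work has actually finished
    elapsed = time.perf_counter() - start
    return iters / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Dummy CPU workload standing in for a model forward pass.
    print(f"{measure_fps(lambda: sum(range(10_000))):.1f} FPS")
```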
To run SANet on your own images, put them in the `samples/` dir, and then run the command below with the Cityscapes-pretrained SANet (here for `.png` images):

```
python tools/custom.py --p '../pretrained_models/cityscapes/SANet_best_model.pth' --t '*.png'
```
You should end up seeing images that look like the following:
Custom Output.
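A common way to render a predicted label map as a color image like the one above is to index a per-class color palette. The sketch below uses the standard Cityscapes 19-class palette. It assumes the model outputs train IDs 0 to 18, and it is not necessarily what `tools/custom.py` does. `colorize` is a hypothetical helper.

```python
import numpy as np

# Standard Cityscapes color palette for the 19 training classes (RGB), in train-ID order:
# road, sidewalk, building, wall, fence, pole, traffic light, traffic sign, vegetation,
# terrain, sky, person, rider, car, truck, bus, train, motorcycle, bicycle.
CITYSCAPES_PALETTE = np.array([
    [128, 64, 128], [244, 35, 232], [70, 70, 70], [102, 102, 156], [190, 153, 153],
    [153, 153, 153], [250, 170, 30], [220, 220, 0], [107, 142, 35], [152, 251, 152],
    [70, 130, 180], [220, 20, 60], [255, 0, 0], [0, 0, 142], [0, 0, 70],
    [0, 60, 100], [0, 80, 100], [0, 0, 230], [119, 11, 32],
], dtype=np.uint8)


def colorize(label_map: np.ndarray) -> np.ndarray:
    """Turn an (H, W) array of train IDs in [0, 18] into an (H, W, 3) uint8 RGB image."""
    if label_map.min() < 0 or label_map.max() >= len(CITYSCAPES_PALETTE):
        raise ValueError("label IDs must lie in [0, 18]")
    return CITYSCAPES_PALETTE[label_map]  # fancy indexing: one RGB row per pixel


# Toy 2 x 2 prediction: road, car / sky, person.
pred = np.array([[0, 13], [10, 11]])
rgb = colorize(pred)
print(rgb.shape)  # (2, 2, 3)
```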
This project is based on the following open-source projects. We thank their authors for making the source code publicly available.