This repository is cloned from chenjun2hao/DDRNet.pytorch and modified for research
This is an unofficial implementation of Deep Dual-resolution Networks for Real-time and Accurate Semantic Segmentation of Road Scenes, which achieves a state-of-the-art trade-off between accuracy and speed on Cityscapes and CamVid without inference acceleration or extra data. On a single 2080Ti GPU, DDRNet-23-slim yields 77.4% mIoU at 109 FPS on the Cityscapes test set and 74.4% mIoU at 230 FPS on the CamVid test set.
The code mainly borrows from HRNet-Semantic-Segmentation OCR and the official repository; thanks for their work.
A comparison of speed-accuracy trade-off on Cityscapes test set.
```
torch>=1.7.0
cudatoolkit>=10.2
```
Download the two files below from Cityscapes to ${CITYSCAPES_ROOT}.
Unzip them.
Rename the folders as below:
```
└── cityscapes
    ├── leftImg8bit
    │   ├── test
    │   ├── train
    │   └── val
    └── gtFine
        ├── test
        ├── train
        └── val
```
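After renaming, the layout can be sanity-checked with a short script. This is a sketch, not part of the repo; point `root` at the directory you chose as ${CITYSCAPES_ROOT}:

```python
from pathlib import Path

# Sub-directories the configs expect, relative to CITYSCAPES_ROOT.
EXPECTED = [
    "cityscapes/leftImg8bit/train",
    "cityscapes/leftImg8bit/val",
    "cityscapes/leftImg8bit/test",
    "cityscapes/gtFine/train",
    "cityscapes/gtFine/val",
    "cityscapes/gtFine/test",
]

def missing_dirs(root):
    """Return every expected sub-directory that is absent under root."""
    return [d for d in EXPECTED if not (Path(root) / d).is_dir()]
```

An empty result means the folder structure matches the tree above.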
Update the following properties in ${REPO_ROOT}/experiments/cityscapes/${MODEL_YAML} as below:
```yaml
DATASET:
  DATASET: cityscapes
  ROOT: ${CITYSCAPES_ROOT}
  TEST_SET: 'cityscapes/list/test.lst'
  TRAIN_SET: 'cityscapes/list/train.lst'
  ...
```
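The `.lst` files referenced above pair each image with its label map, one tab-separated `image<TAB>label` line per sample. If you ever need to regenerate them, a sketch follows; the exact format is assumed from HRNet-Semantic-Segmentation, so verify against the lists shipped in `cityscapes/list/` before relying on it:

```python
from pathlib import Path

def build_list(root, split):
    """Return HRNet-style '<image>\t<label>' lines for one split
    (e.g. 'train' or 'val'), with paths relative to the cityscapes folder."""
    base = Path(root) / "cityscapes"
    lines = []
    for img in sorted((base / "leftImg8bit" / split).rglob("*_leftImg8bit.png")):
        rel = img.relative_to(base).as_posix()
        # Derive the gtFine label path from the image path.
        label = rel.replace("leftImg8bit/", "gtFine/", 1).replace(
            "_leftImg8bit.png", "_gtFine_labelIds.png")
        lines.append(f"{rel}\t{label}")
    return lines
```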
Download the pretrained model to ${MODEL_DIR}.
Update MODEL.PRETRAINED and TEST.MODEL_FILE in ${REPO_ROOT}/experiments/cityscapes/${MODEL_YAML} as below:
```yaml
...
MODEL:
  ...
  PRETRAINED: "${MODEL_DIR}/${MODEL_NAME}.pth"
  ALIGN_CORNERS: false
...
TEST:
  ...
  MODEL_FILE: "${MODEL_DIR}/${MODEL_NAME}.pth"
...
```
```shell
cd ${REPO_ROOT}
python tools/eval.py --cfg experiments/cityscapes/ddrnet23_slim.yaml
```
Model | OHEM | Multi-scale | Flip | mIoU | FPS | E2E Latency (s) | Link |
---|---|---|---|---|---|---|---|
DDRNet23_slim | Yes | No | No | 77.83 | 91.31 | 0.062 | official |
DDRNet23_slim | Yes | No | Yes | 78.42 | TBD | TBD | official |
DDRNet23 | Yes | No | No | 79.51 | TBD | TBD | official |
DDRNet23 | Yes | No | Yes | 79.98 | TBD | TBD | official |
mIoU denotes mIoU on the Cityscapes validation set.
FPS is measured following the test code provided by SwiftNet (see `speed_test` in `lib/utils/utils.py` for details).
E2E Latency denotes end-to-end latency including pre- and post-processing.
FPS and latency are measured with batch size 1 on an RTX 2080Ti GPU and a Threadripper 2950X CPU.
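The measurement pattern behind such numbers (untimed warm-up iterations, then a timed loop) can be sketched framework-free. The function name is illustrative, not from the repo; on GPU you would additionally synchronize the device before reading the clock:

```python
import time

def measure_fps(infer, n_warmup=10, n_iters=100):
    """Return frames per second for `infer`, a zero-argument callable
    running one forward pass. Warm-up calls are excluded from timing;
    on GPU, call torch.cuda.synchronize() before each perf_counter read."""
    for _ in range(n_warmup):
        infer()
    start = time.perf_counter()
    for _ in range(n_iters):
        infer()
    elapsed = time.perf_counter() - start
    return n_iters / elapsed
```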
Note: `ALIGN_CORNERS: false` in ***.yaml will reach higher accuracy.
Download the ImageNet-pretrained model, then train the model with 2 NVIDIA RTX 3080 GPUs:
```shell
cd ${PROJECT}
python -m torch.distributed.launch --nproc_per_node=2 tools/train.py --cfg experiments/cityscapes/ddrnet23_slim.yaml
```
Our own trained models are coming soon:
model | Train Set | Test Set | OHEM | Multi-scale | Flip | mIoU | Link |
---|---|---|---|---|---|---|---|
DDRNet23_slim | train | eval | Yes | No | Yes | 77.77 | Baidu/password:it2s |
DDRNet23_slim | train | eval | Yes | Yes | Yes | 79.57 | Baidu/password:it2s |
DDRNet23 | train | eval | Yes | No | Yes | ~ | None |
DDRNet39 | train | eval | Yes | No | Yes | ~ | None |
Note: `ALIGN_CORNERS: true` is used in ***.yaml here because I follow the default setting of HRNet-Semantic-Segmentation OCR; in that setup `align_corners=True` gives better performance, while the default option is false.
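The difference between the two settings is how output coordinates map back to input coordinates during bilinear upsampling. A minimal 1-D sketch of the two mappings (simplified from the 2-D kernels PyTorch actually uses):

```python
def resize_linear(src, dst_len, align_corners):
    """1-D linear resize mirroring the align_corners coordinate mapping:
    True  -> x = i * (S-1)/(D-1)      (endpoints of src and dst coincide)
    False -> x = (i + 0.5) * S/D - 0.5 (pixel centers align, then clamp)."""
    src_len = len(src)
    out = []
    for i in range(dst_len):
        if align_corners:
            x = i * (src_len - 1) / (dst_len - 1) if dst_len > 1 else 0.0
        else:
            x = (i + 0.5) * src_len / dst_len - 0.5
        x = min(max(x, 0.0), src_len - 1)  # clamp to the valid source range
        lo = int(x)
        hi = min(lo + 1, src_len - 1)
        frac = x - lo
        out.append(src[lo] * (1 - frac) + src[hi] * frac)
    return out
```

With `src = [0, 1, 2, 3]` resized to length 7, `align_corners=True` hits the exact endpoints and even half-steps, while `align_corners=False` repeats boundary values near the edges; that boundary behavior is why the flag must match between training and evaluation configs.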