PyTorch implementation of over 30 real-time semantic segmentation models, e.g. BiSeNetv1, BiSeNetv2, CGNet, ContextNet, DABNet, DDRNet, EDANet, ENet, ERFNet, ESPNet, ESPNetv2, FastSCNN, ICNet, LEDNet, LinkNet, PP-LiteSeg, SegNet, ShelfNet, STDC, SwiftNet, with support for knowledge distillation, distributed training, etc.
PyTorch implementation of real-time semantic segmentation models, with support for multi-GPU training and validation, automatic mixed precision training, knowledge distillation, etc.


torch == 1.8.1

Supported models

If you want to use an encoder-decoder structure with pretrained encoders, you may refer to segmentation-models-pytorch (SMP)[^smp]. This repo also provides easy access to SMP. For example, to train DeepLabv3Plus with a ResNet-101 backbone as the teacher model for knowledge distillation, modify the config file as follows:

self.model = 'smp'
self.encoder = 'resnet101'
self.decoder = 'deeplabv3p'

or use command-line arguments:

python main.py --model smp --encoder resnet101 --decoder deeplabv3p

Details of the configurations can also be found in this file.
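
As a rough illustration of how the config file and the command line can coexist, the sketch below lets any flag given on the command line override the matching config attribute. The `DemoConfig` class, its default values, and the `load_config` helper are hypothetical stand-ins and not this repo's actual code; only the names `model`, `encoder`, and `decoder` come from the examples above.

```python
import argparse


class DemoConfig:
    """Hypothetical config object; the default values are placeholders."""

    def __init__(self):
        self.model = 'bisenetv2'
        self.encoder = None
        self.decoder = None


def load_config(argv=None):
    """Build a config, letting any flag given on the command line win."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--model', type=str, default=None)
    parser.add_argument('--encoder', type=str, default=None)
    parser.add_argument('--decoder', type=str, default=None)
    args = parser.parse_args(argv)

    config = DemoConfig()
    for key, value in vars(args).items():
        if value is not None:  # flag omitted: keep the value from the config
            setattr(config, key, value)
    return config


if __name__ == '__main__':
    cfg = load_config(['--model', 'smp',
                       '--encoder', 'resnet101',
                       '--decoder', 'deeplabv3p'])
    print(cfg.model, cfg.encoder, cfg.decoder)
```

Keeping `None` as the argparse default is a design choice that makes "flag not given" distinguishable from an explicit value, so the config file remains the single source of defaults.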

Knowledge Distillation

Currently, only the original knowledge distillation method proposed by Geoffrey Hinton is supported.[^kd]
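
For readers unfamiliar with the method, here is a minimal sketch of a Hinton-style distillation loss applied to segmentation logits. It is a common textbook formulation, not necessarily how this repo implements it; the function name `kd_loss` and the defaults for `temperature`, `alpha`, and `ignore_index` are assumptions chosen for illustration.

```python
import torch
import torch.nn.functional as F


def kd_loss(student_logits, teacher_logits, labels,
            temperature=4.0, alpha=0.5, ignore_index=255):
    """Hinton-style distillation loss for (N, C, H, W) segmentation logits.

    temperature, alpha and ignore_index are illustrative defaults (assumptions).
    """
    # Hard-label term: ordinary cross-entropy against the ground truth.
    ce = F.cross_entropy(student_logits, labels, ignore_index=ignore_index)

    # Soft-label term: KL(teacher || student) between temperature-softened
    # class distributions, taken over the channel dimension of every pixel.
    log_p_student = F.log_softmax(student_logits / temperature, dim=1)
    p_teacher = F.softmax(teacher_logits.detach() / temperature, dim=1)
    n, _, h, w = student_logits.shape
    kl = F.kl_div(log_p_student, p_teacher, reduction='sum') / (n * h * w)

    # T^2 keeps the soft-term gradients on the same scale as the hard term.
    return alpha * ce + (1.0 - alpha) * temperature ** 2 * kl


if __name__ == '__main__':
    torch.manual_seed(0)
    student = torch.randn(2, 19, 8, 16, requires_grad=True)
    teacher = torch.randn(2, 19, 8, 16)
    target = torch.randint(0, 19, (2, 8, 16))
    loss = kd_loss(student, teacher, target)
    loss.backward()
    print(loss.item())
```

The teacher logits are detached so that gradients flow only into the student. The KL term here is averaged per pixel, which is one of several reasonable normalizations.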

How to use

DDP training (recommended)

CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 main.py

DP training

CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py
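
The two commands above start the same script in different ways, so a script has to work out which mode it is in. The sketch below shows one way this could be done, relying on the fact that `torch.distributed.launch` passes `--local_rank` to each worker and exports `WORLD_SIZE`. The helper name `detect_launch_mode` is hypothetical, and this is not necessarily how this repo's `main.py` handles it.

```python
import argparse
import os


def detect_launch_mode(argv=None, environ=None):
    """Return ('ddp', local_rank, world_size) or ('dp', 0, 1)."""
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser()
    # torch.distributed.launch passes --local_rank=<n> to every worker.
    parser.add_argument('--local_rank', type=int, default=None)
    args, _ = parser.parse_known_args(argv)

    # With --use_env (and newer launchers) LOCAL_RANK is exported instead.
    local_rank = args.local_rank
    if local_rank is None and 'LOCAL_RANK' in environ:
        local_rank = int(environ['LOCAL_RANK'])

    if local_rank is not None:
        world_size = int(environ.get('WORLD_SIZE', 1))
        return 'ddp', local_rank, world_size
    # Plain "python main.py": a single process, typically nn.DataParallel.
    return 'dp', 0, 1


if __name__ == '__main__':
    print(detect_launch_mode())
```

In the DDP branch, a script would typically call `torch.cuda.set_device(local_rank)` and `torch.distributed.init_process_group(backend='nccl')` before wrapping the model in `torch.nn.parallel.DistributedDataParallel`. In the DP branch it would wrap the model in `torch.nn.DataParallel`.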

Performances and checkpoints

full resolution on Cityscapes

| Model | Year | Encoder | Params (M) | FPS [1] | mIoU (paper) | mIoU (my) val [2] |
| --- | --- | --- | --- | --- | --- | --- |
| ADSCNet | 2019 | None | n.a./0.51 | 89 | n.a./67.5 | 69.06 |
| AGLNet | 2020 | None | 1.12/1.02 | 61 | 69.39/70.1 | 73.58 |
| BiSeNetv1 | 2018 | ResNet18 | 49.0/13.32 | 88 | 74.8/74.7 | 74.91 |
| BiSeNetv2 | 2020 | None | n.a./2.27 | 142 | 73.4/72.6 | 73.73 [3] |
| CANet | 2019 | MobileNetv2 | 4.8/4.77 | 76 | 73.4/73.5 | 76.59 |
| CFPNet | 2021 | None | 0.55/0.27 | 64 | n.a./70.1 | 70.08 |
| CGNet | 2018 | None | 0.41/0.24 | 157 | 59.7/64.8 [4] | 67.25 |
| ContextNet | 2018 | None | 0.85/1.01 | 80 | 65.9/66.1 | 66.61 |
| DABNet | 2019 | None | 0.76/0.75 | 140 | n.a./70.1 | 70.78 |
| DDRNet | 2021 | None | 5.7/5.54 | 233 | 77.8/77.4 | 74.34 |
| DFANet | 2019 | XceptionA | 7.8/3.05 | 60 | 71.9/71.3 | 65.28 |
| EDANet | 2018 | None | 0.68/0.69 | 125 | n.a./67.3 | 70.76 |
| ENet | 2016 | None | 0.37/0.37 | 140 | n.a./58.3 | 71.31 |
| ERFNet | 2017 | None | 2.06/2.07 | 60 | 70.0/68.0 | 76.00 |
| ESNet | 2019 | None | 1.66/1.66 | 66 | n.a./70.7 | 71.82 |
| ESPNet | 2018 | None | 0.36/0.38 | 111 | n.a./60.3 | 66.39 |
| ESPNetv2 | 2018 | None | 1.25/0.86 | 101 | 66.4/66.2 | 70.35 |
| FANet | 2020 | ResNet18 | n.a./12.26 | 100 | 75.0/74.4 | 74.92 |
| FarseeNet | 2020 | ResNet18 | n.a./16.75 | 130 | 73.5/70.2 | 77.35 |
| FastSCNN | 2019 | None | 1.11/1.02 | 358 | 68.6/68.0 | 69.37 |
| FDDWNet | 2019 | None | 0.80/0.77 | 51 | n.a./71.5 | 75.86 |
| FPENet | 2019 | None | 0.38/0.36 | 90 | n.a./70.1 | 72.05 |
| FSSNet | 2018 | None | 0.2/0.20 | 121 | n.a./58.8 | 65.44 |
| ICNet | 2017 | ResNet18 | 26.5 [5]/12.42 | 102 | 67.7 [5]/69.5 [5] | 69.65 |
| LEDNet | 2019 | None | 0.94/1.46 | 76 | n.a./70.6 | 72.63 |
| LinkNet | 2017 | ResNet18 | 11.5/11.54 | 106 | n.a./76.4 | 73.39 |
| Lite-HRNet | 2021 | None | 1.1/1.09 | 30 | 73.8/72.8 | 70.66 |
| LiteSeg | 2019 | MobileNetv2 | 4.38/4.29 | 117 | 70.0/67.8 | 76.10 |
| MiniNet | 2019 | None | 3.1/1.41 | 254 | n.a./40.7 | 61.47 |
| MiniNetv2 | 2020 | None | 0.5/0.51 | 86 | n.a./70.5 | 71.79 |
| PP-LiteSeg | 2022 | STDC1 | n.a./6.33 | 201 | 76.0/74.9 | 72.49 |
| PP-LiteSeg | 2022 | STDC2 | n.a./10.56 | 136 | 78.2/77.5 | 74.37 |
| RegSeg | 2021 | None | 3.34/3.37 | 104 | 78.5/78.3 | 74.28 |
| SegNet | 2015 | None | 29.46/29.48 | 14 | n.a./56.1 | 70.77 |
| ShelfNet | 2018 | ResNet18 | 23.5/16.04 | 110 | n.a./74.8 | 77.63 |
| SQNet | 2016 | SqueezeNet-1.1 | n.a./4.81 | 69 | n.a./59.8 | 69.55 |
| STDC | 2021 | STDC1 | n.a./7.79 | 163 | 74.5/75.3 | 75.25 [6] |
| STDC | 2021 | STDC2 | n.a./11.82 | 119 | 77.0/76.8 | 76.78 [6] |
| SwiftNet | 2019 | ResNet18 | 11.8/11.95 | 141 | 75.4/75.5 | 75.43 |

[1] FPS is measured on an RTX 2080 at a resolution of 1024x512 using this script. Note that FPS varies across devices and hardware and also depends on other factors (e.g. whether cuDNN is used). To obtain accurate numbers, please benchmark on your own device.
[2] These results are obtained by training for 800 epochs with a crop size of 1024x1024.
[3] These results are obtained by using auxiliary heads.
[4] This result is obtained by using a deeper model, i.e. CGNet_M3N21.
[5] The original encoder of ICNet is ResNet50.
[6] In my experiments, the detail loss does not improve performance. However, using auxiliary heads does contribute to the improvements.
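
Since the first note recommends benchmarking on your own device, here is a minimal timing harness that illustrates the usual pattern: warm up, synchronize, time a fixed number of iterations, synchronize again. It is a sketch and not the script referenced above; the function name `measure_fps` and its `sync`, `warmup`, and `iters` parameters are assumptions.

```python
import time


def measure_fps(run_once, sync=None, warmup=10, iters=50):
    """Return how many times per second run_once() completes.

    sync is an optional callable that blocks until queued device work
    has finished (for CUDA this would be torch.cuda.synchronize).
    """
    for _ in range(warmup):   # warm-up: exclude lazy init and autotuning
        run_once()
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(iters):
        run_once()
    if sync is not None:
        sync()                # wait for async kernels before stopping the clock
    elapsed = time.perf_counter() - start
    return iters / elapsed


if __name__ == '__main__':
    # Stand-in workload; a real benchmark would run the model's forward
    # pass on a fixed-size input under torch.no_grad().
    fps = measure_fps(lambda: sum(range(10000)))
    print(f'{fps:.1f} FPS')
```

For a CUDA model, passing `torch.cuda.synchronize` as `sync` matters because kernels are launched asynchronously. Without it, the timer may stop before the GPU has finished its work.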

SMP performance on Cityscapes

| Decoder | Params (M) | mIoU (200 epoch) | mIoU (800 epoch) |
| --- | --- | --- | --- |
| DeepLabv3 | 15.90 | 75.22 | 77.16 |
| DeepLabv3Plus | 12.33 | 73.97 | 75.90 |
| FPN | 13.05 | 73.44 | 74.94 |
| LinkNet | 11.66 | 71.17 | 73.19 |
| MANet | 21.68 | 74.59 | 76.14 |
| PAN | 11.37 | 70.25 | 72.46 |
| PSPNet | 11.41 | 61.63 | 67.26 |
| UNet | 14.33 | 72.99 | 74.45 |
| UNetPlusPlus | 15.97 | 74.31 | 75.57 |

For comparison, all of the above results use ResNet-18 as the encoder.

Knowledge distillation

| Model | Decoder | Encoder | kd_training | mIoU (200 epoch) | mIoU (800 epoch) |
| --- | --- | --- | --- | --- | --- |
| SMP | DeepLabv3Plus | ResNet-101 | - | 78.10 | 79.20 |
| SMP | DeepLabv3Plus | ResNet-18 | False | 73.97 | 75.90 |
| SMP | DeepLabv3Plus | ResNet-18 | True | 75.20 | 76.41 |

Prepare the dataset

