This repository is a PyTorch implementation for semantic segmentation / scene parsing. The code is easy to use for training and testing on various datasets. The codebase mainly uses ResNet50/101/152 as the backbone and can be easily adapted to other basic classification structures. Implemented networks include PSPNet and PSANet, which ranked 1st in the ImageNet Scene Parsing Challenge 2016 @ECCV16, the LSUN Semantic Segmentation Challenge 2017 @CVPR17 and the WAD Drivable Area Segmentation Challenge 2018 @CVPR18. Sample datasets used in experiments are ADE20K, PASCAL VOC 2012 and Cityscapes.
master: uses the official nn.SyncBatchNorm; only multiprocessing training is supported; tested with pytorch 1.4.0.
1.0.0: both multithreading training (nn.DataParallel) and multiprocessing training (nn.parallel.DistributedDataParallel) (recommended) are supported, and the latter is much faster. Uses syncbn from EncNet and apex; tested with pytorch 1.0.0.
Highlight:
Requirement:
Clone the repository:
git clone https://github.com/hszhao/semseg.git
Train:
Download related datasets and symlink the paths to them as follows (you can alternatively modify the relevant paths specified in folder config):
cd semseg
mkdir -p dataset
ln -s /path_to_ade20k_dataset dataset/ade20k
Download ImageNet pre-trained models and put them under folder initmodel for weight initialization. Remember to use the right dataset format detailed in FAQ.md.
Specify the GPUs to use in config, then start training:
sh tool/train.sh ade20k pspnet50
If you use SLURM as the node manager, uncomment the corresponding lines in train.sh and then start training:
sbatch tool/train.sh ade20k pspnet50
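The dataset-linking step above can also be scripted. The following is a minimal Python sketch, assuming the same layout as the shell commands (a dataset folder inside the repository holding one symlink per dataset). The helper name link_dataset is hypothetical and not part of this repository.

```python
from pathlib import Path


def link_dataset(source: Path, repo_root: Path, name: str) -> Path:
    """Create repo_root/dataset/<name> as a symlink pointing to source.

    Hypothetical helper that mirrors `mkdir -p dataset` followed by `ln -s`.
    """
    dataset_dir = repo_root / "dataset"
    dataset_dir.mkdir(parents=True, exist_ok=True)  # same effect as `mkdir -p dataset`
    link = dataset_dir / name
    if link.is_symlink() or link.exists():
        # Refuse to overwrite an existing link or folder.
        raise FileExistsError(f"{link} already exists; remove it first")
    link.symlink_to(source.resolve(), target_is_directory=True)
    return link
```

For example, `link_dataset(Path("/path_to_ade20k_dataset"), Path("semseg"), "ade20k")` would reproduce the commands shown earlier for ADE20K.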
Test:
Download trained segmentation models and put them under the folder specified in config, or modify the specified paths.
For full testing (to reproduce the listed performance):
sh tool/test.sh ade20k pspnet50
Quick demo on one image:
PYTHONPATH=./ python tool/demo.py --config=config/ade20k/ade20k_pspnet50.yaml --image=figure/demo/ADE_val_00001515.jpg TEST.scales '[1.0]'
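The trailing `TEST.scales '[1.0]'` in the demo command is a key/value override of a config entry. As a rough illustration of how such pairs can be applied, here is a minimal Python sketch. It is not the repository's own parser: the function name apply_overrides and the nested-dictionary layout are assumptions made for this example.

```python
import ast
import copy
from typing import Any, Dict, List


def apply_overrides(config: Dict[str, Any], overrides: List[str]) -> Dict[str, Any]:
    """Return a copy of config with KEY VALUE pairs applied.

    Keys are dotted paths such as "TEST.scales". Values are parsed as Python
    literals when possible and kept as plain strings otherwise.
    """
    if len(overrides) % 2 != 0:
        raise ValueError("overrides must be given as KEY VALUE pairs")
    result = copy.deepcopy(config)  # leave the caller's config untouched
    for key, raw in zip(overrides[0::2], overrides[1::2]):
        try:
            value = ast.literal_eval(raw)  # "[1.0]" becomes the list [1.0]
        except (ValueError, SyntaxError):
            value = raw  # not a Python literal, keep the raw string
        *parents, leaf = key.split(".")
        node = result
        for part in parents:
            node = node.setdefault(part, {})  # walk or create nested sections
        node[leaf] = value
    return result
```

With this sketch, `apply_overrides(cfg, ["TEST.scales", "[1.0]"])` replaces whatever scale list cfg holds under TEST with the single scale 1.0.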
Visualization: tensorboardX is incorporated for better visualization.
tensorboard --logdir=exp/ade20k
Other: dataset attributes (names and colors) are in folder dataset, and some sample lists can be accessed.
Description: mIoU/mAcc/aAcc stand for mean IoU, mean per-class accuracy and overall pixel accuracy, respectively. ss denotes single-scale testing and ms denotes multi-scale testing. Training time is measured on a server with 8 GeForce RTX 2080 Ti GPUs. General parameters across different datasets are listed below:
ADE20K: Train Parameters: classes(150), train_h(473/465-PSP/A), train_w(473/465-PSP/A), epochs(100). Test Parameters: classes(150), test_h(473/465-PSP/A), test_w(473/465-PSP/A), base_size(512).
Network | mIoU/mAcc/aAcc (ss) | mIoU/mAcc/aAcc (ms) | Training Time |
---|---|---|---|
PSPNet50 | 0.4189/0.5227/0.8039 | 0.4284/0.5266/0.8106 | 14h |
PSANet50 | 0.4229/0.5307/0.8032 | 0.4305/0.5312/0.8101 | 14h |
PSPNet101 | 0.4310/0.5375/0.8107 | 0.4415/0.5426/0.8172 | 20h |
PSANet101 | 0.4337/0.5385/0.8102 | 0.4414/0.5392/0.8170 | 20h |
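To make the difference between the ss and ms columns more concrete, the sketch below shows one way base_size and a list of test scales could interact. The resizing rule (longer image side set to scale × base_size, aspect ratio kept), the averaging of per-scale scores, and both function names are assumptions for illustration and are not confirmed by the text above.

```python
from typing import List, Sequence, Tuple


def scaled_size(h: int, w: int, base_size: int, scale: float) -> Tuple[int, int]:
    """Return (new_h, new_w) with the longer side equal to round(scale * base_size).

    Assumed rule: the aspect ratio of the input image is preserved.
    """
    long_side = round(scale * base_size)
    if h > w:
        return long_side, round(long_side / h * w)
    return round(long_side / w * h), long_side


def average_scores(per_scale_scores: Sequence[Sequence[float]]) -> List[float]:
    """Average class scores for one pixel over all scales.

    Each inner sequence holds the class scores predicted at one scale,
    already resized back to the original resolution.
    """
    n = len(per_scale_scores)
    return [sum(column) / n for column in zip(*per_scale_scores)]
```

Under these assumptions, single-scale testing corresponds to a scale list containing only 1.0, while multi-scale testing calls scaled_size once per scale and combines the resulting predictions with average_scores.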
PASCAL VOC 2012: Train Parameters: classes(21), train_h(473/465-PSP/A), train_w(473/465-PSP/A), epochs(50). Test Parameters: classes(21), test_h(473/465-PSP/A), test_w(473/465-PSP/A), base_size(512).
Network | mIoU/mAcc/aAcc (ss) | mIoU/mAcc/aAcc (ms) | Training Time |
---|---|---|---|
PSPNet50 | 0.7705/0.8513/0.9489 | 0.7802/0.8580/0.9513 | 3.3h |
PSANet50 | 0.7725/0.8569/0.9491 | 0.7787/0.8606/0.9508 | 3.3h |
PSPNet101 | 0.7907/0.8636/0.9534 | 0.7963/0.8677/0.9550 | 5h |
PSANet101 | 0.7870/0.8642/0.9528 | 0.7966/0.8696/0.9549 | 5h |
Cityscapes: Train Parameters: classes(19), train_h(713/709-PSP/A), train_w(713/709-PSP/A), epochs(200). Test Parameters: classes(19), test_h(713/709-PSP/A), test_w(713/709-PSP/A), base_size(2048).
Network | mIoU/mAcc/aAcc (ss) | mIoU/mAcc/aAcc (ms) | Training Time |
---|---|---|---|
PSPNet50 | 0.7730/0.8431/0.9597 | 0.7838/0.8486/0.9617 | 7h |
PSANet50 | 0.7745/0.8461/0.9600 | 0.7818/0.8487/0.9622 | 7.5h |
PSPNet101 | 0.7863/0.8577/0.9614 | 0.7929/0.8591/0.9638 | 10h |
PSANet101 | 0.7842/0.8599/0.9621 | 0.7940/0.8631/0.9644 | 10.5h |
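The three metrics reported in the tables above can be computed from a class confusion matrix. The following is a minimal pure-Python sketch using the standard definitions. The function name segmentation_metrics is hypothetical, and skipping classes that never occur is a choice made for this example, which may differ from how the repository handles them.

```python
from typing import List, Tuple


def segmentation_metrics(confusion: List[List[int]]) -> Tuple[float, float, float]:
    """Return (mIoU, mAcc, aAcc) from a square confusion matrix.

    confusion[t][p] counts pixels with ground-truth class t predicted as class p.
    """
    ious: List[float] = []
    accs: List[float] = []
    correct = 0
    total = 0
    for c in range(len(confusion)):
        tp = confusion[c][c]                          # correctly labelled pixels of class c
        target = sum(confusion[c])                    # pixels whose ground truth is c
        predicted = sum(row[c] for row in confusion)  # pixels predicted as c
        union = target + predicted - tp
        correct += tp
        total += target
        if union > 0:
            ious.append(tp / union)                   # per-class IoU
        if target > 0:
            accs.append(tp / target)                  # per-class accuracy
    miou = sum(ious) / len(ious) if ious else 0.0
    macc = sum(accs) / len(accs) if accs else 0.0
    aacc = correct / total if total else 0.0
    return miou, macc, aacc
```

mIoU and mAcc weight every class equally, whereas aAcc is dominated by frequent classes, which is why aAcc is typically the highest of the three numbers in each table cell.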
If you find the code or trained models useful, please consider citing:
@misc{semseg2019,
author={Zhao, Hengshuang},
title={semseg},
howpublished={\url{https://github.com/hszhao/semseg}},
year={2019}
}
@inproceedings{zhao2017pspnet,
title={Pyramid Scene Parsing Network},
author={Zhao, Hengshuang and Shi, Jianping and Qi, Xiaojuan and Wang, Xiaogang and Jia, Jiaya},
booktitle={CVPR},
year={2017}
}
@inproceedings{zhao2018psanet,
title={{PSANet}: Point-wise Spatial Attention Network for Scene Parsing},
author={Zhao, Hengshuang and Zhang, Yi and Liu, Shu and Shi, Jianping and Loy, Chen Change and Lin, Dahua and Jia, Jiaya},
booktitle={ECCV},
year={2018}
}
Some frequently asked questions are collected in FAQ.md. You are welcome to send pull requests or give advice. Contact information: hengshuangzhao at gmail.com.