The code works for inference. However, the training code is provided for reference only and is probably best suited to fine-tuning. If you want to train from scratch, you first need to implement a synchronized batch-normalization layer to support large-batch training (as described in the paper). This repo seems to have reproduced it; you can take a look at it.
This is a TensorFlow implementation of PSPNet for semantic segmentation on the cityscapes dataset. The weights were converted from the original Caffe code using the caffe-tensorflow framework.
Now you can try PSPNet on your own images online using the ModelDepot live demo!
- Support evaluation code for the ade20k dataset
- Support the inference phase for the ade20k dataset, using the pspnet50 model (weights converted from the original author)
- Use tf.matmul to decode labels, so as to improve inference speed
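The matmul-based decoding trick can be sketched as follows. This is an illustrative NumPy stand-in, not the repo's actual code: the 3-class palette and 2x2 prediction are made up for the example.

```python
import numpy as np

# Illustrative 3-class RGB palette (Cityscapes-style colors); the real code
# would use the full palette of the chosen dataset.
palette = np.array([[128,  64, 128],   # road
                    [244,  35, 232],   # sidewalk
                    [ 70,  70,  70]],  # building
                   dtype=np.float64)

# Toy per-pixel class predictions for a 2x2 image.
pred = np.array([[0, 1],
                 [2, 0]])

# One-hot encode, then a single matrix multiply colors every pixel at once --
# the same idea as using tf.matmul instead of a per-pixel Python loop.
one_hot = np.eye(palette.shape[0])[pred]   # shape (2, 2, 3)
colored = one_hot @ palette                # shape (2, 2, 3), one RGB triple per pixel
```

The speedup comes from replacing a per-pixel lookup loop with one batched linear-algebra op that the framework can run on GPU.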
- Support different input sizes by padding the input image to (720, 720) if the original size is smaller, then cropping the result back to the original size at the end
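A minimal sketch of this pad-then-crop logic, using NumPy as a stand-in for the repo's actual preprocessing (function names here are hypothetical):

```python
import numpy as np

def pad_to(img, target_h=720, target_w=720):
    """Zero-pad an HxWxC image up to (target_h, target_w), remembering its size."""
    h, w = img.shape[:2]
    pad_h = max(target_h - h, 0)
    pad_w = max(target_w - w, 0)
    padded = np.pad(img, ((0, pad_h), (0, pad_w), (0, 0)))
    return padded, (h, w)

def crop_back(result, orig_size):
    """Crop the network output back to the original image size."""
    h, w = orig_size
    return result[:h, :w]

img = np.zeros((512, 640, 3), dtype=np.uint8)   # smaller than (720, 720)
padded, size = pad_to(img)                      # padded to (720, 720, 3)
restored = crop_back(padded, size)              # back to (512, 640, 3)
```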
- Change the BN layer from tf.nn.batch_normalization to tf.layers.batch_normalization in order to support the training phase; the initial model in Google Drive has also been updated
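The motivation for the switch is that tf.nn.batch_normalization only applies the normalization with statistics you supply, while tf.layers.batch_normalization also computes batch statistics and maintains the moving averages needed to toggle between training and inference. A NumPy sketch of that split (not the repo's code, just the idea):

```python
import numpy as np

def batch_norm(x, moving_mean, moving_var, training, momentum=0.9, eps=1e-5):
    """Sketch of BN with a training flag, mimicking tf.layers.batch_normalization."""
    if training:
        # Normalize with this batch's statistics.
        mean, var = x.mean(axis=0), x.var(axis=0)
        # Update the running statistics in place (the layers API does this
        # through its update ops).
        moving_mean *= momentum
        moving_mean += (1 - momentum) * mean
        moving_var *= momentum
        moving_var += (1 - momentum) * var
    else:
        # Inference: normalize with the accumulated statistics.
        mean, var = moving_mean, moving_var
    return (x - mean) / np.sqrt(var + eps)

x = np.array([[1.0], [3.0]])
mm, mv = np.zeros(1), np.ones(1)
y = batch_norm(x, mm, mv, training=True)   # uses batch stats, updates mm/mv
```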
Download the restore checkpoint from Google Drive and put it into the model directory. Note: select the checkpoint corresponding to your dataset.
To get results on your own images, use the following command:
```
python inference.py --img-path=./input/test.png --dataset cityscapes
```
Inference time: ~0.6s
Options:
--dataset cityscapes or ade20k
--flipped-eval
--checkpoints /PATH/TO/CHECKPOINT_DIR
Results of the single-scale model on the cityscapes validation dataset:
Method | Accuracy |
---|---|
Without flip | 76.99% |
Flip | 77.23% |
Results of the single-scale model on the ade20k validation dataset:

Method | Accuracy |
---|---|
Without flip | 40.00% |
Flip | 40.67% |
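Flipped evaluation averages the network's predictions for an image and its horizontal mirror, which is where the small accuracy gain in the tables above comes from. A minimal sketch, where `logits_fn` is a hypothetical stand-in for the network:

```python
import numpy as np

def flipped_eval(logits_fn, img):
    """Average predictions over the image and its horizontally flipped copy."""
    normal = logits_fn(img)
    # Flip the input, run the net, then flip the output back before averaging.
    flipped = logits_fn(img[:, ::-1])[:, ::-1]
    return (normal + flipped) / 2.0

# Toy "network" that just doubles its input, on a 1x2 single-channel map.
img = np.array([[0.2, 0.8]])
avg = flipped_eval(lambda x: 2.0 * x, img)
```

For a symmetric toy network like this the average equals the plain prediction; with a real segmentation net the two passes differ slightly, and averaging them smooths out left-right biases.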
To reproduce the evaluation results, follow these steps:

1. Set data_dir to your dataset path in evaluate.py:
```
'data_dir': '/Path/to/dataset'
```
2. Run:
```
python evaluate.py --dataset cityscapes
```
List of Args:
--dataset - ade20k or cityscapes
--flipped-eval - Use the flipped evaluation method
--measure-time - Measure inference time
Input image | Output image |
---|---|
@inproceedings{zhao2017pspnet,
author = {Hengshuang Zhao and
Jianping Shi and
Xiaojuan Qi and
Xiaogang Wang and
Jiaya Jia},
title = {Pyramid Scene Parsing Network},
booktitle = {Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2017}
}
Scene Parsing through ADE20K Dataset. B. Zhou, H. Zhao, X. Puig, S. Fidler, A. Barriuso and A. Torralba. Computer Vision and Pattern Recognition (CVPR), 2017. (http://people.csail.mit.edu/bzhou/publication/scene-parse-camera-ready.pdf)
@inproceedings{zhou2017scene,
title={Scene Parsing through ADE20K Dataset},
author={Zhou, Bolei and Zhao, Hang and Puig, Xavier and Fidler, Sanja and Barriuso, Adela and Torralba, Antonio},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
year={2017}
}
Semantic Understanding of Scenes through ADE20K Dataset. B. Zhou, H. Zhao, X. Puig, S. Fidler, A. Barriuso and A. Torralba. arXiv:1608.05442. (https://arxiv.org/pdf/1608.05442.pdf)
@article{zhou2016semantic,
title={Semantic understanding of scenes through the ade20k dataset},
author={Zhou, Bolei and Zhao, Hang and Puig, Xavier and Fidler, Sanja and Barriuso, Adela and Torralba, Antonio},
journal={arXiv preprint arXiv:1608.05442},
year={2016}
}