This repository is the official PyTorch implementation of the paper:
GMMSeg: Gaussian Mixture based Generative Semantic Segmentation Models.
Chen Liang, Wenguan Wang, Jiaxu Miao, Yi Yang
NeurIPS 2022 (Spotlight). (arXiv 2210.02025)
Prevalent semantic segmentation solutions are, in essence, a dense discriminative classifier of p(class|pixel feature). Though straightforward, this de facto paradigm neglects the underlying data distribution p(pixel feature|class), and struggles to identify out-of-distribution data. Going beyond this, we propose GMMSeg, a new family of segmentation models that rely on a dense generative classifier for the joint distribution p(pixel feature,class). For each class, GMMSeg builds Gaussian Mixture Models (GMMs) via Expectation-Maximization (EM), so as to capture class-conditional densities. Meanwhile, the deep dense representation is end-to-end trained in a discriminative manner, i.e., maximizing p(class|pixel feature). This endows GMMSeg with the strengths of both generative and discriminative models. With a variety of segmentation architectures and backbones, GMMSeg outperforms the discriminative counterparts on three closed-set datasets. More impressively, without any modification, GMMSeg even performs well on open-world datasets. We believe this work brings fundamental insights into the related fields.
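To make the generative-classifier idea concrete, here is a minimal, dependency-free sketch of how class-conditional GMMs can be turned into p(class|pixel feature) via Bayes' rule. It is not the code in this repository: the diagonal covariances, the uniform class priors, the toy class names, and every function name below are assumptions made for illustration, and the mixture parameters are hard-coded rather than estimated with EM.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def class_log_density(x, gmm):
    """log p(x | class) for a mixture given as a list of (weight, mean, var)."""
    return logsumexp(
        [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
    )


def classify(x, class_gmms, class_priors):
    """Return (posterior over classes, max class-conditional log density)."""
    # Joint log p(x, c) = log p(c) + log p(x | c).
    log_joint = {
        c: math.log(class_priors[c]) + class_log_density(x, gmm)
        for c, gmm in class_gmms.items()
    }
    # Bayes' rule: p(c | x) = p(x, c) / sum_c' p(x, c').
    norm = logsumexp(list(log_joint.values()))
    posterior = {c: math.exp(lj - norm) for c, lj in log_joint.items()}
    # A low best-case density suggests the feature fits no known class.
    max_log_density = max(
        class_log_density(x, gmm) for gmm in class_gmms.values()
    )
    return posterior, max_log_density


# Hypothetical toy parameters: 2-D features, two classes (assumed, not learned).
TOY_GMMS = {
    "road": [(0.5, [0.0, 0.0], [0.25, 0.25]), (0.5, [1.0, 0.0], [0.25, 0.25])],
    "sky": [(1.0, [5.0, 5.0], [0.25, 0.25])],
}
TOY_PRIORS = {"road": 0.5, "sky": 0.5}

in_dist_posterior, in_dist_score = classify([0.2, 0.1], TOY_GMMS, TOY_PRIORS)
far_posterior, far_score = classify([50.0, 50.0], TOY_GMMS, TOY_PRIORS)
```

In this toy setup the feature near the "road" components receives almost all of the posterior mass for that class, while the far-away feature gets a much lower density score under every class. Thresholding such a score is one simple way a density-based model could flag out-of-distribution pixels; a purely discriminative softmax has no comparable quantity.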
This implementation is built on MMSegmentation v0.22.1. Many thanks to the contributors for their great efforts.
Please follow MMSegmentation's get_started guide for installation and its dataset_prepare guide for dataset preparation.
Other requirements: pip install timm==0.5.4 einops==0.4.1
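If you want to confirm that the pinned versions above are the ones actually installed, a small check along these lines may help. This is only a sketch using the standard-library `importlib.metadata` module (Python 3.8+); the helper name `check_pins` and the report layout are hypothetical and not part of this repository.

```python
from importlib import metadata

# Versions pinned in the requirements line above.
PINNED = {"timm": "0.5.4", "einops": "0.4.1"}


def check_pins(pins):
    """Map each package to (expected version, installed version or None, match)."""
    report = {}
    for name, expected in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (expected, installed, installed == expected)
    return report


if __name__ == "__main__":
    for name, (expected, installed, ok) in check_pins(PINNED).items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name}: expected {expected}, found {installed} [{status}]")
```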
Backbone | Model | Train Set | Val Set | Iterations | Batch Size | mIoU | Log | CKPT | Config |
---|---|---|---|---|---|---|---|---|---|
MiT-B5 | GMMSeg-Segformer | coco-stuff10k-train | coco-stuff10k-test | 80000 | 8xbs2 | 44.81 | log | ckpt | cfg |
# single-gpu train
python tools/train.py configs/_gmmseg/segformer_mit-b5_gmmseg_512x512_80k_cocostuff10k.py
# multi-gpu train
bash ./tools/dist_train.sh configs/_gmmseg/segformer_mit-b5_gmmseg_512x512_80k_cocostuff10k.py ${GPU_NUM}
# single-gpu test
python tools/test.py configs/_gmmseg/segformer_mit-b5_gmmseg_512x512_80k_cocostuff10k.py /path/to/checkpoint_file
# multi-gpu test
bash ./tools/dist_test.sh configs/_gmmseg/segformer_mit-b5_gmmseg_512x512_80k_cocostuff10k.py /path/to/checkpoint_file ${GPU_NUM}
Note: We recommend training with eight Tesla A100 GPUs, i.e., GPU_NUM=8.
Please also see train and inference for the detailed usage of MMSegmentation.
You may also be interested in our related works in visual recognition:
[1] Exploring Cross-Image Pixel Contrast for Semantic Segmentation - ICCV 2021 (Oral) [arXiv][code]
[2] Rethinking Semantic Segmentation: A Prototype View - CVPR 2022 (Oral) [arXiv][code]
[3] Deep Hierarchical Semantic Segmentation - CVPR 2022 [arXiv][code]
[4] Visual Recognition with Deep Nearest Centroids - arXiv 2022 [arXiv][code]
If you find GMMSeg useful or inspiring, please consider citing our paper:
@inproceedings{liang2022gmmseg,
title = {GMMSeg: Gaussian Mixture based Generative Semantic Segmentation Models},
author = {Liang, Chen and Wang, Wenguan and Miao, Jiaxu and Yang, Yi},
booktitle = {Advances in Neural Information Processing Systems},
year = {2022}
}
This repository is currently maintained by Chen Liang.