SERNet-Former

[![[CVPR 2024 Workshops] YouTube Video](https://img.shields.io/badge/CVPRW'24-YouTube-blue)](https://youtu.be/XXzMkotcdb4?feature=shared) [![CVPR 2024 Workshop](https://img.shields.io/badge/CVPR'24-Workshop-yellow)](https://equivision.github.io/index.html#papers) [![ArXiv paper](https://img.shields.io/badge/SERNetFormer-ArXiv-red)](https://doi.org/10.48550/arXiv.2401.15741) [![CVMI 2024](https://img.shields.io/badge/CVMI-2024-blue)](https://cvmi2024.iiita.ac.in/AcceptedPapers.php)

[CVPR 2024 Workshops] SERNet-Former: Semantic Segmentation by Efficient Residual Network with Attention-Boosting Gates and Attention-Fusion Networks

[CVMI 2024] SERNet-Former: Segmentation by Efficient-ResNet with Attention-Boosting Gates and Attention-Fusion Networks

Tutorials

Various implementations of SERNet-Former with different baselines for multi-tasking are now online.

The example deploys the ViT_h_14 baseline with the 'IMAGENET1K_SWAG_E2E_V1' weights and a simple U-Net decoder architecture.

Please also see the tutorials for:

Image segmentation based on the DeepLabV3+_ResNet101 baseline

Image classification based on the ViT_h_14 baseline

News

30 July 2024 [CVMI 2024] The article "SERNet-Former: Segmentation by Efficient-ResNet with Attention-Boosting Gates and Attention-Fusion Networks" has been accepted to the 3rd IEEE International Conference on Computer Vision and Machine Intelligence (IEEE CVMI).
16 May 2024 [CVPR 2024 Workshops] The article "SERNet-Former: Semantic Segmentation by Efficient Residual Network with Attention-Boosting Gates and Attention-Fusion Networks" has been accepted to the CVPR 2024 Workshop "Equivariant Vision: From Theory to Practice".
January 2024 SERNet-Former set a state-of-the-art result on the Cityscapes validation dataset for pixel-level segmentation: 87.35% mIoU
January 2024 SERNet-Former set a state-of-the-art result on the CamVid dataset: 84.62% mIoU
January 2024 SERNet-Former ranked seventh on the Cityscapes test dataset for pixel-level segmentation according to PapersWithCode.com: 84.83% mIoU

Hall of Fame

SERNet-Former Conceptual

(a) Attention-boosting Gate (AbG) and Attention-boosting Module (AbM) are fused into the encoder part.

(b) Attention-fusion Network (AfN) is introduced into the decoder.

Experiment Results

CamVid Dataset

The breakdown of class accuracies on the CamVid dataset

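| Model | Baseline Architecture | Building | Tree | Sky | Car | Sign | Road | Pedestrian | Fence | Pole | Sidewalk | Bicycle | mIoU |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SERNet-Former | Efficient-ResNet | 93.0 | 88.8 | 95.1 | 91.9 | 73.9 | 97.7 | 76.4 | 83.4 | 57.3 | 90.3 | 83.1 | 84.62 |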
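As a quick sanity check on the CamVid table above, the sketch below recomputes the mean of the eleven per-class values. It assumes the per-class entries are IoU percentages and that mIoU is their unweighted mean; the variable names are illustrative only.

```python
from statistics import mean

# Per-class scores copied from the CamVid table above (percent).
camvid_scores = {
    "Building": 93.0, "Tree": 88.8, "Sky": 95.1, "Car": 91.9,
    "Sign": 73.9, "Road": 97.7, "Pedestrian": 76.4, "Fence": 83.4,
    "Pole": 57.3, "Sidewalk": 90.3, "Bicycle": 83.1,
}
reported_miou = 84.62  # mIoU column of the same table

# Assumption: mIoU is the unweighted mean over the 11 classes.
recomputed_miou = mean(camvid_scores.values())
gap = abs(recomputed_miou - reported_miou)

print(f"recomputed: {recomputed_miou:.2f}  reported: {reported_miou:.2f}  gap: {gap:.3f}")
```

Because the table lists per-class values rounded to one decimal place, the recomputed mean can differ from the reported mIoU by a small rounding gap.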

The experiment outcomes on the CamVid dataset

[Figure: camvid_output]

Cityscapes Dataset

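| Model | Baseline Architecture | road | sidewalk | building | wall | fence | pole | traffic light | traffic sign | vegetation | terrain | sky | person | rider | car | truck | bus | train | motorcycle | bicycle | mIoU |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SERNet-Former | Efficient-ResNet | 98.2 | 90.2 | 94.0 | 67.6 | 68.2 | 73.6 | 78.2 | 82.1 | 94.6 | 75.9 | 96.9 | 90.0 | 77.7 | 96.9 | 86.1 | 93.9 | 91.7 | 70.0 | 82.9 | 84.83 |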

The experiment outcomes on the Cityscapes dataset

[Figure: cityscapes_output]

Installation Support

You can clone this repository into your environment by running:

git clone https://github.com/serdarch/SERNet-Former.git

Citations

@article{Erisen2024SERNetFormer,
  title={SERNet-Former: Semantic Segmentation by Efficient Residual Network with Attention-Boosting Gates and Attention-Fusion Networks},
  author={Erişen, Serdar},
  journal={arXiv preprint arXiv:2401.15741},
  year={2024}
}

@inproceedings{Erisen2024CVPRW,
  title={SERNet-Former: Semantic Segmentation by Efficient Residual Network with Attention-Boosting Gates and Attention-Fusion Networks},
  author={Erişen, Serdar},
  booktitle={CVPRW},
  year={2024},
}
