hustvl / SparseInst

[CVPR 2022] SparseInst: Sparse Instance Activation for Real-Time Instance Segmentation
MIT License
detectron2 instance-segmentation object-detection panoptic-segmentation real-time


Tianheng Cheng, Xinggang Wang, Shaoyu Chen, Wenqiang Zhang, Qian Zhang, Chang Huang, Zhaoxiang Zhang, Wenyu Liu
(: corresponding author)
[arXiv paper] [CVPR paper] [slides]

Highlights



[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/sparse-instance-activation-for-real-time/real-time-instance-segmentation-on-mscoco)](https://paperswithcode.com/sota/real-time-instance-segmentation-on-mscoco?p=sparse-instance-activation-for-real-time)

Updates

This project is under active development. Please stay tuned!

Overview

SparseInst is a conceptually novel, efficient, and fully convolutional framework for real-time instance segmentation. Instead of region boxes or anchors (centers), SparseInst adopts a sparse set of instance activation maps as the object representation, highlighting informative regions for each foreground object. It then obtains instance-level features by aggregating features according to the highlighted regions for recognition and segmentation. Bipartite matching compels the instance activation maps to predict objects in a one-to-one style, thus avoiding non-maximum suppression (NMS) in post-processing. Owing to this simple yet effective design with instance activation maps, SparseInst has extremely fast inference speed, achieving 40 FPS and 37.9 AP on COCO (NVIDIA 2080Ti) and significantly outperforming its counterparts in terms of speed and accuracy.
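The aggregation step described above can be illustrated with a minimal sketch. It is not the repository's implementation: it assumes each activation map is passed through a sigmoid, normalized to sum to one over all pixels, and then used as weights for a sum over per-pixel features. The function name `aggregate_instance_features` and the toy inputs are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def aggregate_instance_features(iam_logits, features, eps=1e-6):
    """Weighted-sum pooling of pixel features, one vector per activation map.

    iam_logits: N lists of H*W activation logits (one list per instance map).
    features:   H*W lists of D-dimensional feature vectors (one per pixel).
    Returns N lists of D-dimensional instance features.
    """
    dim = len(features[0])
    instance_features = []
    for logits in iam_logits:
        # Turn logits into activations, then normalize them to sum to 1.
        weights = [sigmoid(v) for v in logits]
        total = max(sum(weights), eps)
        weights = [w / total for w in weights]
        # Each instance feature is the activation-weighted sum of pixel features.
        pooled = [
            sum(w * pixel[d] for w, pixel in zip(weights, features))
            for d in range(dim)
        ]
        instance_features.append(pooled)
    return instance_features


# Toy example (hypothetical data): 4 pixels with 2-dim features, 2 activation maps.
features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
iam_logits = [[8.0, 8.0, -8.0, -8.0], [-8.0, -8.0, 8.0, 8.0]]
inst = aggregate_instance_features(iam_logits, features)
for vec in inst:
    print([round(v, 3) for v in vec])
```

In this toy case the first map highlights the first two pixels and the second map highlights the last two, so each pooled vector is dominated by the features of its own region.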

Models

We provide two versions of SparseInst, i.e., the basic IAM (3x3 convolution) and the Group IAM (G-IAM for short), with different backbones. All models are trained on MS-COCO train2017.

Fast models

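| model | backbone | input | aug | AP (val) | AP | FPS | weights |
| :---- | :------- | :---: | :-: | :------: | :-: | :-: | :-----: |
| SparseInst | R-50 | 640 |  | 32.8 | 33.2 | 44.3 | model |
| SparseInst | R-50-vd | 640 |  | 34.1 | 34.5 | 42.6 | model |
| SparseInst (G-IAM) | R-50 | 608 |  | 33.4 | 34.0 | 44.6 | model |
| SparseInst (G-IAM, Softmax) | R-50 | 608 |  | 33.6 | - | 44.6 | model |
| SparseInst (G-IAM) | R-50 | 608 |  | 34.2 | 34.7 | 44.6 | model |
| SparseInst (G-IAM) | R-50-DCN | 608 |  | 36.4 | 36.8 | 41.6 | model |
| SparseInst (G-IAM) | R-50-vd | 608 |  | 35.6 | 36.1 | 42.8 | model |
| SparseInst (G-IAM) | R-50-vd-DCN | 608 |  | 37.4 | 37.9 | 40.0 | model |
| SparseInst (G-IAM) | R-50-vd-DCN | 640 |  | 37.7 | 38.1 | 39.3 | model |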

SparseInst with other backbones

| model | backbone | input | AP (val) | AP | FPS | weights |
| :---- | :------- | :---: | :------: | :-: | :-: | :-----: |
| SparseInst (G-IAM) | CSPDarkNet | 640 | 35.1 | - | - | model |

Larger models

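| model | backbone | input | aug | AP (val) | AP | FPS | weights |
| :---- | :------- | :---: | :-: | :------: | :-: | :-: | :-----: |
| SparseInst (G-IAM) | R-101 | 640 |  | 34.9 | 35.5 | - | model |
| SparseInst (G-IAM) | R-101-DCN | 640 |  | 36.4 | 36.9 | - | model |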

SparseInst with Vision Transformers

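| model | backbone | input | aug | AP (val) | AP | FPS | weights |
| :---- | :------- | :---: | :-: | :------: | :-: | :-: | :-----: |
| SparseInst (G-IAM) | PVTv2-B1 | 640 |  | 35.3 | 36.0 | 33.5 (48.9) | model |
| SparseInst (G-IAM) | PVTv2-B2-li | 640 |  | 37.2 | 38.2 | 26.5 | model |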

The FPS value in parentheses is measured on RTX 3090.

Note:

Installation and Prerequisites

This project is built upon the excellent framework detectron2, so you should install detectron2 first. Please check the official installation guide for more details.

Updates: SparseInst works well on detectron2-v0.6.

Note: we previously used detectron2 v0.3 for most experiments and evaluations, and we have also tested our code on the newer v0.6. If you find bugs or incompatibilities with later versions of detectron2, please feel free to raise an issue!

Install detectron2:

git clone https://github.com/facebookresearch/detectron2.git
cd detectron2
# optionally switch to a specific version, e.g., v0.3 (recommended) or v0.6
git checkout tags/v0.6
# build detectron2
python setup.py build develop
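After the build, a quick check of which detectron2 version is visible to your Python environment can help when comparing against the versions mentioned above (v0.3 and v0.6). The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical, and it assumes the build registers a distribution named `detectron2`.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("detectron2")
if version is None:
    print("detectron2 is not installed in this environment")
else:
    print(f"detectron2 {version} found")
```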

Getting Started

🔥 SparseInst with FP16

SparseInst with FP16 achieves 30% faster inference speed and uses much less training memory. The table below compares memory usage, inference speed, and training speed.

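| FP16 | train mem. (log) | train mem. (nvidia-smi) | train speed | infer. speed |
| :--: | :--------------: | :---------------------: | :---------: | :----------: |
| ✗ | 6.0G | 10.5G | 0.8690s/iter | 52.17 FPS |
| ✓ | 3.9G | 6.8G | 0.6949s/iter | 67.57 FPS |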

Note: statistics are measured on NVIDIA 3090. With FP16, we have faster training speed and can also increase the batch size for better performance.
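As a sanity check on the "30% faster" figure, the relative changes can be recomputed from the table above. This sketch assumes the second row of the table is the FP16 run, which is consistent with its higher FPS and lower memory use; the dictionary keys are illustrative names, not config options.

```python
# Values copied from the table; the second row is assumed to be the FP16 run.
baseline = {"mem_log_gb": 6.0, "mem_smi_gb": 10.5, "train_s_per_iter": 0.8690, "infer_fps": 52.17}
fp16 = {"mem_log_gb": 3.9, "mem_smi_gb": 6.8, "train_s_per_iter": 0.6949, "infer_fps": 67.57}


def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before


changes = {key: percent_change(baseline[key], fp16[key]) for key in baseline}
for key, value in changes.items():
    print(f"{key}: {value:+.1f}%")
```

With these numbers, inference throughput rises by about 29.5%, both memory readings drop by about 35%, and the time per training iteration drops by about 20%.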

python tools/train_net.py --config-file configs/sparse_inst_r50_giam_fp16.yaml --num-gpus 8 SOLVER.AMP.ENABLED True
python tools/test_net.py --config-file configs/sparse_inst_r50_giam_fp16.yaml --fp16 MODEL.WEIGHTS model_final.pth 

Testing SparseInst

Before testing, you should specify the config file <CONFIG> and the model weights <MODEL-PATH>. In addition, you can change the input size by setting INPUT.MIN_SIZE_TEST either in the config file or on the command line.

python tools/train_net.py --config-file <CONFIG> --num-gpus <GPUS> --eval MODEL.WEIGHTS <MODEL-PATH>
# example:
python tools/train_net.py --config-file configs/sparse_inst_r50_giam.yaml --num-gpus 8 --eval MODEL.WEIGHTS sparse_inst_r50_giam_aug_2b7d68.pth
python tools/test_net.py --config-file <CONFIG> MODEL.WEIGHTS <MODEL-PATH> INPUT.MIN_SIZE_TEST 512
# example:
python tools/test_net.py --config-file configs/sparse_inst_r50_giam.yaml MODEL.WEIGHTS sparse_inst_r50_giam_aug_2b7d68.pth INPUT.MIN_SIZE_TEST 512
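To see what a given INPUT.MIN_SIZE_TEST means for a particular image, the sketch below approximates a shortest-edge resize: the shorter side is scaled to the minimum size, and the result is shrunk again if the longer side would exceed a maximum. This is an illustration of the idea rather than detectron2's exact code, and the maximum of 853 is a hypothetical value; check INPUT.MAX_SIZE_TEST in your config.

```python
def shortest_edge_output_shape(height, width, min_size, max_size):
    """Approximate output (height, width) of a shortest-edge resize."""
    # Scale so that the shorter side equals min_size.
    scale = min_size / min(height, width)
    new_h, new_w = height * scale, width * scale
    # If the longer side is now too large, shrink both sides to fit max_size.
    if max(new_h, new_w) > max_size:
        shrink = max_size / max(new_h, new_w)
        new_h, new_w = new_h * shrink, new_w * shrink
    return int(new_h + 0.5), int(new_w + 0.5)


# max_size=853 is an assumed value used only for illustration.
print(shortest_edge_output_shape(480, 640, min_size=512, max_size=853))   # (512, 683)
print(shortest_edge_output_shape(200, 1000, min_size=512, max_size=853))  # (171, 853)
```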

Note:

FLOPs and Parameters

get_flops.py is built on detectron2 and fvcore.

python tools/get_flops.py --config-file <CONFIG> --tasks parameter flop

Visualizing Images with SparseInst

To run inference or visualize the segmentation results on your images, run:

python demo.py --config-file <CONFIG> --input <IMAGE-PATH> --output results --opts MODEL.WEIGHTS <MODEL-PATH>
# example
python demo.py --config-file configs/sparse_inst_r50_giam.yaml --input datasets/coco/val2017/* --output results --opts MODEL.WEIGHTS sparse_inst_r50_giam_aug_2b7d68.pth INPUT.MIN_SIZE_TEST 512

Visualization results (SparseInst-R50-GIAM)

Training SparseInst

The default setup trains SparseInst on the COCO dataset with 8 GPUs. If you only have 4 GPUs or your GPU memory is limited, you can reduce the batch size through SOLVER.IMS_PER_BATCH or reduce the input size. If you change the batch size, adjust the learning schedule according to the linear scaling rule.

python tools/train_net.py --config-file <CONFIG> --num-gpus 8 
# example
python tools/train_net.py --config-file configs/sparse_inst_r50vd_dcn_giam_aug.yaml --num-gpus 8
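The linear scaling rule mentioned above can be sketched as follows: when the batch size is multiplied by a factor, the learning rate is multiplied by the same factor and the iteration counts are divided by it, so the number of training epochs stays roughly constant. The helper `scale_schedule` and the reference values below are hypothetical; read the actual SOLVER values from the config file you are using.

```python
def scale_schedule(base_lr, base_batch, base_max_iter, base_steps, new_batch):
    """Rescale a training schedule for a new batch size (linear scaling rule)."""
    factor = new_batch / base_batch
    return {
        "SOLVER.IMS_PER_BATCH": new_batch,
        "SOLVER.BASE_LR": base_lr * factor,
        "SOLVER.MAX_ITER": round(base_max_iter / factor),
        "SOLVER.STEPS": tuple(round(step / factor) for step in base_steps),
    }


# Hypothetical reference schedule, halved batch size (e.g. 4 GPUs instead of 8).
scaled = scale_schedule(
    base_lr=5e-5,
    base_batch=64,
    base_max_iter=270000,
    base_steps=(210000, 250000),
    new_batch=32,
)
for key, value in scaled.items():
    print(key, value)
```

The resulting values can be passed as overrides at the end of the train_net.py command, in the same way MODEL.WEIGHTS is passed in the testing commands.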

Custom Training of SparseInst

  1. We suggest you convert your custom datasets into the COCO format, which enables the usage of the default dataset mappers and loaders. You may find more details in the official guide of detectron2.
  2. You need to check whether NUM_CLASSES and NUM_MASKS should be changed according to your scenarios or tasks.
  3. Change the configurations accordingly.
  4. After finishing the above steps, you can train SparseInst with train_net.py.
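For step 2, a quick look at your COCO-format annotations can inform both settings. The sketch below counts the categories (a candidate for NUM_CLASSES) and the largest number of annotated instances in any single image. It assumes NUM_MASKS bounds how many instances the model can predict per image, so it should comfortably exceed that maximum. The helper names and the toy annotations are hypothetical.

```python
import json
from collections import Counter


def load_coco(path):
    """Load a COCO-format annotation file into a dictionary."""
    with open(path) as f:
        return json.load(f)


def summarize_coco(coco):
    """Return (number of categories, max annotated instances in one image)."""
    num_classes = len(coco["categories"])
    per_image = Counter(ann["image_id"] for ann in coco["annotations"])
    max_instances = max(per_image.values(), default=0)
    return num_classes, max_instances


# Tiny in-memory stand-in for a real file; use load_coco("<your annotations>.json") instead.
toy = {
    "images": [{"id": 1}, {"id": 2}],
    "categories": [{"id": 1, "name": "cat"}, {"id": 2, "name": "dog"}],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1},
        {"id": 11, "image_id": 1, "category_id": 2},
        {"id": 12, "image_id": 2, "category_id": 1},
    ],
}
num_classes, max_instances = summarize_coco(toy)
print(f"categories: {num_classes}, max instances per image: {max_instances}")
```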

Acknowledgements

SparseInst is based on detectron2, OneNet, DETR, and timm, and we sincerely thank the authors for their code and contributions to the community!

Citing SparseInst

If you find SparseInst useful in your research or applications, please consider giving us a star 🌟 and citing SparseInst with the following BibTeX entry.

@inproceedings{Cheng2022SparseInst,
  title     =   {Sparse Instance Activation for Real-Time Instance Segmentation},
  author    =   {Cheng, Tianheng and Wang, Xinggang and Chen, Shaoyu and Zhang, Wenqiang and Zhang, Qian and Huang, Chang and Zhang, Zhaoxiang and Liu, Wenyu},
  booktitle =   {Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR)},
  year      =   {2022}
}

License

SparseInst is released under the MIT License.