SWIN instance segmentation with FCOS - how to construct config file?

open-mmlab / mmdetection

OpenMMLab Detection Toolbox and Benchmark

https://mmdetection.readthedocs.io

Apache License 2.0


SWIN instance segmentation with FCOS - how to construct config file? #7566

Closed ChristofferEdlund closed 2 years ago

ChristofferEdlund commented 2 years ago

Hi!

I would like to build a one-stage instance segmentation model using a SWIN backbone and FCOS object detection. My own attempts to get this working have fallen flat, and any input on how to construct a config file for this task would be very helpful.

Kindly, Christoffer

PeterVennerstrom commented 2 years ago

The FCOS method generates bboxes only, not instance masks.

YOLACT is an example of an instance segmentation method built on FCOS.

To add a SWIN backbone, see the SWIN configs for an example. The number of backbone outputs and the channel count of each must be declared in the config of the next submodule that receives them, typically an FPN neck (its `in_channels`).
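As a rough illustration of the channel matching described above, here is a minimal sketch in mmdetection's Python config style. The helper `swin_stage_channels` is hypothetical (it is not part of mmdetection). It assumes the usual Swin layout, where each stage doubles the channel width starting from `embed_dims`. The Swin-T values shown are assumptions for illustration.

```python
def swin_stage_channels(embed_dims, out_indices):
    """Channel width of each selected Swin stage (width doubles per stage)."""
    return [embed_dims * 2 ** i for i in out_indices]


# Assumed Swin-T settings; other Swin variants use different values.
backbone = dict(
    type='SwinTransformer',
    embed_dims=96,
    depths=[2, 2, 6, 2],
    num_heads=[3, 6, 12, 24],
    window_size=7,
    out_indices=(0, 1, 2, 3),
)

# The neck must list one input channel count per backbone output.
neck = dict(
    type='FPN',
    in_channels=swin_stage_channels(backbone['embed_dims'],
                                    backbone['out_indices']),
    out_channels=256,
    num_outs=5,
)

print(neck['in_channels'])  # [96, 192, 384, 768]
```

If fewer stages are exported (for example `out_indices=(1, 2, 3)`), the neck's `in_channels` list shrinks accordingly.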

ChristofferEdlund commented 2 years ago

@PeterVennerstrom thank you for a quick response.

So what I am mainly after is an anchor-free detection method (such as FCOS or YOLOX) to pair with an instance segmentation task (CenterMask is one such example). I was hoping that I could just replace the RoI generation in the SWIN config with FCOS heads, but it seems that did not work.

As far as I know, YOLACT still uses generated anchor boxes for its detection part which FCOS does not.

PeterVennerstrom commented 2 years ago

My mistake on YOLACT. It does use anchor boxes. SipMask is an extension of YOLACT built on FCOS. BlendMask, EmbedMask and SOLO/SOLOv2 are one-stage, anchor-free instance segmentation methods.

CenterNetV2 is a two-stage method with a one-stage-style, anchor-free RPN. It could be extended like CenterMask or MaskRCNN.

SWIN can be added to any method by replacing the backbone and adjusting the neck config. Adding instance segmentation to bbox detectors is more complex and may require new code.

ChristofferEdlund commented 2 years ago

@PeterVennerstrom I was not aware of SipMask; there are just too many architectures to keep track of these days. I will try swapping the backbones of the proposed networks for SWIN and see if it works.

Thank you Peter, much appreciated =)

ZwwWayne commented 2 years ago

See this example of Swin Transformer for RetinaNet: https://github.com/open-mmlab/mmdetection/blob/master/configs/swin/retinanet_swin-t-p4-w7_fpn_1x_coco.py. You can adapt it for FCOS, since RetinaNet and FCOS differ mainly in the bbox head rather than in the backbone, so the Swin backbone and neck settings carry over.
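A minimal sketch of what carrying the Swin overrides over to an FCOS config might look like, in the style of mmdetection 2.x configs. The `_base_` path and the checkpoint path are assumptions, not taken from this thread. The backbone values are assumed Swin-T settings, so treat this as a starting point rather than a verified config. Note that this yields an FCOS box detector only; it does not add a mask branch.

```python
# Assumption: an FCOS base config exists at this relative path
# (mmdetection 2.x layout); adjust to your checkout.
_base_ = '../fcos/fcos_r50_caffe_fpn_gn-head_1x_coco.py'

# Hypothetical placeholder; point this at a real Swin-T checkpoint.
pretrained = 'path/to/swin_tiny_patch4_window7_224.pth'

model = dict(
    backbone=dict(
        _delete_=True,  # discard the inherited ResNet backbone settings
        type='SwinTransformer',
        embed_dims=96,
        depths=[2, 2, 6, 2],
        num_heads=[3, 6, 12, 24],
        window_size=7,
        drop_path_rate=0.2,
        out_indices=(1, 2, 3),  # export only the stages the neck consumes
        convert_weights=True,
        init_cfg=dict(type='Pretrained', checkpoint=pretrained)),
    neck=dict(
        in_channels=[192, 384, 768],  # Swin-T widths for stages 1 to 3
        start_level=0,  # all three exported stages feed the FPN
        num_outs=5))
```

The inherited data normalization and optimizer in the base config were chosen for a Caffe-style ResNet, so they would likely need revisiting for a Swin backbone.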