hustvl / SparseInst

[CVPR 2022] SparseInst: Sparse Instance Activation for Real-Time Instance Segmentation
MIT License

How to efficiently remove duplicate masks #8

Closed DableUTeeF closed 2 years ago

DableUTeeF commented 2 years ago

Sometimes my model, trained on a custom dataset, predicts several masks for the same object.

I can easily remove them with a for loop, but that would be way too slow.

Is there any way to do this fast enough to be usable? I'm using DefaultPredictor along with the argument parser from test_net.py.

wondervictor commented 2 years ago

The easiest way to remove duplicate masks in SparseInst is to increase the score threshold via MODEL.SPARSE_INST.CLS_THRESHOLD: a higher threshold leads to fewer duplicate masks. Besides, you can reduce NUM_MASKS to lower the number of predicted instances depending on your own dataset, though that requires retraining.
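Since test_net.py uses Detectron2's standard argument parser, the threshold can typically be overridden as a trailing key-value pair on the command line, with no code changes. A sketch (the config-file path and the 0.4 value are placeholders to tune for your dataset):

```shell
# Override the score threshold from the CLI (Detectron2 "opts" convention).
# The config path below is a placeholder; substitute your own.
python test_net.py \
    --config-file configs/sparse_inst_r50_giam.yaml \
    MODEL.SPARSE_INST.CLS_THRESHOLD 0.4
```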

DableUTeeF commented 2 years ago

That only removes low-confidence objects, no?

The duplicated objects aren't always low-confidence. In fact, in some images they have higher class scores than other objects nearby.

It's just that there are multiple masks of almost exactly the same shape on the same object.

DableUTeeF commented 2 years ago

I was thinking of something like NMS, as in object detection, but I'm not sure how the model works or how to apply that in Detectron2.

wondervictor commented 2 years ago

You're right. However, SparseInst is trained with bipartite matching, a one-to-one assignment strategy between ground-truth objects and the N predictions. During training, this assignment forces the network to lower the confidence of duplicates, unlike previous methods that use multiple positives per ground-truth object, e.g., FCOS and RetinaNet. Therefore, duplicate predictions tend to have lower confidence scores and can be suppressed by the confidence threshold. For more theoretical details, you can refer to [1]. In addition, SparseInst predicts N (N=100) objects without any spatial priors like anchors or centers, which is much sparser than dense detectors and less prone to producing duplicate predictions. SparseInst doesn't require NMS to remove duplicate predictions, and using NMS brings only minor improvement.

[1] Sun et al. What Makes for End-to-End Object Detection? ICML 2021.

wondervictor commented 2 years ago

We believe we have answered your question, so I'm closing this issue, but let us know if you have further questions.

DableUTeeF commented 2 years ago

Not so much a question, just to inform.

I tried increasing MODEL.SPARSE_INST.CLS_THRESHOLD and the result is just as I expected:

some correct masks got filtered out while some duplicate masks stayed.

While it sounds good in theory, in reality it doesn't work that way on a small dataset. The class score is, after all, the model's confidence about what class the object is. The model can be highly confident that an object is there yet unsure of its class.

The whole point of using NMS is to remove duplicate masks and little else. While the paper may show tabular results on a large dataset like COCO, for an actual real-world problem with limited data like mine, those results are beside the point.

After all, if someone wants to use the model and duplicate masks are a problem, COCO results aren't going to help.

DableUTeeF commented 2 years ago

BTW, your work is so good that the masks are mostly correct even on such a small dataset.

I ended up solving this by just comparing the objects' centers, which is a lot faster than comparing the whole masks, but it currently still needs a loop and isn't fast enough.
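For what it's worth, the pairwise center distances can be computed in one NumPy broadcast, leaving only a short greedy pass over at most N=100 detections. This is a sketch under my own assumptions, not code from SparseInst: the function name, the min_dist threshold, and the keep-highest-score policy are all choices to tune.

```python
import numpy as np

def dedup_by_center(masks, scores, min_dist=10.0):
    """Keep only the highest-scoring mask among detections whose centroids
    lie within min_dist pixels of each other.

    masks:  (N, H, W) boolean array
    scores: (N,) confidence scores
    Returns a boolean keep-mask over the N detections.
    """
    n, h, w = masks.shape
    ys, xs = np.mgrid[0:h, 0:w]
    areas = masks.reshape(n, -1).sum(axis=1).clip(min=1)
    cx = (masks * xs).reshape(n, -1).sum(axis=1) / areas
    cy = (masks * ys).reshape(n, -1).sum(axis=1) / areas
    centers = np.stack([cx, cy], axis=1)                      # (N, 2)
    # all pairwise centroid distances at once, instead of a nested Python loop
    dist = np.linalg.norm(centers[:, None] - centers[None, :], axis=-1)
    keep = np.ones(n, dtype=bool)
    for i in np.argsort(-scores):                             # highest score first
        if keep[i]:
            close = dist[i] < min_dist
            close[i] = False                                  # don't suppress self
            keep[close & keep] = False
    return keep
```

The remaining loop touches each detection once against a precomputed distance matrix, so it stays cheap even with a hundred candidates.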

DableUTeeF commented 2 years ago

But I think this issue should stay open, since I won't be the last person to ask this and MODEL.SPARSE_INST.CLS_THRESHOLD doesn't solve it.

wondervictor commented 2 years ago

Hi @DableUTeeF, could you provide some samples of your dataset? They would give me details to better understand your problem. As for CLS_THRESHOLD: we assign only one positive sample to each ground-truth object, and all other predictions are treated as negative samples. Conceptually, the bipartite matching forces the model to output only one highly confident, high-quality prediction per ground-truth object. In practice it can still produce duplicate masks (e.g., pairs with mask IoU > 0.5). Some duplicates can be eliminated by the threshold while others cannot, because they have different categories or high confidence scores. These duplicate predictions are hard to eliminate but occur infrequently, at least on the COCO dataset. Besides, several methods, e.g., DETR, Sparse R-CNN, and DeFCN, indicate that NMS is not an essential procedure and has little impact on removing duplicate predictions. One-to-one bipartite matching does work, though it may fail in some special circumstances. If CLS_THRESHOLD cannot eliminate the duplicate predictions in your case, we recommend adding Matrix NMS in the post-processing step, which is a more efficient NMS variant.
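For reference, Matrix NMS (from SOLOv2, Wang et al.) replaces hard suppression with a fully matrix-parallel score decay: each mask's score is decayed by its overlap with higher-scored masks, compensated by how suppressed those masks already are. A NumPy sketch, with the Gaussian decay and sigma=2.0 as the usual defaults rather than anything SparseInst ships:

```python
import numpy as np

def matrix_nms(masks, scores, sigma=2.0):
    """Matrix NMS: return decayed scores; filter with a threshold afterwards.

    masks:  (N, H, W) boolean array
    scores: (N,) confidence scores
    """
    order = np.argsort(-scores)
    flat = masks[order].reshape(len(scores), -1).astype(np.float32)
    inter = flat @ flat.T                                    # pairwise intersections
    areas = flat.sum(axis=1)
    union = areas[:, None] + areas[None, :] - inter
    # iou[i, j] for i < j, i.e., overlap with strictly higher-scored masks
    iou = np.triu(inter / np.maximum(union, 1e-6), k=1)
    comp = iou.max(axis=0)                                   # how suppressed each mask is
    decay = np.exp(-sigma * (iou ** 2 - comp[:, None] ** 2)).min(axis=0)
    out = np.empty(len(scores), dtype=np.float32)
    out[order] = scores[order].astype(np.float32) * decay
    return out
```

Everything is a matrix operation, so it ports directly to torch on GPU; after decaying, duplicates fall below the score threshold that the genuine detections still clear.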

wondervictor commented 2 years ago

In addition, if the number of instances in each image is far fewer than 100, you can also reduce NUM_MASKS, which will produce fewer predictions and hence fewer duplicates.

wondervictor commented 2 years ago

Could Matrix NMS or a lower NUM_MASKS help remove the duplicate predictions now, @DableUTeeF?