Consider interactive segmentation as open-vocabulary segmentation

jianzongwu / Awesome-Open-Vocabulary

(TPAMI 2024) A Survey on Open Vocabulary Learning


Hi! @ywyue Yuanwen, this is an interesting point.

"For example, in the 2D image domain, models trained on COCO can also segment satellite/medical images. In the 3D domain, models trained on ScanNet can also segment objects in outdoor scenarios, e.g., KITTI." To me, this statements are more like domain generation or cross dataset training and evaluation.

"interactive segmentation models are trained in a class-agnostic manner and can naturally generalize to data distributions beyond those seen in training", I believe this can be devided into class-agnostic segmetation part. If we miss any papers on this, you can make a PR.

However, I personally think this is not an open-vocabulary setting, since it lacks semantics and recognition ability. One simple extension is to add a CLIP model to classify each proposal. Similar work [1] has been discussed before.
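To make the CLIP extension concrete, here is a minimal sketch of the classification step only. It assumes you already have one image embedding per mask proposal and one text embedding per class name, both from a CLIP-style model. The function name `classify_proposals`, the `temperature` value, and the toy embeddings are illustrative assumptions, not taken from any specific paper or library.

```python
import numpy as np

def classify_proposals(proposal_embs, text_embs, class_names, temperature=0.01):
    """Label each class-agnostic proposal with its best-matching class name.

    proposal_embs: (N, D) array, one embedding per mask proposal
                   (assumed to come from a CLIP-style image encoder).
    text_embs:     (C, D) array, one embedding per class-name prompt
                   (assumed to come from the matching text encoder).
    """
    # L2-normalize so the dot product below is cosine similarity.
    p = proposal_embs / np.linalg.norm(proposal_embs, axis=1, keepdims=True)
    t = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)

    # Temperature-scaled similarities, then a numerically stable softmax.
    logits = (p @ t.T) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    probs = probs / probs.sum(axis=1, keepdims=True)

    best = probs.argmax(axis=1)
    return [class_names[i] for i in best], probs

# Toy stand-ins for real embeddings (hypothetical values, for illustration only).
class_names = ["car", "tree", "building"]
text_embs = np.eye(3)
proposal_embs = np.array([[0.9, 0.1, 0.0],
                          [0.0, 0.2, 0.8]])

labels, probs = classify_proposals(proposal_embs, text_embs, class_names)
print(labels)  # ['car', 'building']
```

Because the class list is just a set of text prompts, it can be swapped at inference time without retraining the segmenter, which is what supplies the recognition ability that class-agnostic masks lack.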

Reference:

[1] Visual Recognition by Request, CVPR 2023.


Consider interactive segmentation as open-vocabulary segmentation #6