Status: Closed (ywyue closed this issue 1 year ago)
Hi! @ywyue Yuanwen, this is an interesting point.
"For example, in the 2D image domain, models trained on COCO can also segment satellite/medical images. In the 3D domain, models trained on ScanNet can also segment objects in outdoor scenarios, e.g., KITTI." To me, these statements are more about domain generalization or cross-dataset training and evaluation.
"interactive segmentation models are trained in a class-agnostic manner and can naturally generalize to data distributions beyond those seen in training": I believe this falls under the class-agnostic segmentation part. If we have missed any papers on this, feel free to open a PR.
However, I personally think it is not an open-vocabulary setting, since it lacks semantics and recognition ability. One simple extension is to add a CLIP model to classify each proposal. Similar work [1] has been discussed before.
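To make the "classify each proposal" idea concrete, here is a minimal sketch rather than anyone's actual method. It assumes you already have one embedding per class-agnostic mask proposal (e.g. from a CLIP image encoder applied to the masked crop) and one embedding per class prompt (from the CLIP text encoder). The function name `classify_proposals`, the `temperature` value, and the toy 3-D vectors are all hypothetical and only stand in for real CLIP features.

```python
import numpy as np

def classify_proposals(proposal_emb, text_emb, class_names, temperature=0.01):
    """Label each proposal with the class whose text embedding is most similar.

    proposal_emb: (N, D) array, one embedding per mask proposal (assumed given).
    text_emb:     (C, D) array, one embedding per class prompt (assumed given).
    """
    # L2-normalize so the dot product below is a cosine similarity.
    p = proposal_emb / np.linalg.norm(proposal_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    sim = p @ t.T  # (N, C) cosine similarities

    # Temperature-scaled softmax over classes (numerically stabilized).
    logits = sim / temperature
    logits = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    probs = probs / probs.sum(axis=1, keepdims=True)

    labels = [class_names[i] for i in probs.argmax(axis=1)]
    return labels, probs

# Toy stand-ins for real embeddings (hypothetical values).
class_names = ["car", "tree", "building"]
text_emb = np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0],
                     [0.0, 0.0, 1.0]])
proposal_emb = np.array([[0.9, 0.1, 0.0],
                         [0.1, 0.2, 0.8]])

labels, probs = classify_proposals(proposal_emb, text_emb, class_names)
print(labels)
```

Because the class list is just a set of text prompts, it can be swapped at inference time without retraining the segmenter, which is the part that class-agnostic interactive segmentation alone does not provide.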
Reference:
[1] Visual Recognition by Request, CVPR 2023.
Thanks for this great repo! One question: do you consider incorporating works in the interactive segmentation domain? In my opinion, interactive segmentation models are trained in a class-agnostic manner and can naturally generalize to data distributions beyond those seen in training. For example, in the 2D image domain, models trained on COCO can also segment satellite/medical images. In the 3D domain, models trained on ScanNet can also segment objects in outdoor scenarios, e.g., KITTI.