xjwu1024 / WPS-SAM

WPS-SAM: Towards Weakly-Supervised Part Segmentation with Foundation Models

Official PyTorch implementation of WPS-SAM from our ECCV 2024 paper: WPS-SAM: Towards Weakly-Supervised Part Segmentation with Foundation Models. Xinjian Wu, Ruisong Zhang, Jie Qin, Shijie Ma, Cheng-Lin Liu.

What is WPS-SAM

[Figure: the proposed Weakly-supervised Part Segmentation (WPS) setting]

Segmenting and recognizing diverse object parts is crucial in computer vision and robotics. Despite significant progress in object segmentation, part-level segmentation remains underexplored due to complex boundaries and scarce annotated data. To address this, we propose a novel Weakly-supervised Part Segmentation (WPS) setting (as shown in the figure above) and an approach called WPS-SAM (as shown in the figure below), built on the large-scale pre-trained vision foundation model Segment Anything Model (SAM). WPS-SAM is an end-to-end framework designed to extract prompt tokens directly from images and perform pixel-level segmentation of part regions. During training, it uses only weakly-supervised labels in the form of bounding boxes or points. Extensive experiments demonstrate that, by exploiting the rich knowledge embedded in pre-trained foundation models, WPS-SAM outperforms other segmentation models trained with strong pixel-level annotations. Specifically, WPS-SAM achieves 68.93% mIoU and 79.53% mACC on the PartImageNet dataset, surpassing state-of-the-art fully supervised methods by approximately 4% in terms of mIoU.

[Figure: overview of the WPS-SAM framework]
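
To make the pipeline concrete, below is a minimal PyTorch sketch of the idea described above, not the authors' implementation: a frozen SAM-style image encoder, a small trainable prompter that derives prompt tokens directly from the image embedding, and a mask decoder that turns each token into a part mask. All module names, dimensions, and the toy encoder/decoder stand-ins are illustrative assumptions.

```python
import torch
import torch.nn as nn


class PromptGenerator(nn.Module):
    """Hypothetical prompter: cross-attends learnable queries over the image
    embedding to produce a set of prompt tokens, one per candidate part."""

    def __init__(self, embed_dim=256, num_prompts=16, num_heads=8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_prompts, embed_dim))
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

    def forward(self, image_embed):                      # (B, C, H, W)
        b = image_embed.shape[0]
        keys = image_embed.flatten(2).transpose(1, 2)    # (B, H*W, C)
        q = self.queries.unsqueeze(0).expand(b, -1, -1)  # (B, N, C)
        prompts, _ = self.attn(q, keys, keys)            # cross-attention
        return prompts                                   # (B, N, C)


class WPSSAMSketch(nn.Module):
    """End to end: frozen image encoder -> learned prompts -> mask decoder."""

    def __init__(self, image_encoder, mask_decoder, embed_dim=256):
        super().__init__()
        self.image_encoder = image_encoder               # frozen (SAM ViT in the paper)
        for p in self.image_encoder.parameters():
            p.requires_grad_(False)
        self.prompter = PromptGenerator(embed_dim)       # the trainable part
        self.mask_decoder = mask_decoder                 # SAM-style decoder

    def forward(self, images):
        with torch.no_grad():
            embed = self.image_encoder(images)           # (B, C, H, W)
        prompts = self.prompter(embed)                   # prompts from the image itself
        return self.mask_decoder(embed, prompts)         # per-prompt part masks


# Toy stand-ins so the sketch runs end to end; the real model would plug in
# SAM's pre-trained image encoder and mask decoder here.
encoder = nn.Conv2d(3, 256, kernel_size=16, stride=16)


def decoder(embed, prompts):
    # Dot product between each prompt token and the embedding map -> mask logits.
    return torch.einsum("bnc,bchw->bnhw", prompts, embed)


model = WPSSAMSketch(encoder, decoder)
masks = model(torch.randn(2, 3, 224, 224))               # (2, 16, 14, 14) mask logits
```

During training, supervision would come only from bounding-box or point labels rather than pixel-level masks, as described above; the specific weak-supervision losses are the paper's contribution and are omitted from this sketch.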