twke18 / HSG

Unsupervised Hierarchical Semantic Segmentation with Multiview Cosegmentation and Clustering Transformers
https://twke18.github.io/projects/hsg.html
MIT License

How to determine the same semantic group across view given a pixel? #1

Closed linwk20 closed 2 years ago

linwk20 commented 2 years ago

Hi author! Thanks for sharing this great work!

I have a question about how to find the same semantic group across views. Could you point me to the corresponding code?

Thanks a lot!

twke18 commented 2 years ago

Hello,

Please see Figure 3 in our paper.

[Image]

There are two separate ways to cosegment the image and its multiple views: 1) coherent region matching, 2) feature co-clustering.

  1. To conduct coherent region matching, you first generate coherent regions on the original image. We apply the UCM-OWT procedure to convert edges into coherent regions (row 2 in column 2). These coherent regions are then transformed consistently with each view (rows 3 and 4 in column 2), so the same region has the same color across views. We can then easily infer which region each pixel corresponds to in every view.

  2. Feature co-clustering is conducted via our clustering transformer. We collect features across views and cluster them jointly.
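As a rough illustration of the region-matching idea in item 1, here is a minimal sketch in plain Python. It assumes each view is produced by a crop plus an optional horizontal flip; the function `crop_and_flip` and the toy region map are hypothetical and are not taken from the HSG code. The point is only that applying the same geometric transform to the region label map as to the image keeps region IDs consistent, so a pixel's region can be read off directly in any view.

```python
def crop_and_flip(label_map, top, left, height, width, flip):
    """Apply a crop (and optional horizontal flip) to a 2-D region label map.

    Labels are copied, never interpolated, so region IDs stay valid.
    """
    rows = [row[left:left + width] for row in label_map[top:top + height]]
    if flip:
        rows = [row[::-1] for row in rows]
    return rows


def region_ids(view):
    """Return the set of region IDs that appear in a view."""
    return {label for row in view for label in row}


# Hypothetical 4x6 region map for the original image (region IDs 0, 1, 2).
regions = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [2, 2, 2, 1, 1, 1],
    [2, 2, 2, 2, 1, 1],
]

# Each view reuses the parameters that were used to augment the image itself.
view_a = crop_and_flip(regions, top=0, left=1, height=3, width=4, flip=False)
view_b = crop_and_flip(regions, top=1, left=0, height=3, width=5, flip=True)

# Pixels in different views belong to the same group when their IDs match.
same_group = view_a[2][0] == view_b[1][2]
shared = region_ids(view_a) & region_ids(view_b)
```

In a real pipeline the transform would likely include resizing as well, in which case nearest-neighbor resampling is the usual way to avoid blending region IDs.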

Once we know the groupings for each pixel (we have 3 separate sets of groupings during training), we formulate the contrastive loss accordingly.
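To make the last step more concrete, below is a hedged sketch of how a contrastive loss can be driven by per-pixel group IDs. It is a generic InfoNCE-style formulation written in plain Python, not the loss implemented in the HSG repository; the function name `grouping_contrastive_loss` and the `temperature` value are assumptions for illustration. Pixels pooled from all views that share a group ID are treated as positives, and all other pixels as negatives.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def grouping_contrastive_loss(features, groups, temperature=0.1):
    """Average InfoNCE-style loss over pixels pooled from all views.

    features: list of feature vectors, one per pixel.
    groups:   list of group IDs, one per pixel; equal IDs are positives.
    """
    feats = [normalize(f) for f in features]
    losses = []
    for i, (f_i, g_i) in enumerate(zip(feats, groups)):
        # Similarities to every other pixel (the pixel itself is excluded).
        others = [(math.exp(dot(f_i, f_j) / temperature), groups[j])
                  for j, f_j in enumerate(feats) if j != i]
        if all(g != g_i for _, g in others):
            continue  # no positive partner for this pixel, skip it
        pos = sum(s for s, g in others if g == g_i)
        total = sum(s for s, _ in others)
        losses.append(-math.log(pos / total))
    return sum(losses) / len(losses) if losses else 0.0


# Toy features: the first two pixels are similar, as are the last two.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_matching = grouping_contrastive_loss(features, [0, 0, 1, 1])
loss_mismatched = grouping_contrastive_loss(features, [0, 1, 0, 1])
```

With several sets of groupings, one plausible arrangement is to call such a function once per set and sum the results; how the actual code weights them is not stated in this thread.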

linwk20 commented 2 years ago

Thanks for the reply! I get it now!