Hi author! Thanks for sharing this great work!
I have a question: how do you find the same semantic group across views? Could you point me to where the corresponding code is?
Thanks a lot!
Hello,
Please see Figure 3 in our paper.
There are two separate ways to cosegment the image and its multiple views: 1) coherent region matching, 2) feature co-clustering.
To conduct coherent region matching, you first generate coherent regions on the original image. We apply the UCM-OWT procedure to convert edges into coherent regions (row 2 in column 2). These coherent regions are then transformed to be consistent with each view (rows 3 and 4 in column 2). The same region has the same color across views, so we can easily infer which region each pixel corresponds to in every view.
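If it helps, here is a minimal PyTorch sketch of the region-transfer step, assuming each view is produced by a known geometric transform expressed as a `grid_sample` sampling grid; `warp_label_map` and its arguments are hypothetical names for illustration, not our actual code:

```python
import torch
import torch.nn.functional as F

def warp_label_map(region_labels, flow_grid):
    """Warp an integer region-label map into a view's coordinate frame.

    region_labels: (B, H, W) int64 map, one region id per pixel
                   (e.g. produced by the UCM-OWT procedure).
    flow_grid:     (B, H, W, 2) sampling grid in [-1, 1], the same
                   geometric transform used to generate the view.
    Returns a (B, H, W) label map aligned with the view, so each pixel
    in the view carries the region id of its source pixel.
    """
    # Nearest-neighbor sampling keeps labels integral (no blending of ids).
    labels = region_labels.unsqueeze(1).float()          # (B, 1, H, W)
    warped = F.grid_sample(labels, flow_grid, mode='nearest',
                           align_corners=False)
    return warped.squeeze(1).long()
```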
Feature co-clustering is conducted via our clustering transformer. We collect features across views and cluster them jointly.
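As a rough illustration of the joint-clustering idea (plain k-means below stands in for the clustering transformer, and `co_cluster` is a hypothetical name, not a function in our repo):

```python
import torch

def co_cluster(features_per_view, num_clusters, iters=10):
    """Jointly cluster pixel features collected from all views.

    features_per_view: list of (N_v, D) tensors, one per view.
    Returns one (N_v,) assignment tensor per view, drawn from a
    shared set of centroids so cluster ids agree across views.
    NOTE: plain k-means stands in for the clustering transformer;
    it only illustrates collecting features and clustering jointly.
    """
    feats = torch.cat(features_per_view, dim=0)          # pool all views
    centroids = feats[torch.randperm(len(feats))[:num_clusters]]
    for _ in range(iters):
        dists = torch.cdist(feats, centroids)            # (N, K)
        assign = dists.argmin(dim=1)
        for k in range(num_clusters):
            mask = assign == k
            if mask.any():
                centroids[k] = feats[mask].mean(dim=0)
    # Split the shared assignments back into per-view chunks.
    sizes = [f.shape[0] for f in features_per_view]
    return list(assign.split(sizes))
```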
Once we know the groupings for each pixel (we have 3 separate sets of groupings during training), we formulate the contrastive loss accordingly.
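Here is a minimal sketch of what a grouping-based contrastive loss can look like, assuming pixel features pooled across views and a single set of group ids; `grouping_contrastive_loss` is a hypothetical, simplified stand-in, not our exact formulation:

```python
import torch
import torch.nn.functional as F

def grouping_contrastive_loss(feats, group_ids, temperature=0.1):
    """InfoNCE-style loss from one set of pixel groupings.

    feats:     (N, D) pixel features pooled across views.
    group_ids: (N,) id of the group each pixel belongs to; pixels in
               the same group (across views) are treated as positives.
    Each pixel is pulled toward its own group prototype and pushed
    away from the other groups' prototypes. A sketch only: training
    combines three such groupings, each with its own loss term.
    """
    feats = F.normalize(feats, dim=1)
    uniq, inv = torch.unique(group_ids, return_inverse=True)  # relabel 0..G-1
    protos = torch.zeros(len(uniq), feats.shape[1], device=feats.device)
    protos.index_add_(0, inv, feats)                          # sum per group
    protos = F.normalize(protos, dim=1)                       # mean direction
    logits = feats @ protos.t() / temperature                 # (N, G)
    return F.cross_entropy(logits, inv)
```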
Thanks for the reply! I get it now!