mhamilton723 / STEGO

Unsupervised Semantic Segmentation by Distilling Feature Correspondences
MIT License
724 stars 147 forks

Flipping during the evaluation #22

Open SMSD75 opened 2 years ago

SMSD75 commented 2 years ago

Hi,

Thank you for your great research. I noticed that when you produce the cluster maps, the codes of the input image and of its horizontally flipped copy are averaged together. Why does this make sense?

mhamilton723 commented 2 years ago

Hey @SMSD75, this just provides a little extra noise reduction. In effect it evaluates the network twice, once on the image in its original orientation and once on a horizontally flipped copy, then fuses the two results: the code of the flipped image is flipped back so that it lines up spatially with the original code, and the two codes are averaged. Intuitively, this computation explicitly makes the network's output invariant to horizontal flips.
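For anyone who wants to see the idea in isolation, here is a minimal sketch of flip averaging. It is not the STEGO code: `hflip`, `flip_averaged_code`, and `toy_net` are hypothetical names, the "network" is a toy function that is deliberately sensitive to left/right orientation, and plain Python lists stand in for tensors so the snippet runs without dependencies.

```python
def hflip(fmap):
    """Flip a 2-D map (a list of rows) left to right."""
    return [row[::-1] for row in fmap]


def flip_averaged_code(net, img):
    """Average the code of img with the flipped-back code of its mirror image."""
    code = net(img)
    # Run the network on the mirrored image, then flip the result back so
    # that every position lines up with the same position in `code`.
    code_from_flipped = hflip(net(hflip(img)))
    return [
        [0.5 * (a + b) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(code, code_from_flipped)
    ]


def toy_net(img):
    """Hypothetical stand-in for a network: each output pixel is the input
    pixel plus half of its LEFT neighbour, so it is not flip-symmetric."""
    return [
        [v + 0.5 * (row[j - 1] if j > 0 else 0.0) for j, v in enumerate(row)]
        for row in img
    ]


img = [[1.0, 2.0, 4.0],
       [0.0, 3.0, 5.0]]

avg = flip_averaged_code(toy_net, img)
avg_of_flipped = flip_averaged_code(toy_net, hflip(img))

# The raw toy network treats an image and its mirror differently...
print(toy_net(hflip(img)) == hflip(toy_net(img)))  # False
# ...but the averaged code of the mirrored image is exactly the mirror of
# the averaged code, so the result no longer depends on image parity.
print(avg_of_flipped == hflip(avg))                # True
```

The second check holds for any `net`, because the two terms in the average simply swap roles when the input is mirrored. With batched tensors the same pattern would flip along the width axis instead of reversing each row.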