sourachakra / SCoSPARC

This repository contains code for the paper "Self-supervised co-salient object detection via feature correspondence at multiple scales"

foreground image mask F = M_n ⊗ I, background image mask B = (1 − M_n) ⊗ I_n #1

Open zhending111 opened 5 months ago

zhending111 commented 5 months ago

Good job!
Is I the image or the ViT output feature of the image? Where can I get the supplementary material?

sourachakra commented 5 months ago

Hello, thank you for bringing this up.

The updated paper (http://arxiv.org/abs/2403.11107) now states that we multiply the predicted mask M_n from stage 1 with the ViT patch embeddings, after resizing M_n to the height and width of the patch embedding tensor. Unlike existing works, we do not mask the image directly with the predicted cross-attention map to obtain the average foreground and background embeddings.
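For illustration, here is a minimal NumPy sketch of that masking step. It is not the repository's code: the function names, the nearest-neighbour resize, and the mask-weighted normalisation are all assumptions made for this example, and the actual implementation may resize and average differently.

```python
import numpy as np


def resize_mask(mask, out_h, out_w):
    """Nearest-neighbour resize of a 2-D mask to (out_h, out_w).

    Assumption: the real code may use a different interpolation mode.
    """
    in_h, in_w = mask.shape
    rows = (np.arange(out_h) * in_h) // out_h
    cols = (np.arange(out_w) * in_w) // out_w
    return mask[rows[:, None], cols[None, :]]


def masked_average_embeddings(mask, patch_emb, eps=1e-6):
    """Return (foreground, background) average embeddings.

    mask:      (H, W) array with values in [0, 1], playing the role of M_n.
    patch_emb: (h, w, D) array of ViT patch embeddings (hypothetical layout).
    """
    h, w, _ = patch_emb.shape
    # Resize the mask to the patch grid, then add a channel axis for broadcasting.
    m = resize_mask(mask.astype(np.float64), h, w)[..., None]  # (h, w, 1)
    # Mask-weighted means over the patch grid (the normalisation is an assumption).
    fg = (m * patch_emb).sum(axis=(0, 1)) / (m.sum() + eps)
    bg = ((1.0 - m) * patch_emb).sum(axis=(0, 1)) / ((1.0 - m).sum() + eps)
    return fg, bg


# Toy usage: the left half of the mask is foreground.
mask = np.zeros((8, 8))
mask[:, :4] = 1.0
emb = np.zeros((4, 4, 2))
emb[:, :2, 0] = 1.0  # left patches activate channel 0
emb[:, 2:, 1] = 1.0  # right patches activate channel 1
fg, bg = masked_average_embeddings(mask, emb)
```

In this toy case the foreground average comes out close to channel 0 and the background average close to channel 1, which is the separation the masking is meant to produce.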

We have also added the supplementary material at the end of the main paper; you can find it in the updated paper linked above. Thanks!