wbw520 / BotCL

Learning Bottleneck Concepts in Image Classification (CVPR 2023)

Question about Slot Attention Implementation #5

Closed HieuPhan33 closed 1 year ago

HieuPhan33 commented 1 year ago

Hi, thanks for sharing the code.

I noticed some differences between your Slot Attention implementation and the original implementation [NeurIPS 2020]. Could you please help me clarify them?

  1. On Line 45 in slots.py, you normalize over the spatial dimension with dots.sum(2), then scale by the global sum dots.sum(2).sum(1). What is the motivation behind multiplying by the global sum? Could multiplying by a large number cause exploding gradients?
  2. On the same line, you use sigmoid instead of softmax. What is the motivation?
  3. Unlike Slot Attention, the NormLayer is removed, and q_linear is a stack of three linear layers instead of a single nn.Linear. How does this affect performance?

I would appreciate it if you could clarify my confusion. Best wishes.

wbw520 commented 1 year ago

Thanks for your comments.

Q1: We use this multiplication to merge spatial and global information for normalization. Please refer to our previous work, "SCOUTER: Slot Attention-based Classifier for Explainable Image Recognition" (ICCV 2021). In our experiments, it could cause training failure.

Q2: Unlike the original Slot Attention, whose softmax is aimed at object-centric representation, we use sigmoid so that the attention of different concepts (slots) can overlap on the same region.

Q3: In the previous work mentioned above, we found that the normalization and the structure of the linear layers did not affect training accuracy. However, we did not conduct a comprehensive experiment on this in the present study, so it may affect the accuracy or the learning of concepts.
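For readers following along, here is a minimal sketch of my reading of the normalization discussed in Q1 and Q2 (illustrative names and shapes, not the repo's exact slots.py code): sigmoid attention, normalized per slot over the spatial dimension, then rescaled by the global sum.

```python
import torch

def botcl_style_attention(q, k, eps=1e-8):
    """Illustrative sketch of the described attention normalization.

    q: (batch, n_slots, dim) slot queries
    k: (batch, n_pixels, dim) pixel features
    """
    dots = torch.einsum('bsd,bpd->bsp', q, k)      # slot-pixel similarities
    w = torch.sigmoid(dots)                        # sigmoid instead of softmax:
                                                   # concepts may overlap on a region
    spatial = w.sum(dim=2, keepdim=True)           # per-slot spatial sum, (b, s, 1)
    global_sum = spatial.sum(dim=1, keepdim=True)  # global sum, (b, 1, 1)
    # normalize over the spatial dimension, then rescale by the global sum
    return (w / (spatial + eps)) * global_sum
```

Under this reading, each slot's attention sums (over pixels) to the same global total, so the scaling redistributes mass rather than letting one slot's normalization hide its overall activation strength.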

HieuPhan33 commented 1 year ago

Hi @wbw520,

Thanks for your detailed response. They are clear.

I also have a Q4 about the GRU. Compared to Slot Attention and SCOUTER attention, the GRU module has been removed in this work.

Could you explain why, and does it affect the model performance?

wbw520 commented 1 year ago

The GRU does not affect classification performance. It is designed to refine the shape of each object in Slot Attention. It might give a better boundary for each discovered concept in our task, but the impact should be small, because we do not force the concept shapes to be sharp.
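As a concrete picture of what was removed, the following sketch (illustrative tensors and names, not the repo's code) contrasts the GRU refinement step of the original Slot Attention with a direct update:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
dim, n_slots, batch = 64, 5, 2
slots = torch.randn(batch, n_slots, dim)     # previous slot states
updates = torch.randn(batch, n_slots, dim)   # attention-weighted input features

# Original Slot Attention [NeurIPS 2020]: a shared GRUCell refines each slot
# from its previous state, iteratively sharpening the attended region.
gru = nn.GRUCell(dim, dim)
slots_refined = gru(updates.reshape(-1, dim),
                    slots.reshape(-1, dim)).reshape(batch, n_slots, dim)

# Without the GRU (per the reply above): the attention-weighted update is
# used directly, since crisp concept boundaries are not enforced.
slots_direct = updates
```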

HieuPhan33 commented 1 year ago

Thanks, cheers.