Closed HieuPhan33 closed 1 year ago
Thanks for your comments. Q1: We use this multiplication to merge spatial and global information for normalization; please refer to our previous work, "SCOUTER: Slot Attention-based Classifier for Explainable Image Recognition, ICCV 2021". Without it, training could fail in our experiments. Q2: Unlike the original slot attention (which uses softmax for object-centric decomposition), we use sigmoid so that the attention of different concepts (i.e., slots) can overlap on the same region. Q3: In the previous work mentioned above, we found that the normalization and the structure of the linear layer do not affect training accuracy. However, we did not conduct a comprehensive experiment on this in the present study, so it may affect the accuracy or the learning of concepts.
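To make the softmax-vs-sigmoid point in Q2 concrete, here is a minimal numpy sketch (toy logits, not the repository's code): softmax over the slot axis makes slots compete for each spatial position, while sigmoid lets several slots attend strongly to the same position.

```python
import numpy as np

def softmax(x, axis):
    # numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# dots: similarity logits between slots and spatial positions,
# shape (num_slots, num_positions) -- hypothetical toy values
dots = np.array([[4.0, 4.0, -2.0],
                 [4.0, -2.0, 4.0]])

# Original Slot Attention: softmax over the SLOT axis, so the weights
# for each position sum to 1 and slots compete (object-centric).
attn_softmax = softmax(dots, axis=0)

# Sigmoid variant: each slot decides independently whether it attends
# to a position, so two concepts can both cover position 0.
attn_sigmoid = sigmoid(dots)

print(attn_softmax.sum(axis=0))   # every column sums to 1 (competition)
print(attn_sigmoid[:, 0])         # both slots near 1 at the shared position
```

Note how at position 0 both sigmoid attentions are close to 1, which is exactly the overlap the softmax formulation forbids.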
Hi @wbw520,
Thanks for your detailed response. That clears things up.
I also have Q4 about GRU. Compared to Slot Attention and Scouter Attention, the GRU module has been removed in this work.
Could you explain why, and does it affect the model performance?
The GRU does not affect classification performance. In slot attention, it is designed to refine the shape of each object. It might give a cleaner boundary for the discovered concepts in our task, but the impact should be small, because we do not force the shapes to be sharp.
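For readers unfamiliar with the role of the GRU here, a minimal numpy sketch of the structural difference (toy random weights, not the repository's code): in the original slot attention, a GRU cell gates how much of the attention-weighted update replaces each slot across iterations; without it, the slots are simply replaced by the updates.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4  # slot dimension (toy size)

slots = rng.normal(size=(2, d))    # current slot states
updates = rng.normal(size=(2, d))  # attention-weighted feature updates

def gru_cell(h, x, Wz, Uz, Wr, Ur, Wh, Uh):
    """Minimal GRU cell: gates how much of the update x replaces state h."""
    sig = lambda a: 1.0 / (1.0 + np.exp(-a))
    z = sig(x @ Wz + h @ Uz)                 # update gate
    r = sig(x @ Wr + h @ Ur)                 # reset gate
    h_tilde = np.tanh(x @ Wh + (r * h) @ Uh) # candidate state
    return (1 - z) * h + z * h_tilde

# Original Slot Attention: GRU refines slots between iterations
Ws = [rng.normal(size=(d, d)) * 0.1 for _ in range(6)]
slots_gru = gru_cell(slots, updates, *Ws)

# Without the GRU (as in this work): slots are replaced directly
slots_plain = updates
```

The gated update smooths slot trajectories over iterations, which is why it mainly affects the boundary of the discovered regions rather than the final classification.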
Thanks, cheers.
Hi, thanks for sharing the code.
I see there are some variations in your Slot Attention implementation compared to the original implementation [NIPS2020]. Could you please help me clarify?
I would appreciate it if you could clear up my confusion. Best wishes.