I think you are correct that this was a bug, and not an intentional change. Thanks for flagging!
Indeed you're right. I actually ran experiments after fixing it, and the performance difference is very small (<5%). So I think the learned slot initialization distribution is not very important.
Hi. Thank you for open-sourcing this wonderful implementation! I have a small question about the code and think it might be a bug.
In these lines, you define `slot_mu` and `slot_log_sigma` using `register_buffer`. If I understand correctly, tensors created via `register_buffer` won't be updated during training (see here for reference). I also checked my trained checkpoints, and these two values are indeed identical throughout the training process. Also, other Slot Attention implementations define them as trainable parameters (see the PyTorch one and the official one). So I just wonder whether this is a bug or intentional behavior?
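For context, here is a minimal sketch of the two options being discussed. The class name, tensor shapes, and the `trainable` flag are assumptions for illustration; only the identifiers `slot_mu`, `slot_log_sigma`, and `register_buffer` come from the issue.

```python
import torch
import torch.nn as nn

class SlotInit(nn.Module):
    """Sketch: buffer vs. parameter for the slot initialization statistics."""

    def __init__(self, slot_dim: int = 64, trainable: bool = True):
        super().__init__()
        mu = torch.zeros(1, 1, slot_dim)
        log_sigma = torch.zeros(1, 1, slot_dim)
        if trainable:
            # nn.Parameter: shows up in .parameters() and is updated by the optimizer.
            self.slot_mu = nn.Parameter(mu)
            self.slot_log_sigma = nn.Parameter(log_sigma)
        else:
            # register_buffer: saved in the state_dict but receives no gradients,
            # so the values stay fixed throughout training (the behavior described above).
            self.register_buffer('slot_mu', mu)
            self.register_buffer('slot_log_sigma', log_sigma)

    def forward(self, batch_size: int, num_slots: int) -> torch.Tensor:
        # Sample the initial slots from N(mu, sigma^2).
        mu = self.slot_mu.expand(batch_size, num_slots, -1)
        sigma = self.slot_log_sigma.exp().expand(batch_size, num_slots, -1)
        return mu + sigma * torch.randn_like(mu)
```

With `trainable=False` this reproduces the buffer behavior: both tensors appear in the checkpoint but never change during training.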
Update: I didn't observe much performance difference between trainable and fixed mu+sigma. That's very interesting.