yeerwen / DeSD

MICCAI 2022 Paper

Collapsing representations #4

Closed joaco18 closed 1 year ago

joaco18 commented 1 year ago

Hi, I really liked your paper and was trying to reproduce your results, so I highly appreciate that you shared the implementation. I have a question about the general proposed approach: how do you prevent the latent feature representations from collapsing to a trivial solution, say all zeros? Since you only minimize the crossed cross-entropy (between the two global pairs), after many training iterations I don't see why both the teacher and student latent feature vectors wouldn't collapse to a solution that is meaningless but identical for all inputs. This is the reason why many other approaches use negative pairs and contrastive losses. I might be missing something; I'd love to hear your insights on this. Thank you in advance, Joaquín.
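For reference, a minimal sketch of the kind of loss being discussed here: a symmetric cross-view cross-entropy between teacher and student outputs of two global crops, with no negative pairs. The function and temperature values are illustrative and not taken from the DeSD repository.

```python
import torch.nn.functional as F

def cross_view_ce(student_out1, student_out2, teacher_out1, teacher_out2,
                  temp_s=0.1, temp_t=0.04):
    """Symmetric cross-entropy: each teacher view supervises the other student view."""
    # Teacher targets: sharpened softmax, detached so no gradient reaches the teacher.
    t1 = F.softmax(teacher_out1 / temp_t, dim=-1).detach()
    t2 = F.softmax(teacher_out2 / temp_t, dim=-1).detach()
    # Student predictions as log-probabilities.
    s1 = F.log_softmax(student_out1 / temp_s, dim=-1)
    s2 = F.log_softmax(student_out2 / temp_s, dim=-1)
    # Cross the views: teacher view 1 supervises student view 2, and vice versa.
    loss = -(t1 * s2).sum(dim=-1).mean() - (t2 * s1).sum(dim=-1).mean()
    return loss / 2
```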

yeerwen commented 1 year ago

Hi Joaquin,

Thank you for your interest in our paper. Dropping negative samples in non-contrastive (self-distillation) learning is actually a commonly used approach in the SSL paradigm. The scheme was first introduced by BYOL ("Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning") and has since been adopted by various SSL methods, such as DINO. There are multiple explanations of why it works without collapsing; I recommend reading "Understanding Contrastive Representation Learning through Alignment and Uniformity on the Hypersphere" and "On the duality between contrastive and non-contrastive self-supervised learning". These papers will give you a clearer understanding of this approach.
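To make the collapse-avoidance concrete, here is a minimal sketch in the spirit of BYOL/DINO (not taken from the DeSD code) of the asymmetries these methods rely on: the teacher is an exponential-moving-average copy of the student that receives no gradients, and the teacher outputs are centered (and sharpened with a low temperature) before the cross-entropy is computed, so the trivial all-equal solution never becomes a stable target. Function names and momentum values below are illustrative defaults.

```python
import torch

@torch.no_grad()
def update_teacher(student, teacher, momentum=0.996):
    """EMA update: teacher parameters slowly track the student's, no backprop."""
    for ps, pt in zip(student.parameters(), teacher.parameters()):
        pt.data.mul_(momentum).add_(ps.data, alpha=1.0 - momentum)

@torch.no_grad()
def center_teacher_output(teacher_out, center, center_momentum=0.9):
    """DINO-style centering: subtract a running mean of the teacher outputs
    so no single output dimension can dominate the teacher's softmax targets."""
    batch_center = teacher_out.mean(dim=0, keepdim=True)
    center = center * center_momentum + batch_center * (1.0 - center_momentum)
    return teacher_out - center, center
```

Centering alone would push the teacher toward a uniform distribution, while sharpening alone would push it toward a one-hot collapse; combining both, together with the stop-gradient/EMA teacher, is what keeps the representations informative in DINO-style training.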

Best, Yiwen

joaco18 commented 1 year ago

Thank you very much!