szq0214 / Un-Mix

Un-Mix: Rethinking Image Mixtures for Unsupervised Visual Representation Learning.

About how to compute the L_M loss on SwAV #7

Closed Simon-Stma closed 2 years ago

Simon-Stma commented 2 years ago

Because SwAV is not the same as MoCo, I sincerely want to know how to design the L_M loss with SwAV.

szq0214 commented 2 years ago

Hi @Simon-Stma

Thanks for your interest in our work. It is easy to generalize Un-Mix to clustering-based methods like SwAV: simply replace the soft distance between positive and negative pairs in the contrastive loss with SwAV's soft cluster assignment (i.e., the soft fitness between a feature $\mathbf z$ and a code $\mathbf q$). Specifically, SwAV also has two image features $\mathbf z_t, \mathbf z_s$ from two different augmentations of the same image, and its loss function is: $L_{ori}(\mathbf{z}_t, \mathbf{z}_s)=\ell(\mathbf{z}_t, \mathbf{q}_s)+\ell(\mathbf{z}_s, \mathbf{q}_t)$

where $\ell(\mathbf{z}_t, \mathbf{q}_s)=-\sum_k \mathbf{q}_s^{(k)} \log \mathbf{p}_t^{(k)}$
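For reference, a minimal sketch of this cross-entropy term is below. It assumes the prototype scores $\mathbf z_t^\top \mathbf C$ are passed in and that `q_s` is the code produced by Sinkhorn-Knopp; the name `swav_subloss` and the `temperature` value are illustrative, not the repo's actual API.

```python
import torch
import torch.nn.functional as F

def swav_subloss(scores_t, q_s, temperature=0.1):
    """l(z_t, q_s) = -sum_k q_s^(k) log p_t^(k).

    scores_t: prototype scores z_t @ C of shape (B, K)
    q_s:      codes from Sinkhorn-Knopp of shape (B, K)
    """
    log_p_t = F.log_softmax(scores_t / temperature, dim=1)  # log p_t^(k)
    return -torch.mean(torch.sum(q_s * log_p_t, dim=1))
```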

For Un-Mix on SwAV, we similarly replace one of the image features with the normal-order and reverse-order mixtures, i.e., $\mathbf z_t^m, \mathbf z^{rm}_t$; then $L_M$ is: $L_M=\lambda L_M(\mathbf{z}^m_t, \mathbf{z}_s)+(1-\lambda)L_M(\mathbf{z}^{rm}_t, \mathbf{z}_s)=\lambda(\ell(\mathbf{z}^m_t, \mathbf{q}_s)+\ell(\mathbf{z}_s, \mathbf{q}^m_t))+(1-\lambda)(\ell(\mathbf{z}^{rm}_t, \mathbf{q}_s)+\ell(\mathbf{z}_s, \mathbf{q}^{rm}_t))$

The final objective is: $L_{all}=L_{ori}(\mathbf{z}_t, \mathbf{z}_s)+\lambda L_M(\mathbf{z}^m_t, \mathbf{z}_s)+(1-\lambda)L_M(\mathbf{z}^{rm}_t, \mathbf{z}_s)$
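To make the combination concrete, here is a minimal sketch of one training step computing $L_{all} = L_{ori} + L_M$. It assumes hypothetical helpers `model` (encoder plus prototypes, returning prototype scores) and `sinkhorn` (producing codes $\mathbf q$), plus the `swav_subloss` sketched above; the image-level mixing follows Un-Mix's normal/reverse-order batch mixture, and the details may differ from the released implementation.

```python
import torch

def unmix_swav_step(model, sinkhorn, x_t, x_s, lam):
    """One forward pass computing L_all = L_ori + L_M for Un-Mix + SwAV.

    model:    returns (B, K) prototype scores for a batch of images (hypothetical)
    sinkhorn: produces codes q from prototype scores (hypothetical)
    lam:      mixture coefficient lambda, e.g. sampled from a Beta distribution
    """
    # Original SwAV swapped-prediction loss on the two augmented views.
    scores_t, scores_s = model(x_t), model(x_s)
    with torch.no_grad():  # codes are computed without gradients, as in SwAV
        q_t, q_s = sinkhorn(scores_t), sinkhorn(scores_s)
    l_ori = swav_subloss(scores_t, q_s) + swav_subloss(scores_s, q_t)

    # Normal-order and reverse-order image mixtures against the batch-reversed samples.
    x_m = lam * x_t + (1 - lam) * x_t.flip(0)
    x_rm = (1 - lam) * x_t + lam * x_t.flip(0)
    scores_m, scores_rm = model(x_m), model(x_rm)
    with torch.no_grad():
        q_m, q_rm = sinkhorn(scores_m), sinkhorn(scores_rm)

    # L_M: z_t replaced by z_t^m / z_t^rm in the swapped-prediction loss.
    l_m = lam * (swav_subloss(scores_m, q_s) + swav_subloss(scores_s, q_m)) \
        + (1 - lam) * (swav_subloss(scores_rm, q_s) + swav_subloss(scores_s, q_rm))

    return l_ori + l_m
```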

Simon-Stma commented 2 years ago

Thank you very much for taking the time to reply despite your busy schedule! I understand what you mean.

szq0214 commented 2 years ago

Hi @Simon-Stma We have released the code of Un-Mix + SwAV on the CIFAR and ImageNet datasets: https://github.com/szq0214/Un-Mix/tree/master/UnMix_SwAV. You can have a look at the implementation there.