vturrisi / solo-learn

solo-learn: a library of self-supervised methods for visual representation learning powered by PyTorch Lightning
MIT License

Ask about the return loss "simclr loss + classification loss" #174

Closed KimSoybean closed 2 years ago

KimSoybean commented 2 years ago

Hi, I see the code in https://github.com/vturrisi/solo-learn/blob/8ab3c7ef7fdca8e0471c64700de3d9df60d7e47c/solo/methods/simclr.py#L188

I am confused: does your code optimize the unsupervised loss plus the supervised (classification) loss? Or am I misunderstanding it?

ankitpatnala commented 2 years ago

Yes, I was confused by this at first too. Both losses are computed simultaneously, but the author uses detach: https://github.com/vturrisi/solo-learn/blob/8ab3c7ef7fdca8e0471c64700de3d9df60d7e47c/solo/methods/base.py#L416

The detach call prevents class_loss from updating (manipulating) the weights of the encoder network.
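
To illustrate, here is a minimal, self-contained sketch of the detach pattern (not the library's actual code; the placeholder ssl_loss stands in for SimCLR's NT-Xent term):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Linear(32, 16)      # stand-in for the backbone
classifier = nn.Linear(16, 10)   # online linear classifier

x = torch.randn(4, 32)
targets = torch.randint(0, 10, (4,))

feats = encoder(x)

# .detach() cuts the graph here, so class_loss can only reach the classifier
logits = classifier(feats.detach())
class_loss = F.cross_entropy(logits, targets)

# placeholder for the self-supervised term, computed on the *non-detached* feats
ssl_loss = feats.pow(2).mean()

(ssl_loss + class_loss).backward()

print(encoder.weight.grad is not None)     # True, but the gradient comes only from ssl_loss
print(classifier.weight.grad is not None)  # True, from class_loss
```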

KimSoybean commented 2 years ago

> Yes, I was confused by this at first too. Both losses are computed simultaneously, but the author uses detach: https://github.com/vturrisi/solo-learn/blob/8ab3c7ef7fdca8e0471c64700de3d9df60d7e47c/solo/methods/base.py#L416
>
> The detach call prevents class_loss from updating (manipulating) the weights of the encoder network.

Thanks a lot!

pragyasrivastava0805 commented 2 years ago

How do they deal with the semi-supervised case where the class loss needs to be optimized?

pragyasrivastava0805 commented 2 years ago

@ankitpatnala

vturrisi commented 2 years ago

Hi @pragyasrivastava0805, just override this function https://github.com/vturrisi/solo-learn/blob/f25dc1fccb3c3ed2dbd848138adc2d5594f671a4/solo/methods/base.py#L481-L496 in your method class, removing the .detach(). The method will then be optimized with both the self-supervised loss and the classification loss. If you need some further variation of this, you will probably need to override the _base_shared_step method.
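
For reference, a hypothetical subclass along those lines might look like this (the method signature, attribute names, and return dict are assumptions based on the linked base.py and may differ in your version; verify against your copy of solo/methods/base.py):

```python
import torch.nn.functional as F
from solo.methods.simclr import SimCLR

class SemiSupervisedSimCLR(SimCLR):
    def _base_shared_step(self, X, targets):
        feats = self.encoder(X)
        # the base method applies self.classifier(feats.detach()); dropping
        # the .detach() lets the classification loss also update the encoder
        logits = self.classifier(feats)
        loss = F.cross_entropy(logits, targets, ignore_index=-1)
        return {"loss": loss, "logits": logits, "feats": feats}
```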

DonkeyShot21 commented 2 years ago

AFAIK, the semi-supervised experiments presented in self-supervised learning papers do not optimize the supervised loss at pre-training time; they just use a fraction of the labels for linear evaluation.