salesforce / PCL

PyTorch code for "Prototypical Contrastive Learning of Unsupervised Representations"
MIT License

question about concentration around a prototype #16

Open vmmm123 opened 2 years ago

vmmm123 commented 2 years ago

In the paper, you mention "With the proposed φ, the similarity in a loose cluster (larger φ) are down-scaled, pulling embeddings closer to the prototype", but I am wondering why the down-scaled similarity forces them to get closer. Could you please explain it in more detail? Thanks!

LiJunnan1992 commented 2 years ago

Hi, thanks for your question!

The loss function tries to increase the similarity between an embedding v and its positive prototype c, i.e. v·c/φ. When φ is larger, v·c needs to be larger in order to reach the same value of the scaled similarity. Therefore, the embedding is pulled closer to the prototype.
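
To make the scaling concrete, here is a minimal sketch of a ProtoNCE-style term with a per-prototype concentration, assuming ℓ2-normalized embeddings and prototypes; all names (`protos`, `phi`, `target`) are illustrative and not taken from the repo's code:

```python
import torch
import torch.nn.functional as F

# Minimal sketch of a ProtoNCE-style term with per-prototype concentration.
# Illustrative only; this is not the repo's actual implementation.
torch.manual_seed(0)

num_protos, dim = 5, 128
v = F.normalize(torch.randn(dim), dim=0)                   # embedding, l2-normalized
protos = F.normalize(torch.randn(num_protos, dim), dim=1)  # prototypes c_j
phi = torch.rand(num_protos) + 0.5                         # concentrations phi_j > 0
target = 2                                                 # index of the positive prototype

# Similarity to each prototype, down-scaled by that prototype's phi.
logits = (protos @ v) / phi

# Cross-entropy over prototypes: -log( exp(v.c_s/phi_s) / sum_j exp(v.c_j/phi_j) )
loss = F.cross_entropy(logits.unsqueeze(0), torch.tensor([target]))
print(loss.item())
```

Because each logit is divided by its prototype's φ, a loose cluster (large φ) yields a smaller logit for the same v·c, so minimizing the cross-entropy keeps pushing v toward its prototype until v·c grows enough to compensate.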

vmmm123 commented 2 years ago

OK, that is a direct intuition. I tried to understand it from the angle of the gradient: since the gradient scales with 1/φ, I am afraid that the larger gradient may force the model to focus more on the tight clusters, where φ is smaller.
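
For the gradient view, here is a small illustrative check (again, not the repo's code) of just the positive-pair term: the gradient of -v·c/φ with respect to v is -c/φ, so its norm scales as 1/φ and is larger for tighter clusters. The full ProtoNCE gradient also carries softmax weights, so this isolates only the scaling factor:

```python
import torch
import torch.nn.functional as F

# Illustrative check: gradient magnitude of the positive-pair term vs. phi.
# The term -v.c/phi has gradient -c/phi w.r.t. v, so a tighter cluster
# (smaller phi) produces a larger gradient on its embeddings.
torch.manual_seed(0)
dim = 128
c = F.normalize(torch.randn(dim), dim=0)  # a fixed prototype

for phi in (0.5, 1.0, 2.0):
    v = F.normalize(torch.randn(dim), dim=0).requires_grad_(True)
    loss = -(v @ c) / phi  # positive-pair term only
    loss.backward()
    print(f"phi={phi}: grad norm = {v.grad.norm():.4f}")  # scales as 1/phi
```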