princeton-nlp / SimCSE

[EMNLP 2021] SimCSE: Simple Contrastive Learning of Sentence Embeddings https://arxiv.org/abs/2104.08821
MIT License
3.33k stars 505 forks

Question about infonce loss #212

Closed ksblk2116 closed 1 year ago

ksblk2116 commented 1 year ago

In your code, you gather all the embeddings from the GPUs under DDP, but the loss you compute is not divided by the number of GPUs. I think this post explains why the division is needed. So, does it matter whether or not the loss is divided by the number of GPUs? Hoping for your explanation — or maybe I missed something.

gaotianyu1350 commented 1 year ago

Hi,

We do not need to explicitly divide the gradient by the number of GPUs here. The loss is divided by the full (gathered) batch size in the end, which already takes the number of GPUs into account.
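To make the arithmetic concrete, here is a minimal NumPy sketch (not the repository's code; the names and shapes are illustrative). It computes a per-example InfoNCE loss over a "gathered" batch of `world_size * per_gpu_batch` embeddings and checks that the mean over the gathered batch equals the average of per-GPU means divided by nothing extra — i.e., the `1 / world_size` factor is already inside the global mean:

```python
import numpy as np

rng = np.random.default_rng(0)
world_size, per_gpu_batch, dim, temp = 4, 8, 16, 0.05

def info_nce_per_example(z1, z2, temp):
    """Per-example InfoNCE loss with in-batch negatives (diagonal = positives)."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    sim = z1 @ z2.T / temp                       # cosine similarity logits
    m = sim.max(axis=1)                          # stabilizer for log-sum-exp
    logsumexp = np.log(np.exp(sim - m[:, None]).sum(axis=1)) + m
    return logsumexp - np.diag(sim)              # vector of per-example losses

# Two views of the sentences, as if already gathered from all GPUs.
z1 = rng.normal(size=(world_size * per_gpu_batch, dim))
z2 = rng.normal(size=(world_size * per_gpu_batch, dim))

per_example = info_nce_per_example(z1, z2, temp)

# What each GPU computes after the gather: mean over the GLOBAL batch.
global_mean = per_example.mean()

# Group the same losses by the GPU each example came from.
per_gpu_means = per_example.reshape(world_size, per_gpu_batch).mean(axis=1)

# Dividing by the gathered batch size N * world_size is the same as
# averaging per-GPU means: the 1/world_size factor is already baked in.
assert np.isclose(global_mean, per_gpu_means.sum() / world_size)
```

So once the loss is averaged over the gathered batch, no additional division by the number of GPUs is needed.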