intfloat / SimKGC

ACL 2022, SimKGC: Simple Contrastive Knowledge Graph Completion with Pre-trained Language Models

The performance of InfoNCE loss? #12

Closed: maoulee closed this issue 1 year ago

maoulee commented 1 year ago

Hi Liang, nice work on pre-trained models for KGC. Inspired by your work, I used the InfoNCE loss with a BERT model that uses a traditional negative sampling strategy. Unfortunately, the InfoNCE loss performed rather poorly on WN18RR compared to RankLoss. What do you think is the reason for this?

Here is the performance on the WN18RR dataset (epoch = 7, Euclidean distance):

- Hits left @1: 0.019762845849802372, Hits right @1: 0.016798418972332016, Hits @1: 0.018280632411067192
- Hits left @3: 0.042490118577075096, Hits right @3: 0.036561264822134384, Hits @3: 0.039525691699604744
- Hits left @10: 0.13537549407114624, Hits right @10: 0.09683794466403162, Hits @10: 0.11610671936758893
- Mean rank left: 7952.986166007905137, Mean rank right: 7856.22430830039526, Mean rank: 7902.605237154150196
- Mean reciprocal rank left: 0.07150598418681733, Mean reciprocal rank right: 0.0619137549730991, Mean reciprocal rank: 0.06670986957995823
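(For reference, these are the standard rank-based KGC metrics. Below is a minimal sketch of how Hits@k, mean rank, and mean reciprocal rank are typically computed from the 1-indexed rank of the gold entity; the function name and example values are illustrative, not from the SimKGC codebase.)

```python
import torch

def rank_metrics(ranks: torch.Tensor, ks=(1, 3, 10)) -> dict:
    """ranks: 1-indexed rank of the gold entity for each test triple."""
    ranks = ranks.float()
    metrics = {f"hits@{k}": (ranks <= k).float().mean().item() for k in ks}
    metrics["mean_rank"] = ranks.mean().item()
    metrics["mrr"] = (1.0 / ranks).mean().item()
    return metrics

# Example: ranks of the gold tail for five test triples
print(rank_metrics(torch.tensor([1, 4, 2, 50, 9])))
```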

Here is my reproduction of the InfoNCE loss:

```python
# pos_distances: distances to the positive tails, neg_distances: distances to the nt sampled negatives
pos_distances = pos_distances.reshape(bs, 1)
neg_distances = neg_distances.reshape(bs, nt)
logits = torch.cat([pos_distances, neg_distances], dim=1)  # (bs, 1 + nt), distances used directly as logits
logits = logits * self.log_inv_t.exp()                     # scale by the learnable inverse temperature
labels = torch.zeros([bs]).cuda().long()                   # the positive is always in column 0
info_loss_fn = nn.CrossEntropyLoss()
info_loss = info_loss_fn(logits, labels)
```

intfloat commented 1 year ago

This is unexpected; your results look only slightly better than random guessing.

Are you using Euclidean distance? For InfoNCE loss with a temperature, you are supposed to apply L2 normalization first.
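(A minimal sketch of that change, assuming `query`, `pos`, and `neg` hold the encoder embeddings of the query and its positive/negative candidates; all names here are illustrative, not the repo's actual variables. After L2 normalization, dot products are cosine similarities, where higher is better, so they can be fed to cross-entropy directly; raw Euclidean distances would need a sign flip, since there smaller is better.)

```python
import torch
import torch.nn.functional as F

def info_nce(query, pos, neg, log_inv_t):
    """query: (bs, dim), pos: (bs, dim), neg: (bs, nt, dim); log_inv_t: learnable scalar."""
    query = F.normalize(query, dim=-1)   # L2-normalize so dot products are cosine similarities in [-1, 1]
    pos = F.normalize(pos, dim=-1)
    neg = F.normalize(neg, dim=-1)
    pos_scores = (query * pos).sum(dim=-1, keepdim=True)   # (bs, 1)
    neg_scores = torch.einsum('bd,bnd->bn', query, neg)    # (bs, nt)
    logits = torch.cat([pos_scores, neg_scores], dim=1) * log_inv_t.exp()
    labels = torch.zeros(logits.size(0), dtype=torch.long, device=logits.device)
    return F.cross_entropy(logits, labels)                 # the positive sits in column 0
```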

maoulee commented 1 year ago

Thanks for your help. I reworked the code, and the results are now close to RankLoss on the (incomplete) validation set. Eval results at step 4000 (epoch 1, cosine similarity, Neg = 11):

- Hits @1: 0.232
- Hits @3: 0.3952
- Hits @10: 0.6086
- Mean rank: 12.692
- Mean reciprocal rank: 0.3587

The results on the test set (epoch 1, cosine similarity, Neg = 11) are:

- Hits @1: 0.0456
- Hits @3: 0.1067
- Hits @10: 0.1935
- Mean rank: 1308.356
- Mean reciprocal rank: 0.0965

This may be caused by the low epoch count. I remember your paper mentions training for 50, 10, and 1 epochs on the respective datasets. Could you provide the test set results for WN18RR at epoch 1? Thanks!