intfloat / SimKGC

ACL 2022, SimKGC: Simple Contrastive Knowledge Graph Completion with Pre-trained Language Models

Did you try without LM pretrained weights? #1

Closed: apoorvumang closed this issue 2 years ago

apoorvumang commented 2 years ago

Hi

Very interesting work! I had a similar submission to ACL 2022 (https://github.com/apoorvumang/kgt5) and wanted to ask the following question: did you try to train SimKGC from scratch, i.e., without the pretrained LM weights? In our KGT5 experiments, we found that pretraining had almost no impact (although we obtain worse results than SimKGC).

My intuition says that SimKGC might still work if trained from scratch. Have you guys tried it?

Thanks, Apoorv

intfloat commented 2 years ago

Hi,

Thanks for your interest in our work.

It is now fairly standard practice to use pre-trained LMs, so we did not thoroughly compare the results with and without pre-trained weights. My best guess is that training from scratch will produce decent results, but not as good as with pre-trained language models.

I ran one experiment on the WN18RR dataset this morning using randomly initialized weights, without changing any hyper-parameters. Here are the results:

|  | MRR | H@1 | H@3 | H@10 |
| --- | --- | --- | --- | --- |
| with pre-trained (paper reported) | 66.6 | 58.7 | 71.7 | 80.0 |
| w/o pre-trained | 56.9 | 50.7 | 59.8 | 68.8 |

With pre-trained LMs, the model converges to much better results.
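
Roughly, the only change needed for this ablation is how the encoder is initialized. A minimal HuggingFace-style sketch (illustrative only, not the exact SimKGC training code):

```python
# Illustrative sketch only, not the exact SimKGC training code.
from transformers import AutoConfig, AutoModel

# Standard setting: load pre-trained BERT weights.
encoder = AutoModel.from_pretrained("bert-base-uncased")

# Ablation: same architecture and hyper-parameters, but randomly initialized weights.
config = AutoConfig.from_pretrained("bert-base-uncased")
encoder_from_scratch = AutoModel.from_config(config)
```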

In your case, it could be that the training dataset is large enough to overshadow the benefits of pre-training.

Best, Liang

apoorvumang commented 2 years ago

Hi Liang,

Thanks for such a prompt response! Those are some very cool results on WN18RR; they seem like very good evidence of transfer learning on the link prediction task.

I agree; for WikiKG90Mv2 it might be that the training dataset is large enough. I will also try running KGT5 on WN18RR with and without pretraining and see the difference.

However, I suspect that the KGT5 training methodology, which is plain seq2seq and does not use any explicit negatives, could be suffering from the issues outlined in your paper, and that some form of contrastive training could help. Do you think InfoNCE (or any contrastive loss/training) could be applied to seq2seq models? If so, what would you recommend as a starting point?

Thanks, Apoorv

intfloat commented 2 years ago

I think the idea of KGT5 is very cool. By formulating KGC as a seq2seq task, it implicitly treats all sequences except the ground-truth as negatives.

On combining contrastive learning with seq2seq models, I can recommend the two papers listed below, with a rough sketch after the list. There does not seem to be a widely adopted method yet, though, and I am not sure how much gain it would bring.

  1. Contrastive Learning with Adversarial Perturbations for Conditional Text Generation, ICLR 2021
  2. A Contrastive Framework for Neural Text Generation
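
As a very rough starting point, the basic InfoNCE objective with in-batch negatives looks like the sketch below (my own illustration with assumed shapes and temperature, not code from either paper); the candidate embeddings could come from a seq2seq encoder/decoder just as well as from a BERT-style encoder:

```python
# Generic InfoNCE with in-batch negatives (illustration only; shapes and
# temperature are assumptions, not values from the papers above).
import torch
import torch.nn.functional as F

def info_nce(query, candidates, temperature=0.05):
    """query: [B, d] embeddings of the (head, relation) side.
    candidates: [B, d] embeddings of the gold tails; row i is the positive
    for query i, and the other rows in the batch act as negatives."""
    query = F.normalize(query, dim=-1)
    candidates = F.normalize(candidates, dim=-1)
    logits = query @ candidates.t() / temperature   # [B, B] cosine similarities
    labels = torch.arange(query.size(0), device=query.device)
    # Same softmax form as per-token cross-entropy, but over candidate
    # entities/sequences instead of the vocabulary.
    return F.cross_entropy(logits, labels)

# Dummy usage:
loss = info_nce(torch.randn(8, 768), torch.randn(8, 768))
```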

Best, Liang

apoorvumang commented 2 years ago

Thank you so much for the pointers! I will definitely look into those :)

apoorvumang commented 2 years ago

Just an update: I tried KGT5 on WN18RR with pretrained weights and there was a significant improvement in MRR, from 0.508 without pretrained weights to 0.532 with them.