malllabiisc / EmbedKGQA

ACL 2020: Improving Multi-hop Question Answering over Knowledge Graphs using Knowledge Base Embeddings

Apache License 2.0

The MetaQA dataset #125

Open LinxiCai opened 1 year ago

LinxiCai commented 1 year ago

When I read train.txt, valid.txt, and test.txt in data/MetaQA, I found that the triples in test.txt are also included in train.txt. Could you explain why?

apoorvumang commented 1 year ago

Hi LinxiCai, thanks for your interest.

We studied MetaQA for the QA task, not the KG completion task. We pretrain on the whole KG (or 50% of the KG, depending on the setting) and then fine-tune for QA. The test.txt and valid.txt triples exist only for compatibility with KGE implementations, which require separate validation and test triples. So we simply copied triples from train.txt to test.txt to maintain that compatibility.
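For anyone who wants to verify the overlap themselves, here is a minimal sketch that measures how many triples in one file also appear in train.txt. It is not part of the EmbedKGQA code; the helper names `load_triples` and `overlap_fraction` are hypothetical, and it assumes each line holds one tab-separated head, relation, tail triple.

```python
from pathlib import Path


def load_triples(path, sep="\t"):
    """Read one triple per line as a (head, relation, tail) tuple.

    Assumption: fields are separated by `sep` (tab by default).
    """
    triples = set()
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        parts = line.strip().split(sep)
        if len(parts) == 3:  # skip blank or malformed lines
            triples.add(tuple(parts))
    return triples


def overlap_fraction(train_path, other_path, sep="\t"):
    """Fraction of distinct triples in other_path that also occur in train_path."""
    train = load_triples(train_path, sep)
    other = load_triples(other_path, sep)
    if not other:
        return 0.0
    return len(other & train) / len(other)
```

Calling `overlap_fraction("data/MetaQA/train.txt", "data/MetaQA/test.txt")` should give a value close to 1.0 if the test triples were indeed copied from train.txt, which is what the explanation above implies.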

LinxiCai commented 1 year ago

OK, thanks a lot! I understand. By the way, I have another question: when I train embeddings for the MetaQA triples, will the model overfit if I don't use dropout or batch normalization? Alternatively, could you share the training arguments you used to get your MetaQA embeddings?