Closed AndDoIt closed 3 years ago
Yes, there is overlap between the triples in valid.txt and train.txt. test.txt should not have leakage, AFAIK.
This is because our aim is to train embeddings for QA, not for KG completion. Ideally, there would be no valid.txt or test.txt at all, since we should train on all the triples. valid.txt and test.txt exist only to keep the format that most KGE methods expect when creating embeddings (e.g., libKGE, pykeen).
Thanks for your reply, I got it. However, when I pick any triple at random from test.txt in the MetaQA KG dataset, it also appears in train.txt, so could you please check the dataset again?
I mixed up valid.txt and test.txt in my last message: test.txt has triples from train.txt, while valid.txt shouldn't have any overlap.
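For anyone who wants to check the overlap themselves, here is a minimal sketch. It assumes each split file holds one triple per line and treats two triples as identical when their stripped lines match exactly. The helper names (`load_triples`, `overlap_report`, `check_splits`) are made up for illustration and are not part of this repo.

```python
from pathlib import Path


def load_triples(path):
    """Read one triple per line into a set; blank lines are skipped.

    Assumption: a triple is identified by its whole (stripped) line.
    """
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}


def overlap_report(train, other):
    """Return (number of shared triples, fraction of `other` found in `train`)."""
    shared = train & other
    fraction = len(shared) / len(other) if other else 0.0
    return len(shared), fraction


def check_splits(data_dir):
    """Compare valid.txt and test.txt against train.txt in `data_dir`."""
    data_dir = Path(data_dir)
    train = load_triples(data_dir / "train.txt")
    return {
        name: overlap_report(train, load_triples(data_dir / name))
        for name in ("valid.txt", "test.txt")
    }
```

Calling `check_splits` on the directory that holds the three files returns a dict keyed by file name. If the description above is right, test.txt should show a fraction close to 1.0 and valid.txt a fraction of 0.0.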
Thanks for your excellent work on multi-hop KGQA! I found that the MetaQA KG dataset you provided has serious data leakage among train.txt, valid.txt and test.txt, so I would like to confirm whether your pre-trained embeddings are based on this dataset.