microsoft / HittER

Hierarchical Transformers for Knowledge Graph Embeddings (EMNLP 2021)
MIT License

Unable to reproduce result #3

Closed vardaan123 closed 2 years ago

vardaan123 commented 2 years ago

Thanks for releasing the code for your work. I am trying to reproduce the numbers for the no-context version of the model. Using the config file trmeh-fb15k237-noctx.yaml, I get the following metrics:

Dev set: MR = 167.69, MRR = 0.327, Hits@1 = 0.2308, Hits@10 = 0.520

Test set: MR = 170.79, MRR = 0.369, Hits@1 = 0.2749, Hits@10 = 0.5590

However, Table 3 reports MRR = 0.373 and Hits@10 = 0.561 on the dev set.

Kindly explain how to reproduce the results. Thanks!

sanxing-chen commented 2 years ago

Could you check the filtered_with_test metrics? There is a big gap between your dev and test results, and this could be the reason.

vardaan123 commented 2 years ago

Thanks for pointing it out. Using the filtered_with_test metrics, I get on the dev set MR = 156.61, MRR = 0.375, Hits@1 = 0.2815, Hits@10 = 0.5628, which is better than reported.

BTW, do I understand correctly that the filtered_with_test metric uses the test set for filtering in addition to the train and dev sets? Also, is there an equivalent of the filtered_with_test metric for test set evaluation?

sanxing-chen commented 2 years ago

Yes, the reported results are averages.

Yes, by default evaluation on each split filters out the examples in train/dev and in that split itself, so evaluation on the test set is also filtered_with_test; it is just not reported under that name.

You can check eval.filter_splits and its explanation here.
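
For readers following along, here is a minimal sketch of why adding the test split to the filter can only improve a dev-set rank. It is not the repository's evaluation code: the function `filtered_rank`, the helper `known`, the toy scores, and the per-split answer sets are all hypothetical, and ties are ignored for simplicity.

```python
def filtered_rank(scores, target, known_answers):
    """Rank of `target` after dropping other known-correct candidates.

    Hypothetical helper for illustration; higher score means better.
    """
    target_score = scores[target]
    # Count candidates that outscore the target, skipping any entity
    # that is itself a known correct answer in one of the filter splits.
    better = sum(
        1
        for entity, score in scores.items()
        if score > target_score
        and entity != target
        and entity not in known_answers
    )
    return better + 1


# Toy scores for a single dev query (h, r, ?).
scores = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.1}
target = "c"  # the dev answer being ranked

# Assumed known answers for the same (h, r) query, grouped by split.
answers = {"train": {"a"}, "valid": {"c"}, "test": {"b"}}


def known(splits):
    """Union of known answers over the given filter splits."""
    return set().union(*(answers[s] for s in splits))


# Dev evaluation filtered with train/valid only: "b" still counts against "c".
rank_default = filtered_rank(scores, target, known(["train", "valid"]))

# Same query filtered with test as well: "b" is removed from the candidates.
rank_with_test = filtered_rank(scores, target, known(["train", "valid", "test"]))

print(rank_default, 1 / rank_default)      # rank and reciprocal rank
print(rank_with_test, 1 / rank_with_test)
```

In this toy case the only change is that the test answer "b" no longer outranks the dev answer, so the rank (and with it MRR and Hits@k) can improve but never get worse when test triples are added to the filter.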

vardaan123 commented 2 years ago

Cool, thanks for the clarification!