About fair filtered evaluation setting

kbs391kbs commented 3 years ago

Hi migalkin, I have read the StarE paper and code, and the idea is quite interesting. I have a small question about the evaluation setting. The evaluation metric is filtered MRR/Hits on the missing subject/object entity. Suppose the dataset contains the following true facts:
f1: Q515632,P1196,Q3739104
f2: Q515632,P1196,Q3739106
f3: Q515632,P1196,Q3739105,P805,Q369706,P1686,Q3241699

When predicting the object of the first fact, (Q515632,P1196,?), I know that f2 will definitely be filtered, but I am not sure whether f3 (Q3739105) will be. I notice that SAMPLER_W_QUALIFIERS is True for hyper-relational models and False for triple-based models. As I understand it, this means hyper-relational models will not filter f3 while triple-based models will, which may be unfair to hyper-relational models. For hyper-relational KGC, should I filter out all true objects (including those in hyper-relational facts) in the filtered evaluation?
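The two filtering behaviours can be illustrated with a small Python sketch. It is not StarE's code: `build_filter_index` and the `(s, r, o, qualifiers)` tuple layout are hypothetical, and the `with_qualifiers` flag only mimics the idea behind SAMPLER_W_QUALIFIERS as discussed in this thread. Keying the index on (s, r) filters f3's object for the f1 query; keying it on (s, r, qualifiers) does not.

```python
from collections import defaultdict

# Hypothetical layout: (subject, relation, object, qualifiers), where
# qualifiers is a tuple of (qualifier_relation, qualifier_entity) pairs.
STATEMENTS = [
    ("Q515632", "P1196", "Q3739104", ()),
    ("Q515632", "P1196", "Q3739106", ()),
    ("Q515632", "P1196", "Q3739105", (("P805", "Q369706"), ("P1686", "Q3241699"))),
]


def build_filter_index(statements, with_qualifiers):
    """Map each object-prediction query to the set of all its true objects.

    with_qualifiers=False: the query key is (s, r), i.e. triple-level filtering.
    with_qualifiers=True: the key also contains the qualifiers, so statements
    that differ only in their qualifiers are treated as different queries.
    """
    index = defaultdict(set)
    for s, r, o, quals in statements:
        key = (s, r, quals) if with_qualifiers else (s, r)
        index[key].add(o)
    return index


triple_index = build_filter_index(STATEMENTS, with_qualifiers=False)
stmt_index = build_filter_index(STATEMENTS, with_qualifiers=True)

# Objects filtered for the f1 query (Q515632, P1196, ?) without qualifiers.
print(sorted(triple_index[("Q515632", "P1196")]))
# ['Q3739104', 'Q3739105', 'Q3739106']
print(sorted(stmt_index[("Q515632", "P1196", ())]))
# ['Q3739104', 'Q3739106']
```

With the statement-level key, Q3739105 is only a known answer for the query that carries f3's qualifiers.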

Also, I notice that removing all qualifiers from statements/test.txt does not reproduce triples/test.txt. For example, in wd50k there are 46159 statements in test.txt; after removing qualifiers and de-duplicating, they reduce to 45896 triples. In comparison, triples/test.txt contains 46023 triples, and only 45510 triples appear in both sets. I am quite confused about this difference. I expected that removing qualifiers and de-duplicating the statements would produce the same file as in the corresponding triples directory, but it does not. Did I miss something? I hope you can help me. Thank you!
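A minimal sketch of the comparison described above, assuming each line holds comma-separated IDs with the main triple first, as in the facts quoted earlier (the actual file format may differ). The helper names are hypothetical; the four returned counts correspond to the four numbers reported for wd50k (statements, unique stripped triples, reference triples, intersection).

```python
def strip_and_dedup(lines):
    """Keep only the main (s, r, o) of each comma-separated line, de-duplicated."""
    triples = set()
    for line in lines:
        parts = [p.strip() for p in line.strip().split(",")]
        if len(parts) >= 3:  # skip blank or malformed lines
            triples.add(tuple(parts[:3]))
    return triples


def compare_splits(statement_lines, triple_lines):
    """Return the four counts discussed in the question."""
    from_statements = strip_and_dedup(statement_lines)
    reference = strip_and_dedup(triple_lines)
    return {
        "statements": len(statement_lines),
        "stripped_unique": len(from_statements),
        "reference": len(reference),
        "intersection": len(from_statements & reference),
    }


# Toy data, not wd50k.
statements = ["Q1,P1,Q2", "Q1,P1,Q2,P5,Q9", "Q3,P2,Q4,P6,Q9"]
triples = ["Q1,P1,Q2", "Q7,P2,Q8"]
print(compare_splits(statements, triples))
# {'statements': 3, 'stripped_unique': 2, 'reference': 2, 'intersection': 1}
```

To run it on real files, pass something like `open(path).read().splitlines()` for each file.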

migalkin commented 3 years ago

Hi, when sampling with qualifiers, our index of true labels also includes the qualifiers, so the only label for f3 is indeed Q3739105. As we are now working at the statement level (not the triple level), we think this is fine, although it makes it a bit harder for hyper-relational models to rank the correct entity.
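To make the effect on the metric concrete, here is a hypothetical filtered-rank computation for the f1 query with made-up scores (the entity "Q1" is invented for illustration). It assumes higher scores are better and counts only strictly better candidates, an optimistic tie policy that may differ from StarE's. With triple-level filtering, f3's object Q3739105 is removed from the candidates; with statement-level filtering it stays and can outrank the target.

```python
def filtered_rank(scores, target, known_true):
    """Rank of `target` (1 = best) after removing the other known true answers.

    scores: dict entity -> score, higher is better (assumption).
    known_true: all true answers for this query; they are skipped as competitors.
    Ties are resolved optimistically: only strictly higher scores count.
    """
    target_score = scores[target]
    better = sum(
        1
        for entity, score in scores.items()
        if entity != target and entity not in known_true and score > target_score
    )
    return better + 1


def mrr_and_hits(ranks, k=10):
    """Mean reciprocal rank and Hits@k over a list of filtered ranks."""
    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    hits = sum(1 for r in ranks if r <= k) / len(ranks)
    return mrr, hits


# Made-up scores for (Q515632, P1196, ?) whose target is f1's object.
scores = {"Q3739104": 0.5, "Q3739105": 0.9, "Q3739106": 0.7, "Q1": 0.6}
triple_filter = {"Q3739104", "Q3739105", "Q3739106"}  # keyed on (s, r)
statement_filter = {"Q3739104", "Q3739106"}           # keyed on (s, r, no qualifiers)

print(filtered_rank(scores, "Q3739104", triple_filter))     # 2
print(filtered_rank(scores, "Q3739104", statement_filter))  # 3
```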

Regarding the difference between the triple-only and statement versions: note that you can't just remove qualifiers and de-duplicate. You also need to delete the triples that have qualifier entities in the subject/object positions, take care of leakages, retain a similar answer set, and so forth. Once you remove those, other triples end up with disconnected entities, and it turns into an endless loop of deleting and cleaning. At some point we had to stop, and we kept the ~400-triple gap, as it is negligible (less than 1%) for overall training and evaluation.
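The cascade can be illustrated with a deliberately simplified, hypothetical cleaning loop. It is not the WD50K construction pipeline: the two rules below (drop triples touching banned entities, e.g. qualifier-only values, and drop triples touching entities left with fewer than `min_degree` triples, a stand-in for "disconnected") are assumptions chosen only to show how each round of deletions can trigger the next one.

```python
from collections import Counter


def cascade_clean(triples, banned, min_degree=2, max_rounds=100):
    """Drop triples until the set stops changing or max_rounds is reached.

    Hypothetical rules, for illustration only:
      1. drop triples whose subject or object is in `banned`;
      2. drop triples touching an entity that appears in fewer than
         `min_degree` of the currently remaining triples.
    Returns the surviving triples and the number of rounds that removed something.
    """
    current = set(triples)
    rounds = 0
    for _ in range(max_rounds):
        degree = Counter()
        for s, r, o in current:
            degree[s] += 1
            degree[o] += 1
        kept = {
            (s, r, o)
            for s, r, o in current
            if s not in banned
            and o not in banned
            and degree[s] >= min_degree
            and degree[o] >= min_degree
        }
        if kept == current:  # fixpoint: nothing else to delete
            break
        current = kept
        rounds += 1
    return current, rounds


toy = [
    ("A", "r", "B"), ("B", "r", "C"), ("C", "r", "A"),  # stable triangle
    ("C", "r", "D"), ("D", "r", "E"), ("E", "r", "Q"),  # chain ending in banned Q
]
survivors, rounds = cascade_clean(toy, banned={"Q"})
print(sorted(survivors), rounds)
# [('A', 'r', 'B'), ('B', 'r', 'C'), ('C', 'r', 'A')] 3
```

Removing the single triple with "Q" strands "E", which then strands "D", so three rounds are needed. With only these two rules the loop always reaches a fixpoint on finite data; `max_rounds` is a pragmatic cap, comparable to deciding to stop at some point when many interacting constraints keep changing the counts.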

kbs391kbs commented 3 years ago

Thanks!

migalkin/StarE, issue #7