snap-stanford/ogb

Benchmark datasets, data loaders, and evaluators for graph machine learning
https://ogb.stanford.edu
MIT License

[Discussion] Raw text data should be made available for WikiKG90Mv2 #305

Closed. apoorvumang closed this issue 1 year ago.

apoorvumang commented 2 years ago

I wanted to start a public discussion, but my post was deleted (#304), so I am reposting it as a new issue.

Our method KGT5 (https://github.com/apoorvumang/transformer-kgc, soon to be presented at the ACL 2022 main conference) was not included on the leaderboard because it violates the 'no raw data' rule for OGB-LSC leaderboard submissions: we used the entity titles + descriptions in raw text form rather than the MPNet embeddings provided by the ogb library. Otherwise we used exactly the same data as the best-performing leaderboard method, nothing 'extra'.
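For context, this is roughly what the sanctioned input looks like: a minimal sketch of loading WikiKG90Mv2 through the ogb library and reading the provided MPNet embeddings. The attribute names (entity_feat, relation_feat, train_hrt, valid_dict) are taken from my reading of the OGB-LSC documentation and should be verified against the installed ogb version.

```python
# Minimal sketch: load WikiKG90Mv2 via the ogb library and access the
# precomputed MPNet text embeddings, i.e. the only textual signal the
# leaderboard rules allow. Attribute names assumed from the OGB-LSC docs;
# please verify against your ogb version.
from ogb.lsc import WikiKG90Mv2Dataset

dataset = WikiKG90Mv2Dataset(root="dataset/")  # large download on first use

print(dataset.num_entities, dataset.num_relations)

# Provided MPNet embeddings (reported as 768-dim) of entity/relation
# titles + descriptions.
entity_feat = dataset.entity_feat      # shape (num_entities, 768)
relation_feat = dataset.relation_feat  # shape (num_relations, 768)

# Training triples as (head, relation, tail) index rows.
train_hrt = dataset.train_hrt          # shape (num_train_triples, 3)

# Validation task: predict the tail t for each (h, r) query.
valid_task = dataset.valid_dict["h,r->t"]
hr, t = valid_task["hr"], valid_task["t"]
```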

Although I understand that the restriction on raw data is meant to prevent 'cheating' - which is also the reasoning given in the discussion here - I believe it vastly restricts the space of models that can appear on the leaderboard. Our model is just as 'fair' as the ones already on the leaderboard (such as TransE-concat), and I feel the dataset authors should reconsider the rules for WikiKG90Mv2, since being limited to the precomputed text embeddings is a big bottleneck and is not ideal for all scenarios.

This is the response I received over email from the author (@weihua916) when I raised these issues and asked for test set numbers:

Great points. Unfortunately, it would be hard for us to make any exceptions. Of course, better textual representation could improve performance, but in OGB-LSC we want to focus on graph methods, and we think the improvement of textual methods and graph methods are somewhat complementary.

We cannot provide the test set performance. I'd suggest you simply use the validation performance in your paper.

I don't understand what 'we think the improvement of textual methods and graph methods are somewhat complementary' means. In our report, we show that our method works on other datasets as well. Also, TransE-concat makes use of the textual information, so why is it on the leaderboard?

Although I continue to disagree with the leaderboard policies, I want to emphasize that WikiKG90Mv2 is a great resource for training and testing KG embeddings at large scale, and one of the largest benchmark KGs available, with a well-thought-out test/validation split. I really appreciate the effort put into building the dataset and updating its evaluation scheme in v2. I would request that the authors keep the textual information available - even if it is not leaderboard-worthy - in case someone wants to use this resource for something other than link prediction.
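For reference, the updated v2 evaluation scheme scores MRR over top-10 tail predictions for each (h, r) query. Below is a minimal sketch of how I understand the official evaluator is invoked; the input-dict keys (t_pred_top10, t) are assumptions based on the OGB-LSC documentation and worth double-checking.

```python
# Minimal sketch of the WikiKG90Mv2 MRR evaluation (key names assumed from
# the OGB-LSC docs; verify against your installed ogb version).
import numpy as np
from ogb.lsc import WikiKG90Mv2Evaluator

evaluator = WikiKG90Mv2Evaluator()

num_queries = 5
# Dummy predictions: for each (h, r) query, the top-10 predicted tail
# entity IDs, ranked best-first.
t_pred_top10 = np.random.randint(0, 1000, size=(num_queries, 10))
# Ground-truth tail entity IDs.
t = np.random.randint(0, 1000, size=(num_queries,))

result = evaluator.eval({"h,r->t": {"t_pred_top10": t_pred_top10, "t": t}})
print(result)  # expected to contain the MRR
```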

I hope the authors and those who have contributed to the leaderboard can comment on this. @weihua916 @hyren

weihua916 commented 1 year ago

Great suggestion. We understand it and will definitely consider this option in the future. However, we will not do this, at least not during the NeurIPS competition. Thanks for understanding.