snap-stanford / ogb

Benchmark datasets, data loaders, and evaluators for graph machine learning
https://ogb.stanford.edu
MIT License

Clarification on Handling ogbl-wikikg2 as Heterogeneous Dataset #482

Open HumeraSabir7303 opened 1 month ago

HumeraSabir7303 commented 1 month ago

Hi OGB Team,

I am currently working with the ogbl-wikikg2 dataset for a knowledge graph completion task. According to the dataset description on the OGB website, ogbl-wikikg2 is a knowledge graph (KG) that contains triplet edges (head, relation, tail) with 2,500,604 entities and 535 relation types. This suggests that the dataset inherently contains heterogeneous data due to the multiple relation types.

However, I noticed that in the dataset's metadata, is_hetero is set to False. This raises a few questions and potential issues for users who wish to treat this dataset as a heterogeneous graph for KG completion tasks.
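To make the question concrete, here is a minimal sketch of the two layouts I have in mind, built from made-up toy triples rather than the real dataset. The variable names (`edge_index`, `edge_reltype`, `edges_by_relation`) are my own illustrative assumptions and are not meant to describe how OGB stores the data internally. The first layout keeps a single node type and attaches a relation ID to each edge; the second groups edges by relation, which is closer to what I would expect from a heterogeneous representation.

```python
from collections import defaultdict

# Toy (head, relation, tail) triples; all IDs are made up for illustration.
triples = [(0, 0, 1), (1, 1, 2), (2, 0, 3), (0, 2, 3)]

# Layout A (assumed "flat" view): one node type, one edge list,
# plus a parallel list holding the relation ID of each edge.
edge_index = ([h for h, _, _ in triples], [t for _, _, t in triples])
edge_reltype = [r for _, r, _ in triples]

# Layout B (assumed "per-relation" view): edges grouped by relation ID.
edges_by_relation = defaultdict(list)
for h, r, t in triples:
    edges_by_relation[r].append((h, t))
edges_by_relation = dict(edges_by_relation)

print(edge_index, edge_reltype)
print(edges_by_relation)
```

Both layouts carry the same information, so part of my question is whether `is_hetero` refers only to the storage layout (or to node types) rather than to the presence of multiple relation types.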

Questions:

Why is is_hetero set to False for ogbl-wikikg2?

Given the nature of the data, should this be considered a heterogeneous graph dataset?

Impact on Model Implementation:

For tasks such as KG completion, how should users handle relation types if the dataset is treated as homogeneous?

Can you provide guidance or best practices for users who want to leverage the heterogeneous nature of the dataset (i.e., using relation types effectively in models)?

Evaluation Metric:

The current evaluation setup (e.g., Mean Reciprocal Rank, MRR) seems to align with KG completion tasks. Are there specific reasons for not treating this dataset as heterogeneous?

Documentation and Examples:

Could you provide more detailed documentation or examples on how to implement models that can handle the multiple relation types in ogbl-wikikg2?

Suggested Improvements:

Provide additional examples or guidelines on handling ogbl-wikikg2 as a heterogeneous graph, specifically for KG completion tasks.

Clarify any potential implications for using the dataset in its current form vs. treating it as heterogeneous.

I believe addressing these points would help many researchers and practitioners better utilize the ogbl-wikikg2 dataset for their projects.
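As an illustration of the kind of example I am asking for, here is a rough sketch of a relation-aware scorer evaluated with a reciprocal-rank metric. It uses a TransE-style score (my choice for illustration, not something taken from the OGB documentation), random toy embeddings, and made-up triples. All function names are hypothetical, and the tie handling in `reciprocal_rank` is an assumption that may differ from the official evaluator.

```python
import random

random.seed(0)
NUM_ENTITIES, NUM_RELATIONS, DIM = 5, 3, 4  # toy sizes, not the real dataset


def random_vector(dim):
    """Return a random embedding with entries in [-1, 1]."""
    return [random.uniform(-1.0, 1.0) for _ in range(dim)]


# One embedding table for all entities (single node type),
# and a separate table indexed by relation ID.
entity_emb = [random_vector(DIM) for _ in range(NUM_ENTITIES)]
relation_emb = [random_vector(DIM) for _ in range(NUM_RELATIONS)]


def transe_score(head, relation, tail):
    """TransE-style score: negative L1 distance of (head + relation) from tail."""
    h, r, t = entity_emb[head], relation_emb[relation], entity_emb[tail]
    return -sum(abs(hi + ri - ti) for hi, ri, ti in zip(h, r, t))


def reciprocal_rank(pos_score, neg_scores):
    """1 / rank of the positive among the negatives (ties count against it)."""
    rank = 1 + sum(1 for s in neg_scores if s >= pos_score)
    return 1.0 / rank


def mean_reciprocal_rank(positives, negative_tails):
    """Average reciprocal rank when each positive tail is ranked against corrupted tails."""
    rrs = []
    for (h, r, t), neg_tails in zip(positives, negative_tails):
        pos = transe_score(h, r, t)
        negs = [transe_score(h, r, nt) for nt in neg_tails]
        rrs.append(reciprocal_rank(pos, negs))
    return sum(rrs) / len(rrs)


# Made-up test triples and corrupted tails, purely for illustration.
positives = [(0, 0, 1), (2, 1, 3)]
negative_tails = [[2, 3, 4], [0, 1, 4]]
print(mean_reciprocal_rank(positives, negative_tails))
```

In this sketch the relation type enters only through the relation embedding lookup, while all entities share one embedding table. If that is the intended way to use ogbl-wikikg2 with is_hetero set to False, confirming it in the documentation would already answer most of my questions.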

Thank you for your attention to this matter.