snap-stanford / ogb

Benchmark datasets, data loaders, and evaluators for graph machine learning
https://ogb.stanford.edu
MIT License

ogbn-arxiv text features don't match #478

Open devinbost opened 2 months ago

devinbost commented 2 months ago

I raised a concern in the DGL GitHub repository that the ogbn-arxiv text features don't match their graph representation, and I wanted to surface it here as well: https://github.com/dmlc/dgl/issues/7270

devinbost commented 2 months ago

The DGL graph's node count does match the number of nodes listed for ogbn-arxiv on the OGB node property prediction page (https://ogb.stanford.edu/docs/nodeprop/).

So it seems that the text abstracts in https://snap.stanford.edu/ogb/data/misc/ogbn_arxiv/titleabs.tsv.gz (179,719 entries in total) include additional papers that are not in the graph?

How do I map between the two?
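A minimal sketch of one way to do the mapping, under stated assumptions: that the ogbn-arxiv download contains `mapping/nodeidx2paperid.csv.gz` (a header row, then `node idx,paper id` rows), and that each row of `titleabs.tsv.gz` is `paper id<TAB>title<TAB>abstract`. If that holds, the text file is keyed by paper ID rather than by graph node index, so surplus rows are simply papers the graph does not use. Joining on the paper ID and walking nodes in index order then gives one title/abstract per graph node. All function names and paths here are hypothetical helpers, not part of the `ogb` package.

```python
import csv
import gzip
from typing import Dict, Iterable, List, Tuple

# Hypothetical helpers; the file layouts they parse are assumptions, not a documented OGB API.


def read_nodeidx2paperid(lines: Iterable[str]) -> Dict[int, int]:
    """Map graph node index -> paper ID from 'node idx,paper id' rows.

    Assumes the first row is a header and skips it.
    """
    reader = csv.reader(lines)
    next(reader, None)  # skip the assumed header row
    return {int(node): int(paper) for node, paper in reader}


def read_titleabs(lines: Iterable[str]) -> Dict[int, Tuple[str, str]]:
    """Map paper ID -> (title, abstract) from tab-separated rows."""
    text: Dict[int, Tuple[str, str]] = {}
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        # Skip blank, malformed, or header-like rows whose first field is not a numeric ID.
        if len(row) < 3 or not row[0].strip().isdigit():
            continue
        text[int(row[0])] = (row[1], row[2])
    return text


def align(
    node_to_paper: Dict[int, int], paper_text: Dict[int, Tuple[str, str]]
) -> Tuple[List[Tuple[str, str]], List[int], int]:
    """Return texts in node-index order, nodes lacking text, and the count of unused text rows."""
    texts: List[Tuple[str, str]] = []
    missing: List[int] = []
    for node in sorted(node_to_paper):
        paper = node_to_paper[node]
        if paper in paper_text:
            texts.append(paper_text[paper])
        else:
            missing.append(node)
            texts.append(("", ""))  # placeholder so there is still one entry per node
    used = set(node_to_paper.values())
    extra = sum(1 for paper in paper_text if paper not in used)
    return texts, missing, extra


def load_aligned(mapping_path: str, titleabs_path: str):
    """Read both gzip files from disk and align them (paths are placeholders)."""
    with gzip.open(mapping_path, "rt", encoding="utf-8") as f:
        node_to_paper = read_nodeidx2paperid(f)
    with gzip.open(titleabs_path, "rt", encoding="utf-8") as f:
        paper_text = read_titleabs(f)
    return align(node_to_paper, paper_text)
```

For example, `texts, missing, extra = load_aligned("arxiv/mapping/nodeidx2paperid.csv.gz", "titleabs.tsv.gz")` with your own local paths. If the assumptions hold, `len(texts)` equals the graph's node count, `missing` is empty, and `extra` counts the surplus rows in the text file that belong to no graph node.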

VeritasYin commented 2 months ago

Checking issue #222 might be helpful.