snap-stanford / ogb

Benchmark datasets, data loaders, and evaluators for graph machine learning
https://ogb.stanford.edu
MIT License

some classes are missing in ogbn-products training and validation splits #299

Closed (sisaman closed 2 years ago)

sisaman commented 2 years ago

I just noticed that in the ogbn-products dataset, some classes appear in the test split but are missing from the training and/or validation splits. Specifically, of the 47 classes (0 to 46) that appear in the test set, five (42 to 46) never appear in the training set, and thirteen (34 to 46) never appear in the validation set.

[Screenshot: Screen Shot 2022-02-13 at 21 11 32]

Is this intentional?
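For anyone who wants to reproduce this check, here is a minimal sketch of how the per-split class coverage could be computed. The helper names (`classes_per_split`, `missing_classes`, `load_ogbn_products`) are hypothetical and not part of ogb. The loader assumes the `ogb` package is installed and relies on `NodePropPredDataset` and `get_idx_split()`; it is not called here because it downloads the full dataset. The toy data at the bottom is invented purely for illustration.

```python
def classes_per_split(labels, split_idx):
    """Map each split name to the set of class ids that occur in it."""
    return {
        name: {int(labels[i]) for i in idx}
        for name, idx in split_idx.items()
    }


def missing_classes(labels, split_idx):
    """For each split, list the classes seen in some split but absent from this one."""
    present = classes_per_split(labels, split_idx)
    all_classes = set().union(*present.values())
    return {name: sorted(all_classes - seen) for name, seen in present.items()}


def load_ogbn_products():
    """Hypothetical loader: returns flat labels and the official split indices.

    Assumes the `ogb` package is installed; this downloads the dataset on first use.
    """
    from ogb.nodeproppred import NodePropPredDataset

    dataset = NodePropPredDataset(name="ogbn-products")
    split_idx = dataset.get_idx_split()  # keys: "train", "valid", "test"
    _, label = dataset[0]                # label has shape (num_nodes, 1)
    return label.reshape(-1), split_idx


# Toy data (invented): class 3 occurs only in the test split.
toy_labels = [0, 1, 2, 0, 1, 2, 3]
toy_split = {"train": [0, 1, 2], "valid": [3, 4], "test": [5, 6]}
toy_missing = missing_classes(toy_labels, toy_split)
print(toy_missing)
```

On the real dataset, `missing_classes(*load_ogbn_products())` should report the gaps described above for `"train"` and `"valid"`, if the observation holds. The pure-Python loop is slow on a few million nodes but needs nothing beyond what ogb already returns.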

weihua916 commented 2 years ago

Hi, thanks for finding this out. This is not intentional.

Around 1.6% of the test data points carry labels that never appear in the training set, so we cannot expect any model to perform well on those data points.
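As a rough illustration of that figure, the share of test points with labels unseen in training could be measured as sketched below. The function name `unseen_label_fraction` and the toy data are hypothetical. The resulting accuracy ceiling assumes a model that only ever predicts classes it saw during training.

```python
def unseen_label_fraction(labels, train_idx, test_idx):
    """Fraction of test points whose label does not occur in the training split."""
    train_classes = {int(labels[i]) for i in train_idx}
    test_labels = [int(labels[i]) for i in test_idx]
    unseen = sum(1 for y in test_labels if y not in train_classes)
    return unseen / len(test_labels)


# Toy data (invented): one of four test points has a label absent from training.
toy_labels = [0, 0, 1, 1, 0, 1, 2, 0]
train_idx = [0, 1, 2, 3]
test_idx = [4, 5, 6, 7]

frac = unseen_label_fraction(toy_labels, train_idx, test_idx)
# Upper bound on test accuracy for a model restricted to training classes.
accuracy_ceiling = 1.0 - frac
print(frac, accuracy_ceiling)
```

Under the same assumption, a fraction of about 1.6% would cap test accuracy on ogbn-products at roughly 98.4%.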