Closed sisaman closed 2 years ago

I just noticed that in the ogbn-products dataset, some classes are present in the test split but are missing from the training and/or validation splits. More specifically, out of the 47 total classes (0 to 46) appearing in the test set, five classes (42 to 46) never appear in the training set, and thirteen classes (34 to 46) are absent from the validation set.

I am wondering if this is intentional?

Hi, thanks for pointing this out. This is not intentional. Basically, around 1.6% of the test data points are assigned labels never seen during training; hence, we cannot expect any model to perform well on those data points.
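For anyone who wants to reproduce the check, here is a minimal sketch of the set-difference logic. With the real dataset you would load labels and split indices via `ogb.nodeproppred.PygNodePropPredDataset('ogbn-products')` and `dataset.get_idx_split()`; the synthetic arrays below are placeholders so the snippet runs without downloading anything.

```python
import numpy as np

def missing_classes(labels, ref_idx, query_idx):
    """Return classes that occur in labels[query_idx] but never in labels[ref_idx]."""
    return sorted(set(labels[query_idx].tolist()) - set(labels[ref_idx].tolist()))

# Synthetic stand-in mimicking the issue: class 4 occurs only in the test split.
# With ogbn-products, labels would come from dataset[0].y and the index arrays
# from dataset.get_idx_split()['train'] / ['valid'] / ['test'].
labels = np.array([0, 1, 2, 3, 0, 1, 2, 4])
train_idx = np.arange(0, 5)
test_idx = np.arange(5, 8)

unseen = missing_classes(labels, train_idx, test_idx)
print(unseen)  # classes in test but not in train -> [4]

# Fraction of test points carrying an unseen label (the ~1.6% figure on ogbn-products)
frac = np.isin(labels[test_idx], unseen).mean()
print(frac)
```

On the real splits, `missing_classes` should report classes 42 to 46 against the training set and 34 to 46 against the validation set, matching the counts above.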