snap-stanford / ogb

Benchmark datasets, data loaders, and evaluators for graph machine learning
https://ogb.stanford.edu
MIT License

About noisy labels #235

Closed dongZheX closed 3 years ago

dongZheX commented 3 years ago

Hello, thanks for the code and datasets.

Could you tell me whether there might be some noisy labels in the graph property prediction datasets, such as ogbg-molhiv and ogbg-molpcba?

Thanks. ^.^

weihua916 commented 3 years ago

All the labels are considered gold-standard, but potentially have inherent noise due to measurement error.

dongZheX commented 3 years ago

> All the labels are considered gold-standard, but potentially have inherent noise due to measurement error.

Thanks. I evaluated my model on the ogbg-molhiv dataset with a fixed seed, but the results of 10 runs with the same hyperparameters still have large variance. So I suspected there might be some noisy labels, and I tried some noisy-label detection methods, but they didn't help. Maybe my model or code has a problem (see the seeding sketch below).

Besides this, the result on the test set (average value about 80) is higher than on the validation set (average about 79), which I think is abnormal, but I don't know why.

weihua916 commented 3 years ago

I see. One critical component might be the scaffold split we use: the train/valid/test distributions are all different, so we are testing the out-of-distribution generalization of your model.