snap-stanford / ogb

Benchmark datasets, data loaders, and evaluators for graph machine learning
https://ogb.stanford.edu
MIT License

[Question] Why are not all MoleculeNet datasets implemented? #400

Closed by j-adamczyk 1 year ago

j-adamczyk commented 1 year ago

A few large MoleculeNet datasets concerning quantum chemistry are not implemented in OGB, nor is the PDBbind dataset. What is the reason for this? I understand that QM datasets typically use specific features, but e.g. this paper uses regular features (like Morgan / ECFP / circular fingerprints) and gets good results. I think they could be added to OGB with the regular set of features. They could also use a scaffold split, as in the paper I linked above, similar to the other MoleculeNet datasets.
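For readers unfamiliar with the term, a scaffold split assigns whole groups of molecules that share a scaffold to a single subset, so no scaffold appears in more than one of train/valid/test. Below is a minimal sketch of that idea, not OGB's or MoleculeNet's actual implementation. It assumes the scaffold strings have already been computed elsewhere (for example with a cheminformatics toolkit), and the function name `scaffold_split`, the 80/10/10 fractions, and the toy labels are all illustrative assumptions.

```python
from collections import defaultdict


def scaffold_split(scaffolds, frac_train=0.8, frac_valid=0.1):
    """Split molecule indices so that no scaffold is shared between subsets.

    `scaffolds` holds one scaffold string per molecule (hypothetical input;
    computing real scaffolds is outside the scope of this sketch).
    """
    # Group molecule indices by their scaffold string.
    groups = defaultdict(list)
    for idx, scaffold in enumerate(scaffolds):
        groups[scaffold].append(idx)

    # Largest groups first; ties broken by first index for determinism.
    ordered = sorted(groups.values(), key=lambda g: (-len(g), g[0]))

    n = len(scaffolds)
    train_cutoff = frac_train * n
    valid_cutoff = (frac_train + frac_valid) * n

    train, valid, test = [], [], []
    for group in ordered:
        # A group is never divided: it goes wholly into one subset.
        if len(train) + len(group) <= train_cutoff:
            train.extend(group)
        elif len(train) + len(valid) + len(group) <= valid_cutoff:
            valid.extend(group)
        else:
            test.extend(group)
    return train, valid, test


# Toy example: 10 molecules with 4 made-up scaffold labels.
scaffolds = ["A", "A", "A", "A", "A", "A", "B", "B", "C", "D"]
train, valid, test = scaffold_split(scaffolds)
```

Sorting the groups by size puts common scaffolds in the training set and rare ones in validation and test, which is what makes this split harder than a random one.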

weihua916 commented 1 year ago

We could certainly include it, but now we have a better one called PCQM4Mv2.

weihua916 commented 1 year ago

In my opinion, QM9 is not very realistic and is somewhat solved (the SoTA MAE is already below chemical accuracy). It may be a bit outdated to include by now.