snap-stanford / ogb

Benchmark datasets, data loaders, and evaluators for graph machine learning
https://ogb.stanford.edu
MIT License
1.89k stars 397 forks source link

OGB features: use safe_index for chirality #399

Closed milesial closed 1 year ago

milesial commented 1 year ago

Featurizer was failing with chirality CHI_TRIGONALBIPYRAMIDAL

milesial commented 1 year ago

@weihua916

weihua916 commented 1 year ago

Hi! Did you see any specific molecule that fails here? I believe the possible number of chirality is limited and won't be increased in the future.

milesial commented 1 year ago

Hi, it fails for some molecules in the pcqm4mv2 SDF file you guys provide.

The number of chiralities is limited, but rdkit provides more chiralities than your featurizer, so the ones that are not in your featurizer should be mapped to 'other' (last index) IMO, in order to not crash and to not have to modify the featurizer number of features.

rdkit possible values: https://www.rdkit.org/docs/cppapi/classRDKit_1_1Atom.html#a8a82ae947ebbbc48f5ec5128f5c3e724

weihua916 commented 1 year ago

LGTM, thanks!

milesial commented 1 year ago

Thanks! Could you also create a new pypi release for this please?

milesial commented 1 year ago

Hi @weihua916, gentle ping on the above request for a version bump ^