snap-stanford / ogb

Benchmark datasets, data loaders, and evaluators for graph machine learning
https://ogb.stanford.edu
MIT License
1.89k stars 397 forks source link

Atom Feature for Non-Tetrahedral Stereochemistry #411

Closed shenoynikhil closed 10 months ago

shenoynikhil commented 1 year ago

With this https://github.com/rdkit/rdkit/pull/5084 being introduced, the atom.GetChiralTag() call can return additional chiralities like

CHI_OCTAHEDRAL
CHI_TRIGONALBIPYRAMIDAL
CHI_SQUAREPLANAR
CHI_ALLENE

This gives an error if I use the ogb.atom_to_feature_vector call. I am not sure what other places in the code will be affected by this, but I see an addition to this list being one of the changes.

shenoynikhil commented 1 year ago

I've opened a PR with suggested changes. Feel free to continue the PR.

weihua916 commented 1 year ago

Thanks for the PR. While it is nice to introduce advanced features, I also want to maintain backward compatibility with existing models. I think it makes sense to include these features once they are proven useful for ML tasks. What do you think? Did you find these features actually influence/improve ML model performance?

shenoynikhil commented 1 year ago

I also want to maintain backward compatibility with existing models

I think the thing that would change in terms of backward compatibility is the scope of msc. The idx for 'CHI_UNSPECIFIED', 'CHI_TETRAHEDRAL_CW', 'CHI_TETRAHEDRAL_CCW', 'CHI_OTHER', would be the same as before.

Did you find these features actually influence/improve ML model performance?

I actually faced this issue when I was trying to set features for Molecule3D. Not so sure how much this would affect performance.