Ligand Graph Featurizer

Description

Tie up loose ends for the graph ligand featurizer, specifically the atomic features.

By default, we will use the same atomic features as the PotentialNet model (https://doi.org/10.1021/acscentsci.8b00507).

Taken from the PotentialNet paper:

"Deep Neural Networks were constructed and trained with PyTorch.(52) Custom Python code was used based on RDKit(53) and OEChem(54) with frequent use of NumPy(55) and SciPy.(56) Networks were trained on chemical element, formal charge, hybridization, aromaticity, and the total numbers of bonds, hydrogens (total and implicit), and radical electrons. "
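To make the list in the quote concrete, here is a minimal sketch of how those raw properties could be read from an RDKit atom. The function names `potentialnet_like_atom_features` and `featurize_smiles` are hypothetical and are not the kinoml implementation. Mapping "total numbers of bonds" to `GetDegree()` is an assumption on our side, and nothing is one-hot encoded yet.

```python
from typing import Any, Dict, List


def potentialnet_like_atom_features(atom: Any) -> Dict[str, Any]:
    """Collect raw per-atom properties from an RDKit-style atom object.

    Hypothetical helper: it only relies on the accessor methods below,
    so any object exposing them (e.g. rdkit.Chem.rdchem.Atom) works.
    """
    return {
        "element": atom.GetSymbol(),
        "formal_charge": atom.GetFormalCharge(),
        "hybridization": str(atom.GetHybridization()),
        "aromatic": bool(atom.GetIsAromatic()),
        # Assumption: "total number of bonds" is taken as the atom degree.
        "degree": atom.GetDegree(),
        "total_hydrogens": atom.GetTotalNumHs(),
        "implicit_hydrogens": atom.GetNumImplicitHs(),
        "radical_electrons": atom.GetNumRadicalElectrons(),
    }


def featurize_smiles(smiles: str) -> List[Dict[str, Any]]:
    """Return one feature dictionary per atom of the molecule."""
    # Imported lazily so the helper above stays usable without RDKit.
    from rdkit import Chem

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles!r}")
    return [potentialnet_like_atom_features(atom) for atom in mol.GetAtoms()]
```

With RDKit installed, `featurize_smiles("c1ccccc1O")` would return one dictionary per heavy atom of phenol. Categorical entries such as the element and hybridization would still need to be encoded before being fed to a model.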

Todos

[x] Check the atomic features in PotentialNet
[x] Implement the same in kinoml
[ ] Unit test
[ ] Run on experiments-binding-affinity

Questions

How do we one-hot encode a value against a list of strings? For example: BaseOneHotEncodingFeaturizer.one_hot_encode([atom.GetHybridization().real], rdkit.Chem.rdchem.HybridizationType.names)
Is one-hot encoding the best approach for most atomic properties? Maybe we should just use this version of kinoml: https://github.com/openkinome/kinoml/blob/1cf6f95b7763c335227133d4d569f92e76d337f7/kinoml/features/ligand.py#L214
Should we use all available RDKit atomic properties? See the RDKit documentation on atomic properties: https://www.rdkit.org/docs/source/rdkit.Chem.rdchem.html#rdkit.Chem.rdchem.Atom
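As a possible starting point for the first question, here is a minimal sketch of one-hot encoding a string against a list of strings. The standalone `one_hot_encode` function and the `HYBRIDIZATIONS` list are hypothetical illustrations, not the signature of `BaseOneHotEncodingFeaturizer.one_hot_encode`. The extra "unknown" slot is a design choice we assume here so that unexpected values do not raise.

```python
from typing import Hashable, List, Sequence


def one_hot_encode(value: Hashable, vocabulary: Sequence[Hashable]) -> List[int]:
    """One-hot encode `value` over `vocabulary`.

    The returned vector has len(vocabulary) + 1 entries; the last
    entry is set when `value` is not part of the vocabulary.
    """
    vector = [0] * (len(vocabulary) + 1)
    try:
        index = vocabulary.index(value)
    except ValueError:
        index = len(vocabulary)  # fall back to the "unknown" slot
    vector[index] = 1
    return vector


# Illustrative subset of hybridization names (an assumption, not the full RDKit enum).
HYBRIDIZATIONS = ["S", "SP", "SP2", "SP3", "SP3D", "SP3D2"]

encoded = one_hot_encode("SP2", HYBRIDIZATIONS)    # third slot is set
unknown = one_hot_encode("OTHER", HYBRIDIZATIONS)  # last ("unknown") slot is set
```

For an RDKit atom, the string to encode could presumably be obtained with `str(atom.GetHybridization())`, which we assume yields names such as `SP2`. Encoding by name rather than by the integer in `.real` keeps the vocabulary readable and independent of the enum's numeric values.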

Status

[ ] Ready to go

Notes

For the sake of completeness, let's look at the atomic features implemented elsewhere.

deepchem (see code), as used in MoleculeNet:
- one-hot encoded atomic symbol
- one-hot encoded degree
- one-hot encoded implicit valence
- formal charge
- number of radical electrons
- one-hot encoded hybridization type
- aromaticity

Takayuki Serizawa et al., poster presentation at the RDKit UGM 2019:
- one-hot atom type
- one-hot degree
- one-hot valence
- formal charge
- one-hot hybridization type
- number of radical electrons
- aromaticity
- one-hot encoded number of hydrogen atoms
- partial charge (not RDKit!)
