openmm / spice-dataset

A collection of QM data for training potential functions
MIT License

Obtaining atomic formal charges #109

Closed (JSLJ23 closed this issue 3 months ago)

JSLJ23 commented 3 months ago

Hi SPICE authors,

I was wondering if there is a way to obtain atom-specific formal charges for the systems in this dataset. I am specifically interested in filtering the systems in the amino acid/ligand pairs subset. Having formal charges at the per-atom level would help me pick out systems with individually charged monomers (e.g. the amino acid is +1 and the ligand is -1) even though the overall pair is neutral (+1 + -1 = 0).

Thank you.

peastman commented 3 months ago

The dataset provides a SMILES string for each molecule, which includes the formal charges. If you want to extract them programmatically, you can do it with RDKit like this:

from rdkit import Chem

mol = Chem.MolFromSmiles(smiles)
charges = [atom.GetFormalCharge() for atom in mol.GetAtoms()]

JSLJ23 commented 3 months ago

Thank you for the speedy reply, this is exactly what I needed!

peastman commented 3 months ago

Actually, it's probably better to use OpenFF Toolkit:

from openff.toolkit import Molecule

mol = Molecule.from_mapped_smiles(smiles, allow_undefined_stereo=True)
charges = [atom.formal_charge.m for atom in mol.atoms]

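The reason is that RDKit doesn't order atoms according to the atom map indices in the SMILES string, so they may come out in a different order than the one the dataset uses. OpenFF Toolkit's from_mapped_smiles orders the atoms by those indices, which guarantees the atom order will match the fields in the dataset.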
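To illustrate the ordering issue, here is a minimal sketch in plain Python of putting per-atom values back into map-index order. It assumes the map numbers are 1-based and cover every atom exactly once; the helper name `reorder_by_map_number` and the example values are hypothetical, not part of the dataset or either toolkit. With RDKit, the map numbers could be read with `atom.GetAtomMapNum()`.

```python
def reorder_by_map_number(values, map_numbers):
    """Return values reordered so that index i holds the atom with map number i + 1."""
    # Assumption: map numbers are a permutation of 1..N (1-based, no gaps, no repeats).
    if sorted(map_numbers) != list(range(1, len(values) + 1)):
        raise ValueError("map numbers must be a permutation of 1..N")
    ordered = [None] * len(values)
    for value, map_number in zip(values, map_numbers):
        ordered[map_number - 1] = value
    return ordered


# Hypothetical example: three atoms that appear in the string with map numbers 2, 1, 3.
charges_in_string_order = [0, 1, -1]
map_numbers = [2, 1, 3]
charges_in_dataset_order = reorder_by_map_number(charges_in_string_order, map_numbers)
```

If every atom already appears in map-number order, the function returns the values unchanged, so it is safe to apply unconditionally.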

JSLJ23 commented 3 months ago

Got it, thanks!
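For the filtering use case in the original question, a sketch along the following lines might work once the per-atom charges are available. It is plain Python and assumes you already have a list of tuples of atom indices, one tuple per monomer, indexed consistently with the charges list (for example, RDKit's `Chem.GetMolFrags(mol)` returns fragments in that shape, in RDKit's own atom order). The function names and the example data are hypothetical.

```python
def monomer_charges(atom_charges, fragments):
    """Sum the per-atom formal charges over each fragment (a tuple of atom indices)."""
    return [sum(atom_charges[i] for i in fragment) for fragment in fragments]


def is_ion_pair(atom_charges, fragments):
    """True if the system is neutral overall but at least one monomer carries a net charge."""
    totals = monomer_charges(atom_charges, fragments)
    return sum(totals) == 0 and any(total != 0 for total in totals)


# Hypothetical 5-atom pair: atoms 0-2 form a +1 monomer, atoms 3-4 form a -1 monomer.
atom_charges = [0, 1, 0, -1, 0]
fragments = [(0, 1, 2), (3, 4)]
totals = monomer_charges(atom_charges, fragments)
pair_is_ionic = is_ion_pair(atom_charges, fragments)
```

Note that a zwitterionic monomer (net charge 0 with internal +1 and -1 atoms) is not flagged by this check, since only net monomer charges are compared.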