openforcefield / openff-toolkit

The Open Forcefield Toolkit provides implementations of the SMIRNOFF format, parameterization engine, and other tools. Documentation available at http://open-forcefield-toolkit.readthedocs.io
http://openforcefield.org
MIT License

Canonically order molecule before conformer generation #926

Open SimonBoothroyd opened 3 years ago

SimonBoothroyd commented 3 years ago

Is your feature request related to a problem? Please describe.

The atom ordering of a molecule can affect which conformers are generated for it using OE (and probably also RDKit). This can then lead to different charges and WBOs being produced by the same toolkit for the same molecule.

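The example below shows that the number of conformers generated can change significantly with the ordering of the molecule. The two runs use the same input SMILES and differ only in the value passed to omega.SetCanonOrder: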

from openeye import oechem, oeomega

oe_molecule = oechem.OEMol()
oechem.OESmilesToMol(
    oe_molecule, "CC(C)(C)c1sc(c2ccnc(N)n2)c(n1)c3cccc(N[S](=O)(=O)c4c(F)cccc4F)c3F"
)
omega = oeomega.OEOmega()
omega.SetMaxConfs(800)
omega.SetEnergyWindow(15.0)
omega.SetRMSThreshold(1.0)
omega.SetCanonOrder(False)
omega.SetSampleHydrogens(True)
omega(oe_molecule)
print(oe_molecule.NumConfs())
print(oechem.OEMolToSmiles(oe_molecule))

>> 156
>> CC(C)(C)c1nc(c(s1)c2ccnc(n2)N)c3cccc(c3F)NS(=O)(=O)c4c(cccc4F)F

oe_molecule = oechem.OEMol()
oechem.OESmilesToMol(
    oe_molecule, "CC(C)(C)c1sc(c2ccnc(N)n2)c(n1)c3cccc(N[S](=O)(=O)c4c(F)cccc4F)c3F"
)
omega = oeomega.OEOmega()
omega.SetMaxConfs(800)
omega.SetEnergyWindow(15.0)
omega.SetRMSThreshold(1.0)
omega.SetCanonOrder(True)
omega.SetSampleHydrogens(True)
omega(oe_molecule)
print(oe_molecule.NumConfs())
print(oechem.OEMolToSmiles(oe_molecule))

>> 255
>> CC(C)(C)c1nc(c(s1)c2ccnc(n2)N)c3cccc(c3F)NS(=O)(=O)c4c(cccc4F)F

Describe the solution you'd like

To increase consistency, it would be good to canonically order the molecule prior to conformer generation or, in the case of OE, to set omega.SetCanonOrder(True).

Describe alternatives you've considered

Canonically order the molecule manually, but this isn't ideal in a lot of cases.
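
For illustration, manual canonical ordering could look something like the sketch below. It uses RDKit rather than OE, and the helper name canonically_ordered is hypothetical, not part of any toolkit. It only shows that two different input orderings of the same molecule end up with the same atom order after renumbering; conformer generation would then be run on the reordered molecule.

```python
from rdkit import Chem


def canonically_ordered(mol):
    """Return a copy of `mol` with atoms renumbered into RDKit's canonical order.

    Hypothetical helper for illustration; not part of RDKit or the OpenFF toolkit.
    """
    # CanonicalRankAtoms gives the canonical rank of each atom, indexed by the
    # current atom index. breakTies=True assigns distinct ranks to
    # symmetry-equivalent atoms so that the ordering is a full permutation.
    ranks = list(Chem.CanonicalRankAtoms(mol, breakTies=True))
    # RenumberAtoms expects new_order[new_index] == old_index, so sort the
    # old indices by their canonical rank.
    new_order = sorted(range(mol.GetNumAtoms()), key=lambda idx: ranks[idx])
    return Chem.RenumberAtoms(mol, new_order)


# Two SMILES for the same molecule (ethanol) written with different atom orders.
mol_a = canonically_ordered(Chem.MolFromSmiles("CCO"))
mol_b = canonically_ordered(Chem.MolFromSmiles("OCC"))

symbols_a = [atom.GetSymbol() for atom in mol_a.GetAtoms()]
symbols_b = [atom.GetSymbol() for atom in mol_b.GetAtoms()]

# After canonical renumbering, both molecules list their atoms in the same order.
print(symbols_a == symbols_b)
```

Having to do this by hand before every call that generates conformers is the inconvenience described above, which is why handling it inside the toolkit would be preferable.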


ijpulidos commented 3 years ago

Just made a script to show that the problem also happens with RDKit, fwiw. Inspired by @SimonBoothroyd's previous code.

https://gist.github.com/ijpulidos/7b0b9ac7d3e4a1692a1dee2825da3b98