openforcefield / openff-toolkit

The Open Forcefield Toolkit provides implementations of the SMIRNOFF format, parameterization engine, and other tools. Documentation available at http://open-forcefield-toolkit.readthedocs.io
http://openforcefield.org
MIT License
318 stars 92 forks source link

Sanity check mol2 input #512

Open trevorgokey opened 4 years ago

trevorgokey commented 4 years ago

Describe the bug When loading a malformed mol2 file where all bond orders are 1 and partial charges are present, the implied formal charge is different from the partial_charges.sum(), and create_openmm_system will raise an exception because of this.

Essentially, from @jchodera:

This is likely also the origin of the "bug". The sum of the partial charges correctly reports the net charge as +8:

>>> m = Molecule.from_file('T4-protein.mol2', allow_undefined_stereo=True)
>>> m.partial_charges.sum()
Quantity(value=8.000000746455044, unit=elementary charge)

but to avoid issues with limited precision in reporting the true integral charge of the molecule, the .total_charge property sums up formal charges, which will be incorrect because the bond orders are all incorrect, leading to incorrect valences for most atoms:

>>> m.total_charge
-55

This leads to an idea for a simple sanity check: If the sum of the partial charges and the formal charges differ by a significant fraction of an integer, there is something wrong with the valences, and we should throw an exception on molecule validation. Perhaps we could suggest this in a new issue?

To Reproduce

from openforcefield.topology import Molecule, Topology 
from openforcefield.typing.engines.smirnoff import ForceField 
mol_path = 'T4-protein.mol2' 
molecule = Molecule.from_file(mol_path, allow_undefined_stereo=False) 
topology = Topology.from_molecules(molecule) 
forcefield = ForceField('test_forcefields/smirnoff99Frosst.offxml') 
omm_system = forcefield.create_openmm_system(topology, charge_from_molecules=[molecule], allow_nonintegral_charges=False)

Output See #510 for full output of the create_openmm_system call. See the above description of the problem output

Additional context First discussed in #510 Attached is a malformed mol2

trevorgokey commented 4 years ago

It is a good idea for a sanity check here since we have to spend a few minutes in OpenMM before the exception is raised, so it would be good to just bail if we can detect it ourselves. This does prevent doing things like setting the charges to 0 and passing to OpenMM if we wanted to e.g. just calculate bond energy, and didn't want to run a somewhat expensive partial charge calculation.

With that said, I guess it hasn't been an issue with me personally since most of the molecules I've worked with have total_charge == 0 so the partial_charges have always matched when setting all to 0.

In the case of loading a mol2 file or something with partial charges, then this check should definitely be done in validation.

j-wags commented 4 years ago

I agree -- Let's put this with our other special "what in the world is wrong with your mol2 file?" check: https://github.com/openforcefield/openforcefield/blob/master/openforcefield/utils/toolkits.py#L544-L545

mol2! The gift that keeps on giving!