mosdef-hub / foyer

A package for atom-typing as well as applying and disseminating forcefields
https://foyer.mosdef.org
MIT License
117 stars 76 forks source link

Incorrect handling of partially typed systems in general_forcefield.py #491

Closed chrisiacovella closed 2 years ago

chrisiacovella commented 2 years ago

Bug summary

It is not uncommon for a force field to only parameterize a subset of the possible topological constraints. For example, a force field may only have a subset of the possible quartets of atoms described by a dihedral interaction, or in some cases, may not have any dihedral terms as is often common when modeling a solid structure or in many CG force fields.

To handle this, in forcefield.py, we added in the ability to suppress errors associated with the detection of untyped topological constraints (ie.., bonds/angles/dihedrals/impropers), by allowing a user to set assert_bond_params/assert_angle_params/etc. to False as required.

For example, in the "old" foyer (i.e., forcefield.py), if assert_dihedral_params is set to False: (1) Foyer will not exit with an error when it encounters an untyped quartet of atoms, but instead show a warning (2) Foyer will return a topology that contains only the typed dihedrals (or be completely empty if none exist)

Currently, general_forcefield.py does 1 correctly, but not 2; it will still include untyped connections in the respective lists, which will cause the writers to fail when they attempt to loop over them. In forcefield.py, checking for untyped connections involved comparing the typed topology to the list of all possible connections; in general_forcefield.py, only a single topology object exists, and checking involves searching this topology for untyped. Fixing this should require a relatively simple extension whereby a topology of only the typed parameters is constructed (providing users the ability to toggle this on/off, as a partially typed topology may still be useful, especially during development of a new force field file). I'm working on a some code now.

Code to reproduce the behavior

To demonstrate this I put together a very minimal example where I basically remove angles from my force field to demonstrate the issue. For this molecule (SiO4) there are 6 possible triplets (i.e., angles), but since the force field does not contain any entries for angles, we do not parameterize any of them. While setting assert_angle_params to False will prevent the atomtyping from exiting with an error, if we query the the number of angles in the topology's list of angles, we see it thinks there are 6. When calling the lammps writer, it will fail when it tries to parse the angle topology because it still thinks there are 6, but none of them have been parameterized with a type (because they do not exist as part of the force field (see: https://github.com/mosdef-hub/gmso/blob/9d324f0988bce893655bd51e4dbeca27cd2fd9d6/gmso/formats/lammpsdata.py#L295).

The forcefield file is included in the zip attachment.

import mbuild as mb
import gmso
import foyer
from foyer import general_forcefield

test_mol = mb.load("[O-][Si]([O-])([O-])[O-]", smiles=True)

ff_test = general_forcefield.Forcefield(forcefield_files='alpha_cristobalite6.xml', strict=False)

param_struct = ff_test.apply(test_mol, assert_angle_params = False, assert_improper_params = False, debug =True)
print(len(param_struct.angles))
print(param_struct.angles)

param_struct.save("test.lammps", overwrite=True)

Software versions

daico007 commented 2 years ago

For the second part, you can look into https://github.com/mosdef-hub/gmso/pull/588. Basically, Umesh has implemented an option for the identify_connections to return index only, which allow us to create a connection object (so dihedral/improper) only if the type is found. This is in my to-do list but I haven't got to it yet.