Wrong atom types while generating gromacs topology

aniketsh commented 1 year ago

Wrong atomtypes in gromacs topology file

system.top output

[ defaults ] ; nbfunc comb-rule gen-pairs fudgeLJ fudgeQQ 1 2 yes 0.500000 0.833333

[ atomtypes ] ;type, bondingtype, atomic_number, mass, charge, ptype, sigma, epsilon MOL0_0 6 12.01078 0.0000000000000000 A 0.3480646886945065 0.3635030558377792 MOL0_1 6 12.01078 0.0000000000000000 A 0.3480646886945065 0.3635030558377792 MOL0_2 6 12.01078 0.0000000000000000 A 0.3480646886945065 0.3635030558377792 MOL0_3 6 12.01078 0.0000000000000000 A 0.3480646886945065 0.3635030558377792 MOL0_4 6 12.01078 0.0000000000000000 A 0.3480646886945065 0.3635030558377792 MOL0_5 6 12.01078 0.0000000000000000 A 0.3480646886945065 0.3635030558377792 MOL0_6 1 1.007947 0.0000000000000000 A 0.2572581535063279 0.06531785996356952 MOL0_7 1 1.007947 0.0000000000000000 A 0.2572581535063279 0.06531785996356952 MOL0_8 1 1.007947 0.0000000000000000 A 0.2572581535063279 0.06531785996356952 MOL0_9 1 1.007947 0.0000000000000000 A 0.2572581535063279 0.06531785996356952 MOL0_10 1 1.007947 0.0000000000000000 A 0.2572581535063279 0.06531785996356952 MOL0_11 1 1.007947 0.0000000000000000 A 0.2572581535063279 0.06531785996356952

code: from openff.toolkit import ForceField, Molecule, Topology from openff.interchange import Interchange

forcefield = ForceField("openff-2.1.0.offxml") # similar output with 2.0 as well

molecule: Molecule = Molecule.from_file('../mol.sdf') topology: Topology = molecule.to_topology()

interchange = Interchange.from_smirnoff(force_field=forcefield,topology=topology)

interchange.to_top("system.top") interchange.to_gro("system.gro") interchange.to_pdb("system.pdb")

Input SD file

OpenBabel07172321033D

12 12 0 0 0 0 0 0 0 0999 V2000 1.3827 -0.2218 0.0056 C 0 0 0 0 0 0 0 0 0 0 0 0 0.5063 -1.3070 -0.0081 C 0 0 0 0 0 0 0 0 0 0 0 0 -0.8714 -1.0906 -0.0146 C 0 0 0 0 0 0 0 0 0 0 0 0 -1.3732 0.2109 -0.0045 C 0 0 0 0 0 0 0 0 0 0 0 0 -0.4968 1.2960 0.0106 C 0 0 0 0 0 0 0 0 0 0 0 0 0.8810 1.0795 0.0141 C 0 0 0 0 0 0 0 0 0 0 0 0 2.4562 -0.3904 0.0096 H 0 0 0 0 0 0 0 0 0 0 0 0 0.8972 -2.3209 -0.0141 H 0 0 0 0 0 0 0 0 0 0 0 0 -1.5542 -1.9359 -0.0271 H 0 0 0 0 0 0 0 0 0 0 0 0 -2.4466 0.3794 -0.0083 H 0 0 0 0 0 0 0 0 0 0 0 0 -0.8877 2.3100 0.0191 H 0 0 0 0 0 0 0 0 0 0 0 0 1.5638 1.9249 0.0237 H 0 0 0 0 0 0 0 0 0 0 0 0 1 2 1 0 0 0 0 1 6 2 0 0 0 0 1 7 1 0 0 0 0 2 3 2 0 0 0 0 2 8 1 0 0 0 0 3 4 1 0 0 0 0 3 9 1 0 0 0 0 4 5 2 0 0 0 0 4 10 1 0 0 0 0

mattwthompson commented 1 year ago

Hi @aniketsh, I'm not quite sure what the issue is. SMIRNOFF does not employ atom types, so the GROMACS file is populated with identifiers that are generated on-the-fly, which may look funny but function fine as far as I've been able to discern.

I also can't seem to load that SDF file - could you double-check if you've copied it accurately?

I can slightly modify your script to generate something that I think works fine:


In [1]: from openff.toolkit import ForceField, Molecule, Topology
   ...: from openff.interchange import Interchange
   ...:
   ...: forcefield = ForceField("openff-2.1.0.offxml") # similar output with 2.0 as well
   ...:
   ...: molecule: Molecule = Molecule.from_smiles("CCO")
   ...: molecule.generate_conformers(n_conformers=1)
   ...: topology: Topology = molecule.to_topology()
   ...:
   ...: interchange = Interchange.from_smirnoff(force_field=forcefield,topology=topology)
   ...:
   ...: interchange.to_top("system.top")
   ...: interchange.to_gro("system.gro")
   ...: interchange.to_pdb("system.pdb")

In [2]: import parmed

In [3]: {atom.type for atom in parmed.load_file("system.top")}
Out[3]:
{'MOL0_0',
 'MOL0_1',
 'MOL0_2',
 'MOL0_3',
 'MOL0_4',
 'MOL0_5',
 'MOL0_6',
 'MOL0_7',
 'MOL0_8'}

aniketsh commented 1 year ago

Thank you for getting back, here is my sd file mol.zip

From older version: import openff.toolkit as ot ot.version '0.11.3' [ atoms ] ;num, type, resnum, resname, atomname, cgnr, q, m 1 C1 0 UNK C1 1 -0.13610011 12.01078000 2 C2 0 UNK C2 2 0.12639989 12.01078000 3 O1 0 UNK O1 3 -0.59980011 15.99943000 4 H1 0 UNK H1 4 0.04236689 1.00794700 5 H2 0 UNK H2 5 0.04236689 1.00794700 6 H3 0 UNK H3 6 0.04236689 1.00794700 7 H4 0 UNK H4 7 0.04319989 1.00794700 8 H5 0 UNK H5 8 0.04319989 1.00794700 9 H6 0 UNK H6 9 0.39599989 1.00794700

From newer version and your code its failing:

In [1]: import openff.toolkit as ot In [2]: ot.version Out[2]: '0.14.0'

File "XXXXXX/test_openff/re.py", line 1, in from openff.toolkit import ForceField, Molecule, Topology ImportError: cannot import name 'ForceField' from partially initialized module 'openff.toolkit' (most likely due to a circular import) (XXXXXXXXXXXXX/openff2/lib/python3.11/site-packages/openff/toolkit/init.py)

j-wags commented 1 year ago

Hi @aniketsh, that import error is new to me. Could you share how you installed the OpenFF toolkit?

aniketsh commented 1 year ago

conda create -n openff2 -c conda-forge openff-toolkit conda activate openff2 conda install -c conda-forge openff-forcefields

j-wags commented 1 year ago

Hm, I'm unable to reproduce the problem using that conda env. It could be something with the conda configuration in your environment. My normal bag of troubleshooting tricks for that would be:

conda deactivate
conda deactivate
conda deactivate
export PYTHONPATH=
source /XXXX/miniconda/etc/profile.d/conda.sh
conda create -n openff2 -c conda-forge openff-toolkit
conda activate openff2

Could you try that and let me know if the problem persists?

aniketsh commented 1 year ago

@j-wags This solution worked, thanks. I now understand that the atomtype representation is the newer version has changed, however I am concerned about having long ligand names and then having to deal with longer atomtypes identifier derived from that ligand name. Is there any reason to change the atomtype identifier from previous versions?

mattwthompson commented 1 year ago

The previous solution was could not scale to well to running simulations of complex systems containing potentially many SMIRNOFF parameters, like systems containing large peptides and/or ligands, in GROMACS. The representation of atom types in SMIRNOFF and GROMACS are fundamentally incompatible - SMIRNOFF has none, GROMACS has them deeply entrenched. Though they mean nothing to SMIRNOFF force fields, they need to exist in GROMACS files, so they are generated on-the-fly just before write time. There is no guarantee that an auto-generated C1 of one molecule encodes the same physics as the C1 of another molecule, and in fact it's unlikely that they by chance do. There must be a way to avoid collisions between molecules, and while it might be possible to use shorter names, storage space is cheap and GROMACS has no problem parsing these names. Previous implementations did not consider any of this and instead wrote out a different atom type for each atom in the system, which is unworkable for large systems.

If this produces issues we will of course reconsider the design, but we're currently unaware of any such behavior. So far, it's worked with a battery of systems (many single-molecule calculations, some large peptides, and some protein-ligand complexes) closely reproducing energies of the OpenMM reference.

It's also worth noting that these identifiers are only ever seen in the GROMACS files. No other representations (in-memory and on-disk) need to use this trick.

openforcefield / openff-toolkit

Wrong atom types while generating gromacs topology #1670

Input SD file