new OPLSSAA forcefield from late-2023 paper

Hello everyone, and thank you @jewettaij for the creation of moltemplate, that is an amazing tool.

Recently, I was trying to use the provided OPLSAA .lt file to run some simulations, but I needed a more recent version of the FF; after reading issue #93, I decided to create a new conversion script myself in Python.

As you anticipated in an old post (https://matsci.org/t/system-charge-neutrality-using-opls-aa/32753/3), finding some updated reference FF files wasn't easy. I tried looking at the site you mentioned in issue #93 (https://traken.chem.yale.edu/) without luck.

I ended up using the SI files provided in the late 2023 paper from the group of Jorgensen doi.org/10.1021/acs.jpcb.3c06602, also mentioned in issue #93.

As a side note, similar files are found in the new version of BOSS (5.1), which provides many more atom types. I don't know if I have the right to share those parameters, as I think they are distributed under the BOSS license, but if the final script will work with them, a user would only need to fill in the license information and download-and-convert the files himself.

I'm attaching a first alpha version of the script, which produces a "good" .lt file in the sense that moltemplate does not complain about it, and trying running it for a simple molecule produces a "good-to-my-eyes-and-VMD" lammps data file. I know this is just the beginning, and further testing is needed before sharing the script with the moltemplate community, but I wanted firstly to address some issues I encountered and assumptions I made with you.

So here are my questions, in order of confidence I have in the assumptions I made

if I understand correctly, the OPLSAA FF has an integer-type, identifying a specific atom in a specific environment, and a "more general" two-character-atom-type, which is used to identify bonded interactions. If that's correct, I assumed I can use the integer-type as the type in the lt file, while the two-character-atom-type can be used to define the bonded interactions
some two-character-atom-type in OPLSAA use characters in their definitions that are "incompatible" with moltemplate (:, $, #, *), so I just replaced them with other characters not already used in OPLSAA
there ise a "strange" dihedral definition in the OPLSSAA file, HC-CT-CT-C(O), which I didn't know how to interpret, as the atom-type C(O) is not defined anywhere. I simply skipped this dihedral definition for now...
I don't know if I have to consider the note reported in LAMMPS https://docs.lammps.org/dihedral_opls.html "Note that the usual 1/2 factor is not included in the K values." for dihedral K values... I assumed that the K values in the OPLSSAA file are ok for LAMMPS, but that's probably wrong
where I have more doubts: the improper section! 5a) In the OPLSAA file, the impropers are just defined among the dihedral definitions, with a comment indicating that they are impropers and not proper-dihedrals. So I think they will just use the same equation of the dihedrals. One way would be to treat them as proper-dihedral also in LAMMPS, but I don't think this is really the correct approach... the easiest approach seems to treat them as with the improper_style cvff, as they only have non-0 for K2. This is the approach I followed (doubling the valaue of K2, following the previous assumption that the 1/2 therm was included in the OPLSSAA files), but this will not permit the use of full /kk (GPU) acceleration (I don't need it for now, but it may be important for others). 5b) In the OPLSAA file the wildcards */? are not used, and instead a generic X-Y-Z atom-type is used. I just replaced these with *, indicating any atom type, but I don't know if the use of X-Y-Z indicated that they must be three different types (in such case, I have to change the script to create the relevant atom-types combinations, but that would produce much more improper lines than the original oplsaa.lt file, so I thought this was not necessary, but I don't really know).

NOTE: I know the original oplsaa.lt uses the harmonic style for the impropers, I assume a similar conversion could be made here, but I hadn't thought how to do it for now, as I read in some posts that in the creation of the original oplsaa.lt file some correction was made by hand on the impropers... hopefully someone will have something to say here...

I'm also tagging @hothello, as he seems to be working a lot with OPLSAA, I hope I'm not bothering you too much.

OPLSAAfromSI_a01.tar.gz

Just a quick update: I noticed that in the SI FF files all water models are assigned to the OW HW types, meaning all water models share the same bonded interactions; that's obviously not true.

By looking at the .prm file from Tinker (the one used to produce the current oplsaa.lt), I found that in that file different water models have different types for O and H, so the conversion script had not this problem. it would be easy to define new atom types for the different water models in the new conversion script, but I would like to address the problems highlighted in the previous message before talking about this (easier) stuff.

Thank you DrDomenicoMarson for taking the extra time to create an oplsaa.lt file for 2023! I will include it in moltemplate soon with appropriate caveats.

Keep in mind that every force field shipped with moltemplate was originally labelled "experimental". Users would try the oplsaa forcefield with the understanding that they use it "at their own risk". But gradually many users ironed out the problems with the old oplsaa.lt file, much the way bugs in open-source projects eventually get fixed. So it's okay if your .lt file has some mistakes. I will reply with more details in the next 48 hours.

I realize creating a complete working oplsaa.lt file was probably more than the minimum needed for your own research. It is very much appreciated.

I see that you put a lot of careful thought into this conversion!

TLDR

We need to fix a problem in the order of the dihedrals and the parameters for the impropers. (See below.) But otherwise, this "oplsaanew.lt" file looks good! (If it's okay with you, I will rename it to "oplsaa2023.lt".)

Here are my answers to your concerns

1. Atom Names

Correct. OPLSAA files use this system. Comment: I really prefer the two-character atom names (eg "CT") in this new version! I'm glad you are using these atom type names. I think it will make it much easier for new users to choose the correct atom types.

2. Forbidden Characters

Great. I'll add a comment to the top of your new oplsaa.lt file which mentions the atom name changes that you made (C:->C°, C$->C^, N$->N^, O$->O^, C#->C|, N*->N|). But I'd like to request a change REQUEST: Please change that last substitution to N*->N&, instead of N*->N|. (Because the | character was previously used as a substitute for #. If possible, I'd prefer to use new character for `. It doesn't have to be&`.)*

3. What is HC-CT-CT-C(O)?

I can't figure out what this refers to either. Go ahead and ignore it.

DETAILS: Feel free to skip...

I found the string "C(O)" in three places in the atom-definition section of the Jorgensen_et_al-2024-The_Journal_of_Physical_Chemistry_B.sub-2.txt file:

TYP AN AT   CHARGE     SIGMA    EPSILON
453 06 CT    0.28     3.50      0.066      C(O)    MePO3Me- methylphosphonate
458 06 CT    0.32     3.50      0.066      C(O)    benzyl methylphosphonate
 :

Type   V1         V2        V3        V4               description  
115   0.000     0.000    -0.076     0.0        HC-CT-CT-C(O)   aldehyde & ketone & acyl halide

This is confusing. Methylphosphonate is not an aldehyde, ketone, acyl halide.

So I agree with you. I think we can ignore this dihedral interaction because it is definitely a recent modification to the force field. (There are no dihedral interactions in the original "oplsaa.lt" file from 2008 containing "-0.076".)

4. Dihedrals

I think your dihedrals are correct. The magnitude of the dihedral interactions in your new .lt file (eg "0 0 0.3 0" for "@dihedral:HC_CT_CT_HC") matches the magnitude in the current (old) oplsaa.lt file (for "@dihedral:046_013_013_046"). The old oplsaa.lt file has been used for a decade and scrutinized by many users (including the dihedrals). It also matches the magnitude of the numbers in the LAMMPS documentation: https://docs.lammps.org/dihedral_opls.html. But we probably need to change the order of the dihedrals in the file. Moltemplate creates new dihedrals according to the rules in the "Data Dihedrals By Type" section of the .lt file, in the order they appear. Here's an excerpt from your file:

@dihedral:CT_CT_CT_OS @atom:*_b*_a*_dCT_i* @atom:*_b*_a*_dCT_i* @atom:*_b*_a*_dCT_i* @atom:*_b*_a*_dOS_i*
@dihedral:CT_CT_CT_O€ @atom:*_b*_a*_dCT_i* @atom:*_b*_a*_dCT_i* @atom:*_b*_a*_dCT_i* @atom:*_b*_a*_dO?_i*

Here is the problem: Suppose you have a molecule containing 4 consecutively atoms of (bonded) type CT_CT_CT_OS. In this case moltemplate will first create a dihedral of type "@dihedral:CT_CT_CT_OS". Then it will create another dihedral of type "@dihedral:CT_CT_CT_O€" for those same 4 atoms (overwriting the first dihedral). That's because the "O€" ("dO?") wildcard overrides the "OS" ("dOS"). So we should swap the order of these two lines. Unfortunately there are hundreds of dihedrals containing wildcards ("€"). If I don't hear back from you, I'll take a look at changing your code to fix this problem.

5. Impropers

Impropers are always a headache. We've never been completely sure if the improper interactions in "oplsaa.lt" were correct. As for your question, yes, you can use improper_style cvff here.

Improper coefficients:

Correction1: I think you should divide the coefficients by 2, not multiply them by 2. (In other words, "30.0" -> "15.0", not "60.0".)

Correction2: I think you want to set the "d" parameter to -1 not +1. For reference, see: https://docs.lammps.org/improper_cvff.html https://docs.lammps.org/dihedral_opls.html

Improper atom types names:

Don't worry about the X Y Z being different from each other. The "oplsaa.lt" file (and TINKER file) does not require that these atoms are different. ("oplsaa.lt" just uses \ (wildcards) for X Y Z).

Here is a comparison of the "oplsaa.lt" and "oplsaanew.lt" files:

current "oplsaa.lt" file (2008)	new "oplsaanew.lt" file (2023)
@improper:X_X_003_004 10.5 180.0	@improper:OC£_£ 42.0000 1 2
@improper:X_X_024_X 1.0 180.0	@improper:£N£_£ 10.0000 1 2
@improper:X_X_047_X 15.0 180.0	@improper:£CM£_£ 60.0000 1 2
@improper:X_X_048_X 1.1 180.0	@improper:£CA£_£ 10.0000 1 2

It looks like "oplsaa.lt" is wrong! The cofficients in my "oplsaa.lt" file for X_X_024_X and X_X_048_X are too weak. I obtained those numbers from the force-field files shipped with Gromacs (so those Gromacs files are also probably incorrect).

Fortunately, improper interactions are only used to keep the 4 atoms in the same plane. If they are too weak or too strong, it does not seem to affect the shapes of the resulting molecules significantly. (The benzene rings all look flat to me when I simulated them using the weaker parameters from "oplsaa.lt".) Still, this is embarassing. I trust your impropers more than mine. (I will update "oplsaa.lt" soon.)

There are some other differences as well, but I'm getting too tired to keep typing tonight. I'll followup with another post tomorrow.

Thank you for you careful comments and kind words! I'll gladly address all the issues raised, but unfortunately I think I found a deeper issue with the original FF files (see last point).

0. Sure, the oplsaa2023.lt it's much better!

1. I also like the 2 letter names, I agree they are much more "readable" by the user!

2. I'll add the comment on the created oplsaa.lt file explaining the conversions, and I'll change the N| to something else (probably & will be fine, I have to check that molemplate doesn't complain about it).

3. Maybe we could add a comment in the new .lt file dihedral section stating we are missing this "unclear" dihedral

4. Great for the magnitudes, less so for the order of dihedrals :-) I completely missed that, I know moltemplate expects wildcard-dihedrals to appear before any "more specific" dihedrals, and that's so smart I assumed the original FF file used a similar approach without checking my assumption... I'll move every definition of dihedral with a wildcard at the top of the dihedrals definitions, with some logic like "the dihedrals with more */? appear on top of the ones with fewer wildcards"

5. As for improper I spotted a possible error in my implementation as I was missing the (opls_imp.py) in write_once("Data Impropers By Type (opls_imp.py)"). To be fair I still don't completely understand how improper should be converted, I was trying some trial-and-error on a simple benzoic acid molecule to compare the old and the new forcefield, but I spotted the nasty problem I mentioned above:

BIGGER PROBLEM

in the SI-FF, some dihedrals have the same two-letter definition but differ in the parameters (the difference is explained in the comment section). I'll paste here three examples (they don't even appear near each other in the original file)

dihedral_coeff @dihedral:O_C_OS_CT 0.000 5.124 0.00 0.   esters
dihedral_coeff @dihedral:O_C_OS_CT 0.000 5.000 0.00 0.   benzoic esters 

dihedral_coeff @dihedral:O2_P_OS_CT 0.00  0.0   0.00 0.    phosphonates
dihedral_coeff @dihedral:O2_P_OS_CT 0.90 -2.93 2.64 0.   dimethyl phosphate 

dihedral_coeff @dihedral:O_C_OH_HO 0.000 5.500 0.00 0.   carboxylic acid - aliphatic
dihedral_coeff @dihedral:O_C_OH_HO 0.000 5.000 0.00 0.   benzoic acids

I'm unsure how to fix this problem, as there are many such dihedrals, and I spotted even more angles with the same issue. The solution may be to create a "new" two-letter atom-type definition every time this situation occurs (the best thing maybe should be to create three-letter atom-type definitions in such cases, and maybe with clever use of * and ? we could get away with it; I haven't thought about it carefully yet.

The problem is I don't know how to automatically identify "which" atom types have to be "duplicated" as only an LLM may be able to understand the comments and the description of atom-types to do such thing, but I don't really have the expertise to do something like that (assuming this is the best way).

Otherwise, we could create "by hand" some kind of mapping in a separate file. The first time the script is run it could ask the user, whenever it encounters a duplicated dihedral, which atom types have to use the first and the second definition, write it in a mapping file, and on subsequent runs the file is read and the user is not bothered... that should be particularly tedious, but doable (maybe a little bit over what I was expecting when I started this "project" :-) )...

Regarding the bigger problem

Interesting! The problem is in tinker and "oplsaa.lt" too. (See below.)

TLDR

As you probably know, the two files from the supplementals section of the paper (sup-2.txt and sup-3.txt) are BOSS files. (They are normally named "oplsaa.par" and "oplsaa.sb", respectively. The most recent versions of these files are included with BOSS when you download it.)
You should not do anything by hand.
No LLMs :)
Your converter should not have to examine the comments inside the BOSS files.
For now, I think we should ignore the duplicate dihedral entries. (See below for why.)

Other issues you brought up:

Great.
Great.
Great.
If you get stuck with this, I can help (once we resolve the bigger problem).
Good catch! Yes, you need the "(opls_imp.py)". (I forgot about this! I confess it's been a while since I worked on moltemplate.)

As you mentioned, the BOSS licence forbids us from converting and distributing the most recent BOSS files (included when you download BOSS). So it's great that you found those supplemental files included with that paper.

As you mentioned, the BOSS licence forbids us from converting and distributing the most recent BOSS files (included when you download BOSS). So it's great that you found those supplemental files included with that paper.

If the final script is done right, a user who downloads BOSS should be able to convert the oplsaa FF files from BOSS and use it locally, hopefully!

Regarding the "bigger problem

If one of the possible solutions would be to just ignore the "multiple" dihedrals/angles, in the next few days I'll implement the changes we discussed previously, so we'll have a "beta" version of the converter.

Should I upload the updated version here or should I directly create a PR at this stage?

In the meantime, I could write to Professor Jorgensen, and the author of TINKER, to have some insight on how to approach the problem.

Another possible small problem Having a quick look at the water model parameters, it seems the BOSS/paper FF files treat all water atoms as two-letter type OW HW, and so apply the same bond/angle parameters for TIP3/4/5P and SPC waters... as far as I understand this is not correct, and if I remember correctly in the old oplsaa.lt there are different "two-letter" atom types for SPC and TIP3/4/5P, so the problem is not present there. I think I could just hard-code a change in the converter, creating different atom types for the different water models (and while I'm at it I could create an atom type for SPC/E, in which I have only to change the partial charges).

Small request: Go ahead and include all the dihedral interactions. But comment out the duplicates dihedrals (Put a # character at the beginning of those lines. Moltemplate will ignore those lines.) This is what TINKER does.
Yes, please create a PR. That would be great. (That makes it easier to track contributors!) But for now, just include the converter program. (I will rename it and add one or two small changes.) You can add another PR with the final "oplsaa2023.lt" file later when it is ready.
If you correspond with anyone about the bigger problem, please CC my email (jewett.aij-at-gmail.com)

I am sorry I haven't been very helpful in debugging the conversion process.

Just to let you know, I have added a OPLS paragraph to the LAMMPS How To documentation, on top of the Moltemplate How To, which is based on the current (outdated) version of oplsaa.lt. See https://github.com/lammps/lammps/pull/4357 I am looking forward to advertising the latest OPLS force field version in Moltemplate. It is very useful, especially for beginners to get started with LAMMPS.

No worries Otello!

Thanks for the PR, Domenico. It is now included with moltemplate! (Although nobody knows it's there yet.) I made some minor changes to the "oplsaa2lt.py" file. If you are making more changes, please base then on the updated version of this file.

jewettaij / moltemplate