Generating redundant impropers when using gaff.lt

jewettaij / moltemplate

A general cross-platform tool for preparing simulations of molecules and complex molecular assemblies

http://www.moltemplate.org

MIT License

251 stars 98 forks source link

Generating redundant impropers when using gaff.lt #32

Closed bhargavchava97 closed 4 years ago

bhargavchava97 commented 4 years ago

Hello there,

When trying to generate data file using gaff.lt, I see a lot of redundant impropers being generated. This seems to be a problem with sorting atoms as I have been able to sort the output and see that the distinct atom sets equal the number of sp2 atoms.

bhargavchava97 commented 4 years ago

gaff_improper.txt This alternative code for gaff_imp.py seems to remove the redundant impropers.

jewettaij commented 4 years ago

When trying to generate data file using gaff.lt, I see a lot of redundant impropers being generated. This seems to be a problem with sorting atoms as I have been able to sort the output and see that the distinct atom sets equal the number of sp2 atoms.

There are two reasons that you might see multiple improper interactions between the same 4 atoms in a data file generated by moltemplate.

1) Some improper interactions have symmetry (eg ammonia):

2) Some angle, dihedral, and improper interactions use wildcards (see explanation below).

If you are using the correct nbody symmetry file (in this case "gaff_imp.py"), both of these problems should go away. Unless you modified "gaff.lt" to change the symmetry file that it uses by default, this should not be an issue.

It's still possible that you might see multiple improper interactions between the same 4 atoms (due to symmetry). But in that case, it is intentional.

(Details: Hopefully, it should only happen when the improper-angle might different if you change the order of the 4 atoms. If the angle is different, then consequently the interaction energy between those 4 atoms will be different also. For example there might be 3 different improper torsion angles in the ammonia example above. In that case, those interactions are not redundant. I don't see how you could give preference to one of them arbitrarily. You must include each of these improper interactions between the same 4 atoms. (More Details: The GAFF force fields ignore the sign of the improper-torsion angles. Otherwise we would have to consider up to 6 different angles for these 4 atoms, half of which have the opposite sign but identical magnitude.))

If you can create a simple example with a relatively small number of atoms where the resulting topology is incorrect, feel free to post it here (along with an explanation regarding what you expected to see) and I will take a look.

Details (wildcards and default values, feel free to skip)

Most force fields have default rules defining angles, dihedrals, and impropers between atoms containing wildcards (such as @atom:* in the excerpt below). In moltemplate, the @atom:* symbol is used to match any atom type. Rules containing this symbol usually appear at the beginning of the list. Additional rules are added to override these default rules with explicit atom types. Consider the following excerpt from the gaff.lt file

    @improper:X-X-c-o @atom:* @atom:* @atom:c @atom:o
    @improper:c2-hc-c-o @atom:c2 @atom:hc @atom:c @atom:o

If you have a molecule containing a c atom bonded to c2, hc, o, then at least two different improper interactions will initially be generated by the "lttree.py" python program (which "moltemplate.sh" invokes). Then "moltemplate.sh" will look through the lines in the "Data Impropers" file (which was automatically generated by lttree.py) and will delete the lines with redundant interactions between those same 4 atoms, (giving preference to lines that occur later in that file). This means, that the final .DATA file generated by moltemplate.sh should only contain one improper interaction between these 4 atoms in this example. (I am assuming that you are using the "gaff_imp.py" file here.)

jewettaij commented 4 years ago

It's possible I am wrong. A long time ago a couple people looked at this code and (for the molecules they were considering) it seemed to behave correctly.

Request:

Please post a simple example LT file for a molecule with a relatively small number of atoms that currently causes moltemplate to create too many improper interactions. Let me know where the improper interactions are in the data file that moltemplate.sh creates. I will test it with the existing version of moltemplate, and also the "gaff_impropers.txt" file that you sent me.

Thanks for letting me know about this.

bhargavchava97 commented 4 years ago

Hello Andrew,

You can try the LT files attached below. With the current version of gaff_imp.py, the final improper count is around 40 while the total number of sp2 carbons is 22. I think the issue is with the sorting of atoms in gaff_imp.py where only the first and last atoms were sorted but the second atom (atom1) is left unsorted. So, same sets of 4 atoms were output multiple times in different orders. The gaff_improper code that I have attached sorts atom0, atom1, and atom3 keeping atom2 as the central atom and it generated 22 impropers. You can check the impropers section of system.data file. Also, I have compared the parameters and topology generated by antechamber through acpype with the output from moltemplate. Bonds, angles, and dihedrals matched in both the outputs but the improper count is 22 (which is equal to the number of sp2 carbons) in antechamber and 40 in moltemplate.

Regards Bhargav

On Wed, Apr 1, 2020 at 6:35 PM Andrew Jewett notifications@github.com wrote:

It's possible I am wrong. A long time ago a couple people looked at this code and (for the molecules they were considering) it seemed to behave correctly. Request:

Please post a simple example LT file for a molecule with a relatively small number of atoms that currently causes moltemplate to create too many improper interactions. Let me know where the improper interactions are in the data file that moltemplate.sh creates. I will test it with the existing version of moltemplate, and also the "gaff_impropers.txt" file that you sent me.

Thanks for letting me know about your concern.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/jewettaij/moltemplate/issues/32#issuecomment-607523224, or unsubscribe https://github.com/notifications/unsubscribe-auth/ANZTMVR5LJEVDVRLIJTGXOLRKO6RNANCNFSM4LWIENFA .

jewettaij commented 4 years ago

Hi Bhargav

It seems like you did not include the LT files in your message. Can you post it again? (Small files are better, but I will accept any LT file.)

Hello Andrew, You can try the LT files attached below. With the current version of gaff_imp.py, the final improper count is around 40 while the total number of sp2 carbons is 22. I think the issue is with the sorting of atoms in gaff_imp.py where only the first and last atoms were sorted but the second atom (atom1) is left unsorted. So, same sets of 4 atoms were output multiple times in different orders.

That was intentional. At the time, I thought I had good reasons for generating multiple improper interactions for the same 4 atoms (see "Boring Details" below). But if the resulting topology file conflicts with antechamber, I should definitely reexamine my decision. Thanks for comparing with Antechamber!

The gaff_improper code that I have attached sorts atom0, atom1, and atom3 keeping atom2 as the central atom and it generated 22 impropers. You can check the impropers section of system.data file.

Also, I have compared the parameters and topology generated by antechamber through acpype with the output from moltemplate.

Thank you! This is very helpful. I have never used antechamber. (I am also curious whether the magnitude of the improper interactions created by Antechamber and simulated in AMBER is the same.)

Bonds, angles, and dihedrals matched in both the outputs but the improper count is 22 (which is equal to the number of sp2 carbons) in antechamber and 40 in moltemplate. Regards Bhargav

My experience in the past has taught me that impropers are really tricky. It could take several days of my time to understand this and fix this. Right now I have a funding deadline due in 3 weeks. I won't have time to devote to understanding this issue until then. But I am really grateful for your post. This will get resolved in a few weeks.

Old simulation results:

If you have collected simulation data on systems built with the current version of moltemplate, extra improper interactions hopefully should not have had a big effect on the outcome of a simulation. If my understanding is correct, the flexibility of a molecule in AMBER is determined (mostly) by the properties of the bonds, angles, and (especially) dihedrals. The improper interactions are only used (in AMBER at least) to enforce planarity.

Anyway, I will revisit this, thanks to your suggestions. I'm grateful for your post.

Boring details (feel free to skip)

Technically speaking, it is not sufficient to get the correct number of improper interactions. Ideally we also want to make sure that the order that the atoms appear in the improper interactions is correct (by comparing with what Antechamber generated, if possible).

Why I generate redundant improper interactions

I generate multiple improper interactions whenever 2 or 3 of the atoms have the same type. Consider the ammonia (NH3) example above. Whenever you change the order of the atoms in an improper interaction, it changes the improper torsion angle: This consequently, changes the energy and the forces acting on the atoms (which are functions of this angle). Unable to know which atom order is correct, the "gaff_imp.py" file currently generates improper interactions for each equivalent atom order. (It generates 3 different improper interactions in the NH3 example above.) I thought this was what Antechamber was doing, but it looks like I was wrong. Otherwise, I will have to to guess the order of the atoms in the interaction and hope that I choose the same order that antechamber did (which is probably not documented). If I guess the order incorrectly, then AMBER and LAMMPS will calculate a slightly different energy for the exact same structure. This slight difference should not effect the validity of the simulation results (but it is annoying).

Either way, it is important to generate the correct number of improper interactions since this matters far more than the atom order. If I use your gaff_improper.txt file, then perhaps we can get this correct.

Thanks again -Andrew

… On Wed, Apr 1, 2020 at 6:35 PM Andrew Jewett @.***> wrote: It's possible I am wrong. A long time ago a couple people looked at this code and (for the molecules they were considering) it seemed to behave correctly. Request: Please post a simple example LT file for a molecule with a relatively small number of atoms that currently causes moltemplate to create too many improper interactions. Let me know where the improper interactions are in the data file that moltemplate.sh creates. I will test it with the existing version of moltemplate, and also the "gaff_impropers.txt" file that you sent me. Thanks for letting me know about your concern. — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <#32 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/ANZTMVR5LJEVDVRLIJTGXOLRKO6RNANCNFSM4LWIENFA .

bhargavchava97 commented 4 years ago

I have replied through email earlier as I was not able to attach LT files in GitHub message. I am attaching the ZIP file containing the LT files and also the antechamber output from acpype. antechamber_output.zip

moltemlate_files.zip

jewettaij commented 4 years ago

Thank you very much for your files. This is a pretty useful example!

Without somebody comparing the moltemplate output with antechamber, there is no way these files could ever work. For your own sake, I'm glad you were careful enough to check this. And I'm even happier that you took the time to contact me. Thank you!

(This makes me wonder about the simulation results other users are getting...)

In this example, the number of impropers generated by moltemplate and antechamber is still not the same, even after replacing "gaff_imp.py" with the contents of your "gaff_impropers.txt" file.

FRCMOD files

It looks like the problem is much worse than I thought. Moltemplate is not only creating redundant improper interactions. The more serious problem is that it is also failing to create some of the improper interactions that antechamber is creating. The problem appears to be that the file containing GAFF force field rules that moltemplate uses ("gaff.lt") is incomplete. That file does not contain definitions for "c2-c3-c2-ha", "c2-ha-c2-ha" or "ca-ca-ca-oh" that your molecule. (These improper interactions are not defined in either the "gaff.lt" file that moltemplate uses, or the original AMBER "gaff.dat" file from which it was created.)

In this example, I was able to find the definitions for the missing impropers in your "DABPA_AC.frcmod" file (included with the antechamber results). It contains definitions for these missing improper interactions that moltemplate did not create.

I am not familiar with Antechamber. I have one question for you. (If you can answer it, it will help a great deal.)

Question:

Did you run Antechamber in a different way than you normally do to let it know about the missing information? Or did Antechamber figure out how to generate these extra improper interactions by default (without any extra user input)? In other words, how did you tell antechamber where to find this extra information? (Did you supply the "DABPA_AC.frcmod" file yourself? If so, did you create it, or download it somewhere?)

If we can figure out where antechamber is getting this information, we can convert it into an LT file and use it along with the "gaff.lt" file. If these FRCMOD files are manually supplied by users, then I can provide instructions explaining how users can convert their FRCMOD files into LT format.

Instructions for converting FRCMOD files

Today in response to the files you sent, I created a new script named "amberfrcmod2lt.sh" which should (hopefully) help solve this problem. It converts FRCMOD files to LT format. You an access it by downloading the newest version of moltemplate from git:

git clone https://github.com/jewettaij/moltemplate ~/moltemplate
# or if already downloaded, use "pushd ~/moltemlate; git pull; popd"

and then use it this way

cp ~/moltemplate/moltemplate/forcefields/convert_AMBER_files_to_LT_files/* .
# ... or update your PATH environment variable to include this directory
# Then convert the FRCMOD file this way:
./amberfrcmod2lt.sh DABPA_AC.frcmod GAFF > DABPA_frcmod.lt

Then insert the following line in your DABPA.lt file:

import DABPA_frcmod.lt

Note: Insert this line after the "import gaff.lt" line, but before the molecule definition.

(Also: If there are any temporary files lying around, you might want to delete them.)

Fixing these kinds of issues can be a slow and laborious process. At least I think we made some headway today. I may have to wait a week or two before I reply again, but I will eventually get this fixed if you can answer my questions.

Thanks again! -Andrew

jewettaij commented 4 years ago

files_for_bhargavchava97_2020-4-03.tar.gz

I have attached the files necessary to generate all of the impropers in your molecule. (The "DABPA_frcmod.lt" file was generated using the procedure above.)

jewettaij commented 4 years ago

Summary

If we use the new version of the "gaff_imp.py" file that you supplied ("gaff_impropers.txt") and use the "DABPA_frcmod.lt" file that I created using the procedure above, then it is possible to get complete agreement between Antechamber and Moltemplate. The only question that remains, is how did you obtain the "DABPA_AC.frcmod" file? (For details, see my question above.) After we resolve that, we can close this issue.

Thank you very much again! -Andrew

bhargavchava97 commented 4 years ago

I had forgotten to tell you about the frcmod issue. With the standard gaff.lt file one may not be able to generate the complete set of improper. So, I had edited the gaff.lt file and added the custom improper obtained from frcmod.

Regarding hoe I got the frcmod: It is created by antechamber automatically. Antechamber has a similarity index based system (parmchk2) using which it will generate custom impropers based on the most similar improper it can find in gaff.data. In the end all such custom bond/angle/dihedral/impropers will be written to the frcmod file. So, in order to form a complete parameter set, one needs the frcmod file from antechamber.

jewettaij commented 4 years ago

Questions: 1) Would you like to submit a formal github pull request so that you are included in the list of moltemplate contributors? (I've already included your gaff_impropers.txt file in the most recent commit. So if you want to contribute, then make some trivial change to the "gaff_imp.py" file, such as adding a blank line, and I will approve the pull request along with a note linking to the commit where I added your changes.)

2) Do I have permission to include this great example with the moltemplate examples distributed on github?

3) README file.
It sounds as though AMBER GAFF users will often need to use Antechamber and Parmchk2. Since you are familiar with this process, if you would like to write a paragraph (or a short README file) explaining how you generated the atom types, partial charges, and the "DABPA_AC.frcmod" file for this molecule, I'm sure that would be helpful for other users. You don't have to go into detail. (Even if you just provide a few links to some AmberTools tutorials that you used, it would be helpful.) I can edit your README file to explain how to convert FRCMOD files into LT format.

I hesitate to ask for this, since you have already contributed a lot of your time.

Andrew

bhargavchava97 commented 4 years ago

Sure, you can use this example as a moltemplate example. I will write a short description of how to obtain the parameters and frcmod files and will send you soon. Also, I have submitted a pull request by changing "gaff_imp.py".

Regards Bhargav

bhargavchava97 commented 4 years ago

Also, I have forgotten to ask you about an issue regarding "cleanup_moltemplate.sh" file. While cleaning up the data file, cleanup_moltemplate.sh has no way to use files like "gaff_imp.py". So, I guess it using the default "nbody_Impropers.py" in line 77 and changing the correct original order created by "gaff_imp.py".

jewettaij commented 4 years ago

Sure, you can use this example as a moltemplate example. I will write a short description of how to obtain the parameters and frcmod files and will send you soon. Also, I have submitted a pull request by changing "gaff_imp.py".

→ Awesome ←

jewettaij commented 4 years ago

Also, I have forgotten to ask you about an issue regarding "cleanup_moltemplate.sh" file. While cleaning up the data file, cleanup_moltemplate.sh has no way to use files like "gaff_imp.py". So, I guess it using the default "nbody_Impropers.py" in line 77 and changing the correct original order created by "gaff_imp.py".

cleanup_moltemplate.sh does not use either gaff_imp.py or nbody_Impropers.py. It is a very crude program. It reads an existing data file (which contains a list of all of the improper interactions determined by moltemplate.sh using gaff.lt and gaff_imp.py, for example). Then it simply gets rid of all of the unused atom types (and unused bond, angle, dihedral, and improper types as well) from that data file. Then it creates a new data file (and input scripts and other auxiliary files) with those unused atom types (and improper types) deleted, and the remaining types renumbered.

jewettaij commented 4 years ago

...These remaining impropers will be the same impropers that moltemplate generated the first time you ran it (generated using "gaff_imp.py" in this case, since you are using gaff.lt).

bhargavchava97 commented 4 years ago

But I have actually seen the order of 2nd and 3rd atom interchanged for few impropers after "clean_moltemplate.sh". So, I thought this was due to the line "77 - moltemplate.sh system.lt" which runs moltemplate.sh on the new system.lt file (with all the topology pre-written). I thought it was using "nbody_Impropers.py" because the 2nd and 3rd atoms were sorted after the cleaning which is what "nbody_Impropers.py" does. When I manually extracted the new system.lt file from the temperarory directory and ran moltemplate.sh with gaff_imp.py, the problem went away. I could be wrong, but this was my experience.

jewettaij commented 4 years ago

But I have actually seen the order of 2nd and 3rd atom interchanged for few impropers after "clean_moltemplate.sh". So, I thought this was due to the line "77 - moltemplate.sh system.lt" which runs moltemplate.sh on the new system.lt file (with all the topology pre-written). I thought it was using "nbody_Impropers.py" because the 2nd and 3rd atoms were sorted after the cleaning which is what "nbody_Impropers.py" does. When I manually extracted the new system.lt file from the temperarory directory and ran moltemplate.sh with gaff_imp.py, the problem went away. I could be wrong, but this was my experience.

I'm impressed that you have investigated the code this carefully.

~On line 48 of "cleanup_moltemplate.sh", the "ltemplify.py" program is used to create a new LT file (also named "system.lt"). The new "system.lt" file should not contain a section named "Data Impropers By Type" (which is found in "gaff.lt"). That's the name of section that would contain rules for creating new improper interactions (using the rules in either "nbody_Impropers.py" or "gaff_imp.py", for example). Instead, the new "system.lt" file should have a section named "Data Impropers", which has all of the improper interactions listed explicitly.~

However, there could be other reasons why the atom order is changing, and other possible errors. The "cleanup_moltemplate.sh" file is important, so if it has bugs, I'd love to know about it.

~Please send me an example where you see this behavior. (Did you notice this in the DABPA.lt example?)~

jewettaij commented 4 years ago

Nevermind. I just tried it on the DABPA example, and you are right. (dammit) Thanks for reporting it. I will take a look now...

jewettaij commented 4 years ago

Okay. I think the most recent commit I just posted fixes this.

Thank you for again for your careful reporting! This is a disturbing and potentially serious bug that causes "cleanup_moltemplate.sh" to alter the order of improper atoms for all force fields, (including GAFF, GAFF2, OPLSAA, and COMPASS). It is unlikely but also possible it could have caused improper interactions to be incorrectly identified as redundant and deleted.

I wish everybody else was as careful as you are. This bug has probably been there for ~4 years.

Boring Details (feel free to skip)

However, while investigating this bug, I noticed other more subtle problems in "moltemplate.sh" which will require a few days to address. This problem will not effect the vast majority of users, but it bothers me. It is messy and it will take a few days to fix, so I probably won't attempt it until a few weeks have past when I am less busy. I will leave a comment here to remind myself later.

Note-To-Self (Please ignore this)

The problem is that the "nbody_reorder_atoms.py" script is being invoked (on line 1496 of "moltemplate.sh") with the wrong SUBGRAPH_SCRIPT file. I think one reasonable solution is to append the name of the symmetry file (eg "gaff_imp.py") to the "Data Impropers" section name (eg -> "Data Impropers (gaff_imp)"), and tell moltemplate.sh to set SUBGRAPH_SCRIPT_IMPROPERS to this file when it encounters it mentioned there. Also: When moltemplate.sh creates a DATA file (eg "system.data") mention this symmetry file (eg "gaff_imp.py") it in a comment in the system.data file that moltemplate.sh creates. Finally modify ltemplify.py to look out for this comment and put it back in the LT file, accordingly. This will require several modifications to both moltemplate.sh and ltemplify.py.

jewettaij commented 4 years ago

I'm closing this issue now. The last few days have been a very productive debugging session. Thank you for being so careful, reading and modifying my messy code, reading my long-winded responses, preparing the DABPA example and allowing me to post it. This has been great.

And please let me know if you have any more observations. Take care.