mosdef-hub / foyer

A package for atom-typing as well as applying and disseminating forcefields
https://foyer.mosdef.org
MIT License
121 stars 78 forks source link

Methods to assign charge during or after atomtyping #122

Open mattwthompson opened 7 years ago

mattwthompson commented 7 years ago

The workflow we've been using with OPLS doesn't work for most forcefields. In Amber, for example, non-bonded VdW and bonded interactions are specified through atomtyping but partial charges are assigned after atomtyping using antechamber, which uses AM1-BCC. In other words, a given atomtype may have different partial charges if it's used in different molecules. (One such example is n-chloroalkanes, in which the chlorines have different charges than hydrogens but only one atomtype is used for the carbon, which can result in non-zero total charges on the molecule).

The major limitation that comes to mind is that AM1-BCC is only to be used on small molecules. A large system would need to be broken down into residues, molecules, or characteristic moieties, passed to antechamber or an equivalent, and then replicated back to the whole system.

mattwthompson commented 7 years ago

So,

antechamber is installable in conda:

conda install -c omnia ambermini

And isn't too hard to run. Here, I run it on a propylene carbonate molecule and store the charges in an output mol2 file.

$ time antechamber -i prcarb.pdb -fi pdb -o prcarb.mol2 -fo mol2 -c bcc -nc 0
Total number of electrons: 54; net charge: 0

Running: //anaconda/bin/../bin/sqm -O -i sqm.in -o sqm.out

real    0m0.297s
user    0m0.264s
sys 0m0.025s

And we can obviously grab these charges in parmed:

>>> import parmed as pmd
>>> pc = pmd.load_file('prcarb.mol2', structure=True)
>>> [atom.charge for atom in pc.atoms]
[0.1121, 0.0797, 0.0797, -0.4069, 0.8487, -0.5525, -0.4059, 0.0964, 0.0857, -0.1111, 0.0587, 0.0587, 0.0587]

So it's easy enough to generate these partial charges and store them. But given AM1's inability to scale past small systems, this would need to be run on single prototype molecules, which, as far as I know, isn't really how atomtyping works right now.

One idea I have (say I have an existing gaff.xml file with correct information and partial charges of 0 in <NonBondedForce>):

  1. Atomtype the system as normal so the rest of the topology has bonded and LJ terms
  2. Pass a list of residue names to a function that slices out a single residue of each type
  3. Run an antechamber wrapper on this slice
  4. Grab the generated partial charges and map them back onto each copy of this residue

This works in my head and should work for a box of small molecules (or a more complex system in which only its small molecules are passed to this function). This certainly wouldn't work on big slabs and might not work for polymers unless each monomer is accounted for as a single residue.

Any thoughts? Worth pursuing this approach?

davidlmobley commented 7 years ago

The basic procedure that one needs to apply here is conceptually simple but hard to implement in a general way. It's most established for proteins and nucleic acids (and is where the charges in those force fields typically come from) so you may want to see what's been done there. Basically, you "slice" (to use your term) a piece of the system out, cap it in a way so that the caps don't affect its electronic structure much, compute charges, remove the caps, distribute residual charge from the caps back across the slice so that it still has the proper formal charge, then link it back into the larger chain.

It's conceptually simple. The main problems, as I understand it, relate to finding a general way to pick where to slice so that you end up with electronically relatively decoupled regions so that your charges aren't (a) artificial and (b) totally dependent on where you picked to slice.

If you have an OpenEye license, I think OpenEye is working on a general approach for this it might be possible to try out.

I'm also happy to discuss briefly by phone if you guys want.

mattwthompson commented 7 years ago

Thanks for your comment, looks like I wasn't too far off the mark. Could you share some references that demonstrate this? I looked around a bit but I'm not familiar with the biophysical literature and I only found its use in small molecules.

I'm not working with any polymers/biopolymers at the moment, so I was thinking about doing a simpler implementation that was limited to small molecules. Obviously we would want to support arbitrarily large molecules in the futures but this might be useful as a first step (and it would be more useful for my research in the immediate future, so I can start using GAFF and forcefields that rely on setting partial charges).

Thoughts guys? Want to wait until @ctk3b is back to chat or is this worth implementing now?

chrisiacovella commented 7 years ago

Check out Automated Topology Builder (and some of their associated papers linked on the website). They go into detail with regards to the different methods for chopping up the system and different ways of getting charges.

https://atb.uq.edu.au

A workflow that calls another tool for getting the charges would of course be useful, but given the way mbuild/foyer are currently designed/implemented, dumping these parameters and their associated charges to a custom, reusable force field file would probably be the way to go. We can create a catalog of molecule specific GAFF forcefield files.

This might actually be a good way to go, as I’ve read papers that just say “we got charges from ATB” but they don’t list the charges or list which scheme from ATB they used, so the work is not reproducible. Having these force field files would remove any ambiguity as to which parameters/charges were actually used.

On Jul 3, 2017, at 9:05 AM, Matt Thompson notifications@github.com<mailto:notifications@github.com> wrote:

Thanks for your comment, looks like I wasn't too far off the mark. Could you share some references that demonstrate this? I looked around a bit but I'm not familiar with the biophysical literature and I only found its use in small molecules.

I'm not working with any polymers/biopolymers at the moment, so I was thinking about doing a simpler implementation that was limited to small molecules. Obviously we would want to support arbitrarily large molecules in the futures but this might be useful as a first step (and it would be more useful for my research in the immediate future, so I can start using GAFF and forcefields that rely on setting partial charges).

Thoughts guys? Want to wait until @ctk3bhttps://github.com/ctk3b is back to chat or is this worth implementing now?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHubhttps://github.com/mosdef-hub/foyer/issues/122#issuecomment-312684058, or mute the threadhttps://github.com/notifications/unsubscribe-auth/ABW4q8m4uiTKmu4XD573qw6ZvcbGFtzbks5sKRFagaJpZM4NfX7E.

davidlmobley commented 7 years ago

Thanks for the note on ATB - I hadn't realized they discussed these issues on there so I'll have to look.

Let me know if I can be helpful at all.

Regarding having files versus having charging procedures -- I have always felt knowing the origin of the files is even more important than having the files themselves. The worst case scenario is, as you describe it, papers which just say "We got charges from ATB" without saying which/how, but it's still less than ideal to say, "We got charges from ATB and here are our files," as one still doesn't know where they came from. It's sort of like buying a "diamond ring" on a street corner. Who knows what it really is? I'd much rather have, "Here are my files, and here's the algorithm I used to generate them."

If you do implement something along these lines, can you try and do it in a way which is not Foyer-specific? We'll ultimately need something for blocking too, and it would be nice to be able to reuse rather than rewrite.

chrisiacovella commented 7 years ago

The JCTC paper about ATB has a lot of good details:

http://pubs.acs.org/doi/abs/10.1021/ct200196m

As does their most recent paper on improving it (I don’t have access to the full journal, but NIH has a mirror): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3590896/

I agree with you, having the files and information about how they were generated is probably a better way to ensure reproducibility than just the files alone. However, we also need to make sure we are including what structure was given to the code to calculate the charges. That is, did we provide the whole molecule or just a smaller fragment and what was the fragment?

We just had issues trying to reproduce someone’s charges they got from ATB; at first I thought it might be because we were using v2 not v1.2 (v2 has improved methods), but upon digging deeper, discovered they only fed ATB the heagroup (with very short tails capped off), making the (reasonable) assumption that the tail beads have the same charges (which is certainly useful when the goal is to vary tail length)…but of course, it would have been nicer just to have those details to begin with (which is certainly where coming up with some best practice checklists would be great).

We can automatically have all this information if we encode this all into a well defined python script/workflow where the structure is defined in say, mbuild, with that structure passed to a code to calculate the charges, and everything gathered back and dumped to a composed file, we would have all the information we need to reproduce the work (assuming this script is included with the publication or in a linked git repo).

The secondary issue with this (and something we talked about with you a few months ago) is that this would introduce a whole bunch of new atom-types for any case where the charge is different. However, if we are generating this in an automated way AND using SMARTS to define the usage, this is less of a significant issue than if we did this all by hand or tried to include these parameters in some general force field library, rather than just a custom file for a specific molecule (recent changes to mbuild/foyer allow you to atom-type on individual components, not just the whole library, although I’m sure these have been pushed to the main repo yet).

On Jul 4, 2017, at 9:28 AM, David L. Mobley notifications@github.com wrote:

Thanks for the note on ATB - I hadn't realized they discussed these issues on there so I'll have to look.

Let me know if I can be helpful at all.

Regarding having files versus having charging procedures -- I have always felt knowing the origin of the files is even more important than having the files themselves. The worst case scenario is, as you describe it, papers which just say "We got charges from ATB" without saying which/how, but it's still less than ideal to say, "We got charges from ATB and here are our files," as one still doesn't know where they came from. It's sort of like buying a "diamond ring" on a street corner. Who knows what it really is? I'd much rather have, "Here are my files, and here's the algorithm I used to generate them."

If you do implement something along these lines, can you try and do it in a way which is not Foyer-specific? We'll ultimately need something for blocking too, and it would be nice to be able to reuse rather than rewrite.

— You are receiving this because you commented. Reply to this email directly, view it on GitHubhttps://github.com/mosdef-hub/foyer/issues/122#issuecomment-312915454, or mute the threadhttps://github.com/notifications/unsubscribe-auth/ABW4q881mBSNW8126mP5daRsmykc1qyBks5sKmhEgaJpZM4NfX7E.

mattwthompson commented 7 years ago

Alright, to sum up some of these discussions, I'll be taking a stab at building another package outside of foyer that calls a QM method to assign charge. I'll need to figure out a handful of details along the way (blocking being the biggest hurdle) and I'll tag people in issues elsewhere if I need help.

I agree that documenting the underly algorithms, not necessarily just spitting out files, is the best approach for reproducibility and openness.

If you do implement something along these lines, can you try and do it in a way which is not Foyer-specific?

My vision is for it to take and return parmed structures, which fits right into the foyer workflow but should also be generally useful to people who have other workflows.

davidlmobley commented 7 years ago

Excellent. That sounds great. LMK if it would be helpful to have a call with us to discuss strategies for this. Otherwise, please just tag me as you get going as I'll be an interested spectator and potentially interested in using as well.