United-atom Types - Githubissues

ramess101 commented 7 years ago

United-atom types are quite popular in "data driven" force fields. For hydrocarbons the "virtual site" is located on the carbon atoms while the hydrogens are not accounted for explicitly. Hydrogens attached to oxygen, nitrogen, etc. are typically still included explicitly which means that for non-hydrocarbons there is a mix of explicit and implicit hydrogens. This might complicate matters since you cannot just ignore all hydrogens. Fortunately, in the immediate future (i.e. the next 2 years) the NIST force field will be focusing on hydrocarbons so we just need some way for smarty to ignore the hydrogens attached to carbon.

davidlmobley commented 7 years ago

@ramess101 -- I don't understand, what is the purpose of having a virtual site which is located ON the carbon atoms? Or are you just saying we want a single interaction site on the carbon, where the hydrogens are not used?

@jchodera and @bannanc - thoughts on how we would want to do this with SMARTS/SMIRKS? Perhaps just a SMIRNOFF tag for "united atom" forcefields which allows specific SMIRKS patterns to be specified for atoms which are going to be dropped?

Initially I thought that we should simply say that particles which do not get assigned LJ parameters get dropped, but this would be difficult given the hierarchical nature of the force fields (e.g. it would be difficult to ensure that a generic for a particular element doesn't cover atoms which should be treated as united atoms).

This would allow us to approach the problem in several ways; for example, we might say that [#1:1] always gets dropped except when it is [#1:1]~[#8], or that [#1:1] is always retained but [#1:1]~[#6] is always dropped, depending on whether a particular force field wants to treat hydrogens as TYPICALLY UA or typically retained.

ramess101 commented 7 years ago

Sorry, I was just trying to be consistent with your terminology of "virtual site" but clearly that created more confusion than clarity. Yes, the united-atom model just has a single interaction site on the carbon whereas the hydrogens have no interaction site.

davidlmobley commented 7 years ago

Sorry, @ramess101 - I should clarify "virtual site" then. Say we are simulating N particles, so we are tracking the coordinates of those particles. A virtual site is one way of handling another interaction site; for example, we could say that there is additional Lennard-Jones or charge site that is placed in a certain position/orientation relative to every carbonyl group, or some such. If this is done in a way that it does not result in having to simulate an additional particle, but is instead just defined relative to the positions of the other atoms we ARE simulating, then it is a "virtual site". Does that help clarify?

ramess101 commented 7 years ago

@davidlmobley Yes, that makes sense. Thanks!

bannanc commented 7 years ago

Are united-atoms always the same combination of atoms or are they something we're going to learn?

I'm not very familiar with united atoms so I'm trying to understand when they'd be applied. From the call today it sounds like you're assigning parameters (specifically Lennard Jones, non-bonding?) to a group of atoms rather than a single atom. In the SMIRNOFF format we don't really have

particles which do not get assigned LJ parameters

as those atoms would be assigned the generic non-bonding SMIRKS "[*:1]". For consistency throughout, I would suggest we keep the idea that no particle/group should be assigned the generic parameter.

So far I've thought about two options for assign UA parameters.

We could assign null parameters to atoms that are included inside a united-atom such as "[#1:1]-[#6](-[#1])-[#1]" for a hydrogen that is in a methyl group. I think this is along the lines @davidlmobley was thinking above, maybe?

Hydrogens attached to oxygen, nitrogen, etc. are typically still included explicitly which means that for non-hydrocarbons there is a mix of explicit and implicit hydrogens.

If this is a hard and fast rule then you could imagine doing no.1 more genericly assigning null parameters to any hydrogen bound to a carbon (i.e. "[#1:1]-[#6]").

You could make a sub-type of non-bonding parameters. (I'm imagining something similar to proper vs. improper in the case of torsions). Where you could have UA non-bonding parameters where a SMIRKS pattern is used to assign parameters to the full group of atoms such as "[#1:1]-[#6:2](-[#1:3])-[#1:4]" where all 4 atoms in a methyl group are grouped together and assigned a single parameter.

ramess101 commented 7 years ago

The united-atom types have historically been assigned with chemical intuition. For example, distinguishing between CH3, CH2, CH2=, CH, CH=, etc. UA types make sense based on the notion of group contributions and what makes two groups different. Although I am not sure that learning the appropriate UA types is in the future plans it is certainly an interesting idea. For example, we could learn which hydrogens are not necessary to have explicitly in the model and which ones are required to predict a given property. My plan is to stick with the traditional approach of pre-assigning the UA types and then use Bayesian approaches to determine which non-bonded potentials are required to predict the desired properties.

jchodera commented 7 years ago

@bannanc's suggestions are a good starting point here!

We first have to decide, however, which of two potential representations we are going to use:

Option 1. All atoms are explicitly defined (and used in SMIRKS matching), but fewer interaction sites (on atoms or virtual sites) are defined than the number of atoms in order to make things fast. However, we would still need some way of keeping groups of atoms rigid or applying valence terms so that we always have a fully atomistic representation during the simulation.
Option 2. A reduced representation is used throughout, where each site may represent a collection of atoms. This may prove incompatible with a SMIRKS-based representation if we start with a reduced representation, but it's possible that we could start with a fully atomistic representation and then add an explicit reduction step where the system is compacted in a way that only a subset of atoms or sites---those with interactions defined---are put into the actual System object that is simulated.

I tend to favor approach 1, though I realize that may differ from how things are traditionally done. If we're OK with that, we can discuss the next steps that @bannanc has already proposed some ideas for---namely, how we tell SMIRNOFF to skip checks that ensure all atoms have nonbonded parameters when only a subset of sites will have interactions defined.

davidlmobley commented 7 years ago

@jchodera - do you have any thoughts on the likely relative performance of the two approaches?

I agree that Option 2 is likely incompatible with a SMIRKS-based representation if we START with a reduced representation, though there could be an explicit reduction step. Probably this would require some sort of custom SMIRKS which would define groups of atoms to be replaced by an individual atom type.

I find Option 1 conceptually more appealing because it recognizes the reality that there are still atoms involved and would make it possible to go back to an atomistic representation if we wanted. However, I worry that it might become slow to run, as well as cumbersome to build in the ways of keeping groups rigid, etc. On the other hand, Option 2 also sounds cumbersome to fit into the SMIRKS framework, so that still leaves me leaning towards Option 1. @ramess101 , @bannanc ?

jchodera commented 7 years ago

I don't think the performance of Option 1 will be very much degraded compared to Option 2, and it does have big advantages that make it easier to compute various properties directly from the simulations.

The main issue is that Option 1 requires more valence parameters than Option 2, though this might be mitigated if groups of atoms could be rigidly constrained.

davidlmobley commented 7 years ago

OK, I'm still more inclined towards Option 1 then. It's an excellent point about the analysis -- this would make analysis via "standard tools" work well, whereas other approaches will require a mapping procedure before analysis can be done.

jchodera commented 7 years ago

In that case, we're back to the question of how to tell SMIRNOFF to assign nonbonded types to real or virtual sites for a subset of atoms, and ignore the fact that some atoms won't have nonbonded types.

There are several options on the table (slightly modified from @bannanc's suggestions):

Option A: Explicitly match atoms that are subsumed into parent atoms, and assign null parameters. For hydrogens in a united-atom methyl group, this might be "[#1:1]-#6-[#1]".
Option B: Create a special tag to mark groups of atoms that will be represented by a single (real or virtual) site, and so missing nonbonded types will be ignored. For example, for a methyl group, this could be [#1:2]-[#6:1](-[#1:3])-[#1:4], where all 4 atoms in a methyl group are grouped together because we will separately assign either a real or virtual site in this group a parameter.
Option C: Same as B, but also let this tag assign real or virtual sites a nonbonded parameter. This could either be a modified form of <NonbondedForce/> tags or a new type of <UnitedAtomNonbondedForce/> tag.

bannanc commented 7 years ago

Sorry I didn't respond to this sooner.

I've been thinking about how we might want to use these in the long run. I'm assuming that eventually we might want to learn which atoms are grouped together. One way of checking learned types is that all atoms are labeled. So everything in your system should have a non bonding force assigned to it.

That was the thought process that initially led to my two thoughts about labeling above. I don't think we want options where "missing nonbonded types will be ignored." However, the SMIRKS pattern with multiple indices in option B/C could still be used to explicitly label the hydrogens, even though the group is sharing an interaction site.

Given that we're want options for real or virtual sites I think it makes more sense to keep the groups of atoms labeled together (I.e. not option A). Otherwise a virtual site would have to be defined with one of the explicit atoms.

I'm not sure I understand the difference between options B and C wouldn't labeling the group have to " assign[ing] real or virtual sites a nonbonded parameter."

On Thu, Mar 23, 2017 at 14:19 John Chodera notifications@github.com wrote:

In that case, we're back to the question of how to tell SMIRNOFF to assign nonbonded types to real or virtual sites for a subset of atoms, and ignore the fact that some atoms won't have nonbonded types.

There are several options on the table (slightly modified from @bannanc https://github.com/bannanc's suggestions):

Option A: Explicitly match atoms that are subsumed into parent atoms, and assign null parameters. For hydrogens in a united-atom methyl group, this might be "[#1 https://github.com/open-forcefield-group/smarty/pull/1:1]-#6-[#1 https://github.com/open-forcefield-group/smarty/pull/1]".

Option B: Create a special tag to mark groups of atoms that will be represented by a single (real or virtual) site, and so missing nonbonded types will be ignored. For example, for a methyl group, this could be [#1:2]-#6:1-[#1:4], where all 4 atoms in a methyl group are grouped together because we will separately assign either a real or virtual site in this group a parameter.

Option C: Same as B, but also let this tag assign real or virtual sites a nonbonded parameter. This could either be a modified form of tags or a new type of tag.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/open-forcefield-group/smarty/issues/220#issuecomment-288862524, or mute the thread https://github.com/notifications/unsubscribe-auth/ALagHXlRLZFhRXUrM2hnIjLIcr_hUu61ks5rouEEgaJpZM4Mbjtd .

--

Caitlin C. Bannan Ph.D. Candidate Mobley Group Chemistry Department at UC Irvine bannanc@uci.edu

jchodera commented 7 years ago

However, the SMIRKS pattern with multiple indices in option B/C could still be used to explicitly label the hydrogens, even though the group is sharing an interaction site.

Since "labeling" atoms that are subsumed into a single interaction site isn't something that would appear in a System object, I'm not quite sure what you mean by labeling other than an internal bookkeeping device to figure out when to raise an error if some atoms were accidentally omitted in defining interaction groups. Is it just a bookkeeping device? If so, I think we're on the same page.

I'm not sure I understand the difference between options B and C

Option B would not change the existing type definitions for <NonbondedForce> and virtual site definitions (yet to be finalized), but would add a new tag that would simply specify how atoms are grouped together so that type assignment checking wouldn't complain if parameters are missing.

Option C would ether modify the way we use NonbondedForce or add a new kind of NonbondedForce tag that would let you tag more than one atom when defining interaction sites.

bannanc commented 7 years ago

Is it just a bookkeeping device? If so, I think we're on the same page.

Yes, I think we're on the same page here @jchodera

Thanks for clarifying with B/C I still don't have a preference for which one of those makes more sense. @davidlmobley do you?

jchodera commented 7 years ago

I guess the biggest question is whether we want to extend NonbondedForce to allow additionally tagged atoms (2, 3, 4...) to be "subsumed" into a single interaction site, or whether we want to create a whole new tag and generator for this. I can see advantages to simply extending NonbondedForce if it doesn't break current functionality, since otherwise we need to engineer communication between different generators so the type checking doesn't complain that atoms are untyped. Alternatively, we can engineer a simpler type checking scheme that runs separately at the end of the process to flag missing types, but this involves rearchitecting things with a new idea of how to do the checking (which may not be bad, just more effort now).

We should also finalize the specification of both virtual sites and how we will assign interaction types to virtual sites.

bannanc commented 7 years ago

Assuming it doesn't break current uses it seems like it makes more sense to extend the NonbondedForce. This seems more logical to me than having multiple generators that create specify non-bonding interactions. Could we define subtypes similar to the proper and improper options for torsions.

jchodera commented 7 years ago

@davidlmobley Any more thoughts on this? I think we're leaning toward extending the NonbondedForce spec to allow multiple atoms to be subsumed into a single interaction site.

For example, to create united-atom methyls and methylenes, we would go from

<NonbondedForce coulomb14scale="0.833333" lj14scale="0.5" sigma_unit="angstroms" epsilon_unit="kilocalories_per_mole">
   <Atom smirks="[#1:1]" rmin_half="1.1870" epsilon="0.0157"/> # hydrogen
   <Atom smirks="[#6:1]" rmin_half="1.9080" epsilon="0.1094 "/> # carbon
</NonbondedForce>

to

<NonbondedForce coulomb14scale="0.833333" lj14scale="0.5" sigma_unit="angstroms" epsilon_unit="kilocalories_per_mole">
   <Atom smirks="[#1:1]" rmin_half="1.1870" epsilon="0.0157"/> # hydrogen
   <Atom smirks="[#6:1]" rmin_half="1.9080" epsilon="0.1094 "/> # carbon
   <Atom smirks="[#1:2]-[#6:1]-[#1:3]" rmin_half="1.9080" epsilon="0.1094 "/> # -CH2-
   <Atom smirks="[#1:2]-[#6:1]([#1:3])-[#1:4]" rmin_half="1.9080" epsilon="0.1094 "/> # -CH3
</NonbondedForce>

where the labeled atoms numbered greater than 1 in the <Atom/> elements in the <NonbondedForce/> element get subsumed into the atom numbered 1. This is mostly for bookkeeping, where we say that these sites should not have nonbonded parameters assigned by the more general types and should not raise exceptions in our checks.

We could also add an optional attribute that would explicitly say that this atom should be treated as a united-atom site, like type="united" or united="True" to make it harder to make accidental mistakes.

davidlmobley commented 7 years ago

Yes, @jchodera - I didn't have further thoughts on this because I like the plan.

davidlmobley commented 7 years ago

I also like the idea of having type="united" or united=True (slight preference for the latter) to avoid accidental mistakes. I think it will also make more clear that someone didn't just accidentally take, say, an angle SMIRKS pattern and drop it into the nonbonded section.

BUT - does this create any complications for the other sections, such as bonds and angles?

jchodera commented 7 years ago

We still need to solidify the spec changes.

Are you happy with the above proposal? We'll have to overhaul the parameter assignment and type checking code if so. The parameter assignment part might be more complex because we have to make sure that an atom doesn't get typed if it is covered by a more specific chemical environment that includes it as a tagged atom.
Do we want to use this time to extend NonbondedForce to include other nonbonded interactions, such as exp-6, lennard-jones-<m>-<n>, or even specify algebraic functional forms?
Is there something we should do with virtual site based united atoms at this point, or punt that until later?
Do we need to do anything with virtual bonds, angles, and torsions between virtual sites? Or is it OK to keep the atomic bonds, angles, and torsions and just reduce the number of nonbonded interaction sites? I believe OpenMM excludes the computation of interactions (at least for Lennard-Jones) between sites with epsilon=0, so we may get the full speed benefit of united atom even if we keep atomic bonds, angles, and torsions, which simplifies everything.

davidlmobley commented 7 years ago

I'm happy with the above proposed spec, but I haven't had time to think through the broader implications for what would have to change in the code, nor do I understand what this would mean for bonds, angles, and torsions, and how we would manage "virtual bonds, angles, and torsions", or if we even need such things. It seems to me we need input from people who use these types of force fields ( @ramess101 ?) on what they want to be able to support, as I don't have any recent experience with coarse grained FFs.

Is there something we should do with virtual site based united atoms at this point, or punt that until later?

It seems like we should punt until later, since we don't yet even have virtual sites for charges, which is a higher priority.

I'll have to digest the rest a bit more and respond again soon.

davidlmobley commented 7 years ago

OK, I've thought through this a bit more, @jchodera , and, to respond to your comment:

-Are you happy with the above proposal? We'll have to overhaul the parameter assignment and type checking code if so. The parameter assignment part might be more complex because we have to make sure that an atom doesn't get typed if it is covered by a more specific chemical environment that includes it as a tagged atom.

I'm happy with it. I do have questions though - for example I'm not understanding how this would interplay with the assignment of the other parameters (bonds, angles, torsions); I'm thinking we would still assign these parameters to the atoms which are subsumed into other atoms, but then we wouldn't actually be using them when running dynamics because we'd be keeping the groups rigid? Is that what you have in mind?

Do we want to use this time to extend NonbondedForce to include other nonbonded interactions, such as exp-6, lennard-jones--, or even specify algebraic functional forms? Is there something we should do with virtual site based united atoms at this point, or punt that until later?

I would punt. I'm tagging Richard on Slack though to see if he wants to weigh in on this.

Do we need to do anything with virtual bonds, angles, and torsions between virtual sites? Or is it OK to keep the atomic bonds, angles, and torsions and just reduce the number of nonbonded interaction sites?

This sort of gets at my question above. I don't really know because I have so little experience with CG force fields. I think my approach would be to keep the atomic ones (though they might need to be different in a CG FF) and reduce the number of nonbonded interaction sites. Effectively we'd be carrying dummy atoms around, like in an alchemical calculation, I think.

I believe OpenMM excludes the computation of interactions (at least for Lennard-Jones) between sites with epsilon=0, so we may get the full speed benefit of united atom even if we keep atomic bonds, angles, and torsions, which simplifies everything.

Right, though presumably computing the bonds, angles, and torsions for these extra sites will have some cost. But perhaps we can keep them rigid without too much trouble? Not sure which would be more costly though -- applying constraints or just computing the full bonded interactions for what are effectively "dummy" atoms.

I do very much like that this makes it trivial to compute properties for a full all-atom simulation even from a united atom approach.

ramess101 commented 7 years ago

Sorry for not seeing this earlier.

The general approach for united-atom models is to utilize the same bond, angle, and torsional potentials as the all-atom cases but only for the heavy atoms. In other words, for a molecule like n-butane that is represented as CH3-CH2-CH2-CH3 we would use the same C-C bond lengths (and harmonic force constants) as the all-atom C-C bonds and the same for the C-C-C angles and C-C-C-C dihedrals. That being said, sometimes an all-atom model will use slightly more precise bond lengths and angles. For example, the TraPPE-UA model uses a 0.154 nm C-C length whereas the TraPPE-EH (explicit hydrogen) model uses a 0.1535 nm C-C length.

Usually you do not waste computational time computing bonds, angles, and torsions for the extra hydrogen sites that are not included in the nonbonded calculations. But if the C-H bonds, H-C-H bends, etc. are kept rigid with little effort that is also a viable option. And, like you mentioned, this can make it quite simple to convert the united-atom model to an all-atom model.

As far as the nonbonded potentials, I do think the Lennard-Jones m-n potential is gaining a lot of attention in force field development. The Exp-6 model is much less common in the recent literature, so if I had to pick one I would say that including the LJ m-n model would be the higher priority.

As for virtual site based united-atoms, I agree that we should punt that until later.

davidlmobley commented 7 years ago

Thanks, @ramess101 . So, @jchodera - I think we agree on the path forward for now.

openforcefield / smarty

United-atom Types #220