Energy component analysis

leeping commented 10 years ago

This issue is to discuss a proposed feature of having energy component analysis in OpenMM. Peter mentioned that this would be a major architectural change in OpenMM and would take a long time to do - so it would be good for us to have a well-defined layout of what this feature is.

Energy component analysis is a way to decompose the total potential energy into physically motivated components. Most of the time, each Force object corresponds to one component (bond, angle, dihedral), but other times a Force object may contribute to multiple energy components (e.g. NonbondedForce has both Lennard-Jones and electrostatic components, and AmoebaMultipoleForce has charge-charge, charge-dipole, dipole-dipole, etc etc). The electrostatic component may be further decomposed into short-range and long-range components.

I can't think of a situation where multiple Force objects contribute to a single energy component though - can you?

The proposed user interface would be:

state = simulation.context.getState(getEnergy=True, getEnergyComponents=True)
state.getEnergyComponents()
{
    'Bonds' : 32.123 kJ/mol
    'Angles' : 45.654 kJ/mol
    'Dihedrals' : 23.432 kJ/mol
    'Lennard-Jones' : 123.321 kJ/mol
    'Coulomb' : -234.432 kJ/mol
}

If energy components are implemented, it would be very cool to have a mapping from the components to the Force object that generates them:

system.getEnergyComponentMapping()
{
    'Bonds' : system.HarmonicBondForce,
    'Angles' : system.HarmonicAngleForce,
    'Dihedrals' : system.PeriodicTorsionForce,
    'Lennard-Jones' : system.NonbondedForce,
    'Coulomb' : system.NonbondedForce
}

The use cases for having energy components are:

Reweighting (in general, recalculating the potential energy using a different System or force field): Energy component analysis allows us to detect which energy terms are changing when we modify the force field. This would allow us to rapidly evaluate perturbations to the potential energy by only activating the single Force that is changed.
Matching to QM: By matching energy components of the force field to energy components in a corresponding QM calculation, we can build force fields with less overfitting in the Lennard-Jones and electrostatic terms.
Testing: Making sure that a force field is correctly implemented by comparing with another piece of software.
The current way of doing energy components by setting a different force group for each Force is not fine-grained enough, and it is incompatible with anything else that also needs force groups, e.g. multiple timestep integrators.

@mrshirts : Levi Naden from our chat yesterday was also interested in having energy components for accelerated reweighting. It might be good for you guys to chime in as well.

swails commented 10 years ago

There are many times I wished I could do exactly this, particularly when validating and debugging my implementation of different force fields in OpenMM.

In my experience, my principle complaint is that I didn't have fine-grained support over different 'nonbonded' groups. I've wanted to separate out the vdW, electrostatic, and the exclusion rule vdW/electrostatic terms (1-4 vdW and EEL in Amber-speak). (That last part is very fine-grained, but still very helpful). I hacked together an approach through ParmEd in which I zero out each part I want to ignore, but it's not as convenient as having a way to do this at the API level.

I've thought about this a bit and realized that the ability to assign different NonbondedForce objects to different groups would solve all of my problems. If you could add 2 NonbondedForce objects (one with all vdW parameters zeroed and the other with all charges zeroed) and assign them to separate groups you'd be able to get the energies you wanted. It would be more awkward than having some kind of dict/hash/map proposed here, but it would allow even more fine-graining.

A few other comments:

I can't think of a situation where multiple Force objects contribute to a single energy component though - can you?

GBForce and NonbondedForce both contribute to electrostatic energies, although it's quite helpful to be able to separate them out. This is currently impossible since all nonbonded forces must belong to the same group.

The current way of doing energy components by setting a different force group for each Force is not fine-grained enough, and it is incompatible with anything else that also needs force groups, e.g. multiple timestep integrators.

I agree that it's not fine-grained enough right now, but I can't see why it would be incompatible with other approaches that need groups, like multiple timestep integrators. The onus would be on the programmer to implement a multi-level group structure, but that shouldn't be too difficult.

Suppose we have 5 forces that we want energies for, and we want 2 force groups for multiple time stepping:

for force in system.getForces():
    print force.getForceGroup()
0
1
2
3
4

The integrator would then have to get the forces setting the flags to getState to 2**0+2**1+2**2+2**3+2**4 or 2**5, depending on what forces are being evaluated that step. When you want components, evaluate each group by itself.

While I would definitely like to see the original proposal implemented, I think allowing multiple nonbonded groups would be an awesome first step (since even the energy component decomposition is unlikely to provide enough fine-grained control for everybody's use cases).

rmcgibbo commented 10 years ago

How is this supposed to related to the existing force group feature? My read of this thread is basically "It would be nice if the existing force group feature were easier to use from python, and didn't make me thing about bitmaps".

e.g, without implementing anything new:

def forcegroupify(system):
    forcegroups = {}
    for i in system.getNumForces():
        force = system.getForce(i)
        force.setForceGroup(i)
        groups[force] = i
    return forcegroups

def getEnergyDecomposition(context, forcegroups):
    energies = {}
    for f, i in forcegroups.items():
        energies[f] = context.getState(getEnergy=True, groups=2**i).getPotentialEnergy()
    return energies

swails commented 10 years ago

"It would be nice if the existing force group feature were easier to use from python, and didn't make me thing about bitmaps".

The main limitation on my end is that NonbondedForce clumps a large number of energy terms into one. The energy you get from nonbonded interactions encompasses electrostatic, van der Waals, Generalized Born energies (if you're using a GB potential) and any exceptions you may have added (so the scaled 1-4 nonbonded terms in the Amber FF or the different LJ types for 1-4 interactions in the CHARMM force field). The only way you can break up the NonbondedForce is by assigning the reciprocal space part a separate group, but that doesn't help at all if you want to extract LJ, GB, or exception-specific energies.

The only way to decompose those energies with the current API is to create a separate System (and Context??) for each component of the NonbondedForce that you want.

rmcgibbo commented 10 years ago

But fixing it requires a much more limited change that doesn't create a new API. All that needs to be added is a NonbondedForce.setLeonardJonesForceGroup. There's already a NonbondedForce.setReciprocalSpaceForceGroup, which is permits breaking up the energy contributions from a single Force into multiple force groups.

Lnaden commented 10 years ago

I believe Jason was saying you can effectively do this for any type of force assigned to NonbondedForce right now (vdW, electrostatic) by assigning two NonbondedForce objects to different force groups, and querying them as needed. However, this only works for NonbondedForce interactions and you are limited by the 32 force groups.

For reweighting and multistate free energy evaluations, we are interested in what the potential energy of a particular configuration is evaluated at other thermodynamic states by perturbing some parameters. In the current force group framework, we currently must copy the coordinates to a new context and do a full energy evaluation of solvent + solute. If we are looking to evaluate many new possible states, such as finding optimal parameters, this can be a very time consuming process. If instead we knew how the vdW energy and we were looking for adjusting epsilons, we could theoretically compute this terms contribution in other thermodynamic states without copying coordinates to new contexts and re-evaluate the whole energy. Alternately, you could assign a unique thermodynamic state to each force group, but you would run out of groups. This is just one example though, similar cases can be drawn for torsion or dihedral interactions.

For many of the NonbondedForce interactions, I have thought of how you could effectively do this in one context with interaction groups, force groups, and a number of calls to force.UpdateParametersInContext but I am not sure it would work for long- and short-range electrostatics or bonded interactions.

swails commented 10 years ago

All that needs to be added is a NonbondedForce.setLeonardJonesForceGroup.

This is only one decomposition that may be of interest. What about breaking off the extra exception contributions?

There's already a NonbondedForce.setReciprocalSpaceForceGroup, which is permits breaking up the energy contributions from a single Force into multiple force groups.

This is fundamentally very different. I suspect what's done right now is that all direct pair-pair interactions are condensed into a single loop/kernel for the sake of efficiency. The reciprocal space part is already an entirely separate part of the calculation, so assigning it to another group is painless.

If I'm right above, allowing you to break up nonbonded force evaluations into multiple kernels/groups would have serious performance implications and could easily require a large amount of work redoing the underlying implementation of the nonbonded kernels. I would not be surprised at all to hear @peastman say Lee Ping's approach is easier than the allowing multiple nonbonded force groups and that setLennardJonesForceGroup will simply never happen.

swails commented 10 years ago

I believe Jason was saying you can effectively do this for any type of force assigned to NonbondedForce right now (vdW, electrostatic) by assigning two NonbondedForce objects to different force groups, and querying them as needed.

Actually no. This is not currently allowed. I tried it :). You can have as many nonbonded force objects as you want (GBForce, NonbondedForce, CustomNonbondedForce, CustomGBForce, etc.), but they MUST all be assigned the same group, or OpenMM will throw a runtime exception. I found that out trying to separate the electrostatic from the GB energies using groups. The only way I could get the GB energy is to run 2 energies -- one with and one without the GBForce added to the System.

If you could do this, then I think most of our problems could be solved with a single System in a single Context (notwithstanding the 32-group-limit).

Ultimately, you can use the Force group trick for every single force except those computing nonbonded pairwise interactions. You can implement Urey-Bradley terms using HarmonicBondForce and assign that a different group from the standard Bond terms included in a different HarmonicBondForce instance, as an example. I've done this and it works correctly. Not so with nonbonded forces...

As a disclaimer, I haven't tried this with other nonbonded force-like forces such as those from the Amoeba plugin...

rmcgibbo commented 10 years ago

My concern with Lee-Ping's proposal is that it seems 85% redundant with an existing feature.

Lnaden commented 10 years ago

You can have as many nonbonded force objects as you want (GBForce, NonbondedForce, CustomNonbondedForce, CustomGBForce, etc.), but they MUST all be assigned the same group, or OpenMM will throw a runtime exception.

Really? You cannot assign different nonbonded forces to different force groups, but you can assign NonbondedForce.setReciprocalSpaceForceGroup which splits a single force object into multiple groups? It seems that both cases would thrown an error for the same reason.

My only concern with Lee-Ping's proposal is that it seems 85% redundant with an existing feature.

There are some elements of this which are redundant for specific interactions such as vdW and some electrostatics, but this is only with manipulating multiple force groups and extracting that information at run time is limited. I think this method would make getting all information of this type much easier.

swails commented 10 years ago

Really? You cannot assign different nonbonded forces to different force groups, but you can assign NonbondedForce.setReciprocalSpaceForceGroup which splits a single force object into multiple groups? It seems that both cases would thrown an error for the same reason.

Correct, you cannot assign two different nonbonded forces to different groups. I gave the only logical explanation I can think of in my post before that (below). I certainly don't expect you to take my word for it -- try assigning a GBForce to a different group than NonbondedForce (or a NonbondedForce with just charges and another with just vdW parameters).

If I'm right above, allowing you to break up nonbonded force evaluations into multiple kernels/groups would have serious performance implications and could easily require a large amount of work redoing the underlying implementation of the nonbonded kernels.

swails commented 10 years ago

My only concern with Lee-Ping's proposal is that it seems 85% redundant with an existing feature.

My main concern with @leeping's proposal is that it would probably not have enough flexibility for me to do what I want.

You can effect (mostly) the same thing using groups, but it seems (to me!) that you can't implement @leeping's proposal using groups nearly as efficiently. With @leeping's proposal you carry around an additional accumulator(s) for the additional energy term(s) you want, but there's no requirement to break apart the double loop kernel.

The difference I see is that you can use @leeping's proposal during typical MD simulations (and sample energies more frequently), whereas the overhead of multiple nonbonded force evals with different groups will severely crimp efficiency.

Of course this is all a moot point if it's not important enough relative to the time requirements for it to climb up @peastman's to-do list :).

rmcgibbo commented 10 years ago

Of course this is all a moot point if it's not important enough relative to the time requirements for it to climb up @peastman's to-do list :).

The surest way to get one's favorite feature merged is to roll up one's sleeves...

:)

jchodera commented 10 years ago

Apologies for being late to this thread, but I wanted to chime in on a few things.

I can't think of a situation where multiple Force objects contribute to a single energy component though - can you?

In order to implement alchemically modified Lennard-Jones forces, we use both CustomNonbondedForce (to handle interactions between atoms) and CustomBondedForce (to handle nonbonded exceptions). We actually use these together with NonbondedForce to keep the energy differences with the unmodified systems small.

On the more general topic, I agree with @rmcgibbo that the functionality for this already exists in the API. @rmcgibbo gives a nearly complete example in just a few lines of code. Breaking up with finer granularity (e.g. Lennard-Jones vs electrostatics) requires just a few more lines of code to zero out some parameters and push them to the Context via updateParametersInContext or create a new Context.

The efficiency of this approach could be poor, however, but this hasn't yet been discussed. @leeping and @swails: What are your performance requirements for these energy decompositions? Do you do need to do them every timestep, or only infrequently? My suspicion is that this is only done infrequently, so it is the coding overhead that is the issue.

An alternative that may have broader utility would be to implement:

more Pythonic approaches to retrieving specific Force objects added to the Python wrappers, to make coding schemes like this more convenient
a Python module added to the OpenMM app that facilitates this kind of energy decomposition, and has full knowledge of the existing OpenMM forces and commonly desired force decompositions

Would this suit the needs of everyone here?

The one place where I can see this still falling short is @swails attempts to separate out some specific PME terms to match up with AMBER, though this might be so implementation specific as to be very hard to say we could easily compare generally.

leeping commented 10 years ago

One thing I learned from this thread is that I wasn't creative enough with how to use force groups - Jason, thanks for the suggestion regarding combining force group energy components with multiple timestep integrators.

There are a number of ideas here regarding how to use force groups, multiple Force objects, and updating parameters in the context to get a more fine-grained energy component analysis. It's also possible to get the 1-4 specific interactions by having two copies of NonbondedForce with different exclusions. More generally, we can slice the energy almost any way by being clever enough with creating modified Forces and Systems. I could get the AMOEBA polarization energy by creating two AmoebaMultipoleForce objects and taking the difference, but I am basically doing two full calculations at that point. In another program, this might correspond to modifying the force field and rerunning the whole calculation.

I think the main decision is which path we should go down. Either we invest the effort to make energy components a more basic part of the code (a computationally efficient feature available to basic users), or we continue to use the existing API in more complex ways to get finer grained decompositions. I think John's suggestion is a nice compromise because it makes the feature easily usable without requiring a lower-level implementation, but I am still a proponent of the former route because of speed. (I would help to implement it, though that might be even more time consuming for Peter!)

Regarding speed and my current use case, currently a significant fraction of time is spent by ForceBalance in post-processing trajectories and re-evaluating energies (this is performed 2P times over an existing simulation trajectory, where P is the number of force field parameters, and the trajectory contains ~10000 samples at 1ps intervals). If I want to do an energy component analysis in the trajectory loop, then the suggested approaches of duplicating Force objects or using updateParametersInContext could increase the cost significantly. With the Amoeba interactions, the AmoebaMultipleForce is so expensive to evaluate that we shouldn't need to call it multiple times to get the energy components.

On the other hand, if we invest the initial time to implement "energy counters" on a lower level and create an API to get energy components, it would be easy to introduce new energy decompositions in the future and we would no longer have to unnecessarily multiply the computational cost. I do think it is worth the savings in user time and CPU time.

leeping commented 10 years ago

Regarding the possibility of multiple Force objects contributing to the same energy component: I acknowledge this is possible, but we may leave it to the user to lump components together as they see fit. A suitable naming convention for the energy components would make this easy (e.g. LJ-14, LJ-Normal, LJ-DispCorr).

jchodera commented 10 years ago

Regarding speed and my current use case, currently a significant fraction of time is spent by ForceBalance in post-processing trajectories and re-evaluating energies (this is performed 2P times over an existing simulation trajectory, where P is the number of force field parameters, and the trajectory contains ~10000 samples at 1ps intervals). If each trajectory loop turns into an energy component analysis, then the suggested approaches of duplicating Force objects or using updateParametersInContext could increase the cost significantly. With the Amoeba interactions, theAmoebaMultipleForce is so expensive to evaluate that we shouldn't need to call it multiple times to get the energy components.

@leeping: Even if the division into M components requires MP energy evaluations, what fraction of your total execution time of your entire program (including generation of the trajectories) does this correspond to, and how much if a speed up Sitka you get if this was, say, M times faster?

My guess is that, since you are looking at snapshots sampled only every 1000 steps (which is probably still too often, since they are likely still correlated) even making energy component decomposition free with total potential energy computation would not speed up execution time by a significant amount.

jchodera commented 10 years ago

Regarding the possibility of multiple Force objects contributing to the same energy component: I acknowledge this is possible, but we may leave it to the user to lump components together as they see fit. A suitable naming convention for the energy components would make this easy (e.g. LJ-14, LJ-Normal, LJ-DispCorr).

Is asking the user to lump many components really much simpler than asking then to split force groups into force components? I'm not sure.

This question isn't meant to be antagonistic---just that we should work out what the use cases would look like with the current and proposed APIs to ensure this would actually be more convenient for the user.

leeping commented 10 years ago

Even if the division into M components requires MP energy evaluations, what fraction of your total execution time of your entire program (including generation of the trajectories) does this correspond to, and how much if a speed up Sitka you get if this was, say, M times faster?

For the iAMOEBA parameterization, the derivatives (~40 trajectory loops, since we have ~20 parameters) took 50% of the time of the MD simulation. For a rigid water model, it is closer to 25% since we have fewer parameters (and overhead from looping over positions becomes important), but see below.

My guess is that, since you are looking at snapshots sampled only every 1000 steps (which is probably still too often, since they are likely still correlated) even making energy component decomposition free with total potential energy computation would not speed up execution time by a significant amount.

I already use autocorrelation times of the instantaneous observables to determine how to space the snapshots, and 2 ps is a quite reasonable sampling interval for, say, the box volume.

The calculation could easily get more expensive in the future where P could approach 100 for the amino acid side chain analogues (e.g. if we decide to fit the condensed phase properties for these molecules).

I could save some time if I prepared a mapping of parameter perturbations to energy components, as Michael and Himanshu have successfully done; for a given parameter perturbation, I would only need to recalculate the energy for the corresponding Force.

The expensive part would show up if we create multiple copies of Forces to do the finer grained decompositions. The dynamics calculation itself would slow down unless we create one Context for MD and another Context for postprocessing. That's essentially like running two calculations, and it would be safer / more reasonable if the energy components came from the same Context that was used to run the MD simulation.

Is asking the user to lump many components really much simpler than asking then to split force groups into force components? I'm not sure.

Sure - I think the question has a quantitative aspect. As in: if we provide a very fine grained decomposition and the user has to carefully reassemble the energy components, then it might not be any easier for the user to obtain what they want.

I personally wouldn't have too many situations where I need to combine many (i.e. > 2) pieces into a single energy component, but your perspective with free energies and more CustomForces might be different.
From a debugging perspective, it is helpful when the program doesn't lump energy components from different Forces together "under the hood".
Having a naming convention for the components should make lumping easy - or we can implement a higher level Python module to do the lumping following your earlier suggestion, and it would be just a few lines of code.

jchodera commented 10 years ago

For the iAMOEBA parameterization, the derivatives (~40 trajectory loops, since we have ~20 parameters) took 50% of the time of the MD simulation. For a rigid water model, it is closer to 25% since we have fewer parameters (and overhead from looping over positions becomes important), but see below.

I think I'm a bit confused now. This feature proposal is for energy term decomposition, but you're mentioning timings for computing finite-difference derivatives. How do these fit together? (Apologies for being dense here!)

Also, what about a proposal to compute energy derivatives with respect to parameters analytically? Wouldn't this speed things up by a factor of 2?

leeping commented 10 years ago

Hi John,

I think I'm the one who got confused, as I have two kinds of trajectory loops in ForceBalance - one related to getting thermodynamic property derivatives from MD simulations, and one related to fitting single-point properties from QM. One of my use cases for an energy decomposition would be to use it in the second kind of trajectory loop, because there are corresponding energy component analyses from QM that we may fit the parameters to. These calculations are less expensive than running MD, but they could still take several hours because of the large number of snapshots involved (>100k snapshots * 2P).

I think analytic derivatives of the energy with respect to parameters (hereafter "analytic derivatives") would be extremely useful for ForceBalance but also very intensive in terms of programmer time and difficulty (either me or Peter). I spent a few hundred hours doing this in grad school for the energy terms in GROMACS (I implemented analytic first and second derivatives of the energy and force, so going out to third order), and it was difficult to ensure the terms were correct due to the chain rule terms that connect the "user tunable parameter" (i.e. in the force field file) to the "internal parameter" (i.e. inside the inner loop). For the NonbondedForce, the program is broadcasting the force field parameters out to different atom pairs, applying combining rules, sometimes reordering the atoms - and to build the analytic derivative I had to reverse this mapping; this is nontrivial because the inner loop is many steps removed from the topology / parameter data structure. Many nights were spent checking the correctness of derivatives by finite difference. For the polarizable force fields, I don't think anyone has derived the expressions yet.

(In the end, the code became unmanageable and I couldn't merge it into the GROMACS master branch, so I no longer use any analytic derivatives in ForceBalance.)

That said, I think that analytic derivatives would speed things up by much more than a factor of 2 - the ForceBalance calculation scales with the number of tunable parameters in the force field (2P could exceed 100), whereas the analytic derivative roughly scales with the number of parameters (per particle) in each Force object - i.e. a constant factor of 2-10 depending on how many parameters are used to specify each particle.

In short, "analytic derivatives of the energy with respect to parameters" is more of a wish list item that I never asked for. :D

jchodera commented 10 years ago

One of my use cases for an energy decomposition would be to use it in the second kind of trajectory loop, because there are corresponding energy component analyses from QM that we may fit the parameters to. These calculations are less expensive than running MD, but they could still take several hours because of the large number of snapshots involved (>100k snapshots * 2P).

Ah, OK! This is the case where you want to be able to compute specific energy components in a very specific way to be able to compare to SAPT QM calculations, yes? Let's try to refine this use case then, if this is really the situation where your proposed feature would be most useful.

You say the calculations could take "hours", but what fraction of total program runtime would these calculations account for, and by what fraction would total runtime be decreased if only one potential energy calculation could also give you your energy components?
There is no need to have the System objects you use for reprocessing contain all Force objects, which makes them faster to evaluate! For example, you could have a System object that only contains NonbondedForce with only the Lennard-Jones terms, and a Context created with this system will be quite fast because it is smart enough to know it should only compute Lennard-Jones terms. I think this would certainly be much faster than M*P times the full potential energy calculation.

jchodera commented 10 years ago

I think computing analytic energy derivatives would be extremely useful for ForceBalance but also extremely intensive in terms of programmer time (either me or Peter). I spent a few hundred hours doing this in grad school for the energy terms in GROMACS, and it was difficult to ensure the terms were correct due to the chain rule terms that connect the "user tunable parameter" (i.e. in the force field file) to the "internal parameter" (i.e. inside the inner loop). For the NonbondedForce, the program is broadcasting the force field parameters out to different atom pairs, applying combining rules, sometimes reordering the atoms - and to build the parametric derivative I had to reverse this mapping. For the polarizable force fields, I don't think anyone has derived the expressions yet.

That said, I think that implementing analytic energy derivatives would speed things up by more than a factor of 2 - the ForceBalance calculation scales with the number of tunable parameters in the force field (up to hundreds), whereas the analytic derivative scales with the number of parameters (per particle) in each Force object - i.e. a factor of 2-10 depending on how many parameters are used to specify each particle.

Would you be up for opening a separate issue to discuss the potential for implementing analytical parameter derivatives? I think that the Custom*Force technology (especially Lepton) might actually make it possible to implement derivatives in a way that eliminates exactly the headaches you describe while still being both fast and flexible.

I think analytical derivatives would also be useful for some fancy simulation techniques, like orthogonal space random walk (OSRW) techniques.

mrshirts commented 10 years ago

I think analytical derivatives would also be useful for some fancy simulation techniques, like orthogonal space random walk (OSRW) techniques.

Just piping in to say that if you have an energy function of the form H(r,lambda) = H_0(r) + \sum_i h(lambda_i)*H_i(r) (which can be done even with interaction site introduction/removal) then computing the analytical derivatives can be done trivially if the energy components H_i(r) are available.

That's not to say that it's the best way to do it, but it does show how energy group decomposition actually does help analytical derivatives wrt coupling parameters (or parameters themselves).

On Thu, Apr 3, 2014 at 11:59 AM, John Chodera notifications@github.comwrote:

I think computing analytic energy derivatives would be extremely useful for ForceBalance but also extremely intensive in terms of programmer time (either me or Peter). I spent a few hundred hours doing this in grad school for the energy terms in GROMACS, and it was difficult to ensure the terms were correct due to the chain rule terms that connect the "user tunable parameter" (i.e. in the force field file) to the "internal parameter" (i.e. inside the inner loop). For the NonbondedForce, the program is broadcasting the force field parameters out to different atom pairs, applying combining rules, sometimes reordering the atoms - and to build the parametric derivative I had to reverse this mapping. For the polarizable force fields, I don't think anyone has derived the expressions yet.

That said, I think that implementing analytic energy derivatives would speed things up by more than a factor of 2 - the ForceBalance calculation scales with the number of tunable parameters in the force field (up to hundreds), whereas the analytic derivative scales with the number of parameters (per particle) in each Force object - i.e. a factor of 2-10 depending on how many parameters are used to specify each particle.

Would you be up for opening a separate issue to discuss the potential for implementing analytical parameter derivatives? I think that the Custom*Force technology (especially Lepton) might actually make it possible to implement derivatives in a way that eliminates exactly the headaches you describe while still being both fast and flexible.

I think analytical derivatives would also be useful for some fancy simulation techniques, like orthogonal space random walk (OSRW) techniques.

Reply to this email directly or view it on GitHubhttps://github.com/SimTk/openmm/issues/387#issuecomment-39469380 .

leeping commented 10 years ago

Hi John,

The issue with creating multiple CUDA Contexts is that the GPU only has space for a finite number of them (and I am already creating around 20 of them for different systems for matching to QM). If we create five times this amount, it will probably hit some upper limit. However, I think we can achieve the same effect by calling Context.getState(groups) to activate just one Force.

I think that matching MM energy components to QM energy components (i.e. SAPT / ALMO-EDA) would benefit from a lower-level implementation of energy components; regarding the question of overall wall time in a ForceBalance calculation, it is kind of a gray area. At the start of the calculation where the MD trajectories are very short (<1 ns), the QM targets are going to be the bottleneck. For the iAMOEBA parameterization, a 1 ns MD trajectory with property derivatives takes 1-2 hours, and the QM targets (and their derivatives) takes about 8 hours. Since they are executed asynchronously (the MD trajectory is farmed out to a cluster), a nonlinear iteration takes 8 hours. If we duplicated the Force objects to decompose electrostatics / polarization, the QM targets would then cost twice as much.

The other use case (I don't want to forget) is obtaining a finer grained energy component analysis from a single MD simulation, since it's useful for general analysis, validation, troubleshooting and debugging. The existing way would be to create several non-overlapping versions of multicomponent forces (e.g. NonbondedForce), or to run the MD simulation and postprocess using two different Contexts, but this doesn't guarantee that the energy components are correct (i.e. coming from the same Context that was used to run the simulation).

I'd like to talk about analytic energy derivatives in another topic. There may be challenges when Forces are not implementable as a Custom Force (e.g. PME electrostatics or polarizable models).

jchodera commented 10 years ago

The issue with creating multiple CUDA Contexts is that the GPU only has space for a finite number of them (and I am already creating around 20 of them for different systems for matching to QM).

Apologies for being unclear! I meant to suggest that different Context objects could be created and used sequentially. That is, something like

create context / compute only harmonic bond terms for all snapshots / destroy context
create context / compute only angle terms for all snapshots / destroy context
etc

You incur the overhead of having to push coordinates M times, but the whole scheme should still be much faster than M*P total potential energy evaluations.

jchodera commented 10 years ago

However, I think we can achieve the same effect by calling Context.getState(groups) to activate just one Force.

That's a good point! I had missed this.

jchodera commented 10 years ago

For the iAMOEBA parameterization, a 1 ns MD trajectory with property derivatives takes 1-2 hours, and the QM targets (and their derivatives) takes about 8 hours. Since they are executed asynchronously (the MD trajectory is farmed out to a cluster), a nonlinear iteration takes 8 hours.

Is the QM target portion parallelized?

jchodera commented 10 years ago

I'd like to talk about analytic energy derivatives in another topic.

Please start this thread! I think it could be useful to discuss.

There may be challenges when Forces are not implementable as a Custom Force (e.g. PME electrostatics or polarizable models).

We can lobby for a CustomPMEForce, etc., if there is clear utility there.

leeping commented 10 years ago

Sure, I'll start an energy derivatives thread when I gather my thoughts (plus I should write some code in ForceBalance that makes better use of the force groups, following what I learned in this thread). At least I'll be able to report on the performance impact of the current energy component analysis (based on force groups / duplicate forces).

Meanwhile, I don't want to overlook the other use case of generating energy components from the original MD simulation, since it's generally useful for analysis, validation, troubleshooting and debugging. This is present in GROMACS, TINKER, CHARMM, and AMBER so at least the developers of other codes consider it to be a fundamental "output" of a MD simulation.

As mentioned before, the existing way would be to create several non-overlapping versions of multi-component forces (e.g. NonbondedForce), or to run the MD simulation and postprocess using two different Contexts, but this doesn't guarantee that the energy components are correct because they are coming from a different Context than was used to run the simulation.

rmcgibbo commented 10 years ago

Here's my interpretation of the thread so far.

1) adding finer resolution for force groups, so that different components of one Force can be separated into different groups, is a desired feature. In particular, this means separating LJ/Coulomb. The other examples in LP's original post (bonds, dihedrals, etc) can already be separated out. (2) Force groups are not particularly convenient to use from python. (3) There are a couple technical limitations with force groups for nonbonded interactions, (e.g. having a single cutoff) due to limitations that come from coalescing all of the kernels together at JIT-time. (4) Having more than 32 available force groups would help some use cases.

There are a lot of assumptions being made about performance that I'm not convinced by. In particular, I feel like people are counting performance in their head by tallying up the number of calls they make to the C++ API. It's like "out of sight, out of mind". So, for instance Context.getState(getEnergyComponents=True) is supposed to take "one unit" of time, relative to "five units" for [context.getState(getEnergy=True, groups=2**system.getForce(i).getForceGroup()) for i in range system.getNumForces()], assuming that you have 5 forces. But obviously this isn't accurate at all. The only reasonably implementation of getEnergyComponents would basically do exactly that loop under the hood.

From an implementation perspective, the key aspect of the design of Context/ContextImpl is that there's only one force/energy accumulator. ContextImpl makes a loop over the Forces, and they either get skipped or included based on the setting of groups. IMO this is a good thing . It makes the implementation much more clear, less complex, and faster for the general case (running dynamics), even if it slows down some specialized applications (force balance).

leeping commented 10 years ago

Hi Robert,

You're right that the performance of [context.getState(getEnergy=True, groups=2**system.getForce(i).getForceGroup()) for i in range system.getNumForces()] is much lower than several independent energy evaluations [context.getState(getEnergy=True) for i in range system.getNumForces()] - but I think we already agreed on that.

Rather, the performance impact comes from creating multiple nonoverlapping copies of the same Force - if I understand correctly, the NonbondedForce with the charges zeroed out takes just as long (this is definitely true for AMOEBA forces, because some special code was added to skip quadrupole interactions when the parameters are zeroed out). Thus, if we create multiple copies of NonbondedForce and zero out different parameters in each one, then the computational cost really is multiplied.

Lee-Ping

rmcgibbo commented 10 years ago

...creating multiple nonoverlapping copies of the same Force ...

Yes, that's basically why I put point (1) at the top of my list.

jchodera commented 10 years ago

I agree with @rmcgibbo's assessment.

To address @leeping's other important point:

Meanwhile, I don't want to overlook the other use case of generating energy components from the original MD simulation, since it's generally useful for analysis, validation, troubleshooting and debugging. This is present in GROMACS, TINKER, CHARMM, and AMBER so at least the developers of other codes consider it to be a fundamental "output" of a MD simulation.

This is a few distinct use cases.

For "validation, troubleshooting, and debugging", it is useful to compare to other codes energy component by energy component. I think it would be great to have some sort of tool to do this. Some of these codes break up energy components in different ways, so it may be necessary to have the implementation of this be aware of individual codes to compare to. My guess is that this can be done in Python, without C++ API changes, because there are not strict performance requirements here.
For "analysis", I am always, always skeptical of any scheme that attempts to decompose energies by component. Statistical mechanics does not work that way. In any case, I think the exact "analysis" use cases and performance requirements would need to be significantly elaborated upon before one could decide on what API changes would be optimal, if at all.

mrshirts commented 10 years ago

For "analysis", I am always, always skeptical of any scheme that attempts to decompose energies by component. Statistical mechanics does not work that way. In any case, I think the exact "analysis" use cases and performance requirements would need to be significantly elaborated upon before one could decide on what API changes would be optimal, if at all.

Statistical mechanics does not work on components in the sense that free energy(component_i) + free energy(component_j) =/= free energy(component_i

component_j) even if energy(component_i) + free energy(component_j) = energy(component_i + component_j). That is not to say that the decomposition of energy terms can't be used for various other purposes that use correct statistical mechanics, e.g. if one wants to know the free energy of perturbing one atom, then only the components related to that atom need to be recalculated, not every single interaction in the system.

On Thu, Apr 3, 2014 at 1:47 PM, John Chodera notifications@github.comwrote:

I agree with @rmcgibbo https://github.com/rmcgibbo's assessment.

To address @leeping https://github.com/leeping's other important point:

Meanwhile, I don't want to overlook the other use case of generating energy components from the original MD simulation, since it's generally useful for analysis, validation, troubleshooting and debugging. This is present in GROMACS, TINKER, CHARMM, and AMBER so at least the developers of other codes consider it to be a fundamental "output" of a MD simulation.

This is a few distinct use cases.

For "validation, troubleshooting, and debugging", it is useful to compare to other codes energy component by energy component. I think it would be great to have some sort of tool to do this. Some of these codes break up energy components in different ways, so it may be necessary to have the implementation of this be aware of this. My guess is that this can be done in Python, without C++ API changes, because there are not strict performance requirements here.

For "analysis", I am always, always skeptical of any scheme that attempts to decompose energies by component. Statistical mechanics does not work that way. In any case, I think the exact "analysis" use cases and performance requirements would need to be significantly elaborated upon before one could decide on what API changes would be optimal, if at all.

Reply to this email directly or view it on GitHubhttps://github.com/SimTk/openmm/issues/387#issuecomment-39482191 .

swails commented 10 years ago

(3) There are a couple technical limitations with force groups for nonbonded interactions, (e.g. having a single cutoff) due to limitations that come from coalescing all of the kernels together at JIT-time.

I've been harping on this point a bit. However, since only the CUDA and OpenCL programs actually JIT compile the coalesced nonbonded forces, this restriction is platform-specific and only present on the OpenCL and CUDA platforms. I could split nonbonded force groups using the Reference and CPU platforms with no problems.

This allows some of the fine-grained decomposition that I've been looking for, but many of the CustomNonbondedForce classes (and their relatives, like CustomGBForce) are painfully slow even for single-point calculations of large molecules.

Thanks to @Lnaden for bringing this to my attention.

swails commented 10 years ago

Statistical mechanics does not work on components

Agreed. The components are invaluable for debugging implementations of various force fields

Also, if you are making a perturbation that only affects a single component you can get the full energy difference by recomputing only the affected component. This is frequently the nonbonded force in my experience, though, so recomputing the entire energy is often not much more expensive than just the 'affected' parts (at least for fixed-charge, non-Amoeba force fields).

rmcgibbo commented 10 years ago

@swails: I think I overheard yesterday that removing the limitation on different nonbonded cutoffs was in the works for the next release. No guarantees though.

swails commented 10 years ago

I think I overheard yesterday

Ooh eavesdropping. Sounds great. Thanks for the info

jchodera commented 10 years ago

Also, if you are making a perturbation that only affects a single component you can get the full energy difference by recomputing only the affected component.

Absolutely correct. This is an important qualification.

This is also an important use case, but it can already be addressed by postprocessing using a System object containing only a single Force object, something that we have done as well.

peastman commented 10 years ago

I suspect what's done right now is that all direct pair-pair interactions are condensed into a single loop/kernel for the sake of efficiency.

Exactly. This is why it requires all of them (the direct space part of NonbondedForce, CustomNonbondedForce, GBSAOBCForce, etc.) to be in the same kernel. There are a few ways we could remove this requirement, but they all have costs:

Split them into separate kernels. This would significantly hurt performance. There's a lot of work that's the same in all of them (loading the atom positions, applying periodic boundary conditions, computing displacements between atoms, summing and storing force), so that work would get done multiple times.
Keep them in a single kernel, but put each one into a separate "if" block based on its force group. This would also hurt performance: branches are more expensive on the GPU than the CPU, and we'd be adding a few of them to the inner loop.
Have it always evaluate all forces, but then write them to different buffers. This would have several costs. We'd need extra registers and extra shared memory, which would hurt the performance of the nonbonded kernel. Then we'd have more buffers to clear at the start of each time step, and we'd need to sum the different buffers before doing integration.
Compile a separate version of the nonbonded kernel for each possible combination of forces. This is probably the best option, since it's the only one that wouldn't affect the speed of computing forces. It would increase startup time, though, and make the code quite a bit more complicated.

I think I overheard yesterday that removing the limitation on different nonbonded cutoffs was in the works for the next release.

You certainly didn't overhear that from me, and if someone else were working on it, I'd hope I'd know!

but many of the CustomNonbondedForce classes (and their relatives, like CustomGBForce) are painfully slow even for single-point calculations of large molecules.

I recently optimized the CPU implementation of CustomNonbondedForce, so it's better than it used to be. CustomGBForce is still very slow though.

I think that the Custom*Force technology (especially Lepton) might actually make it possible to implement derivatives in a way that eliminates exactly the headaches you describe while still being both fast and flexible.

Cringe. Shudder. Ok, I can do that in CustomBondForce. But don't even think of asking for it in CustomGBForce! You have no idea how hard it was getting all the different chain rule terms right just to compute the force...

leeping commented 10 years ago

Hi Peter,

I think it might be worth having a discussion for the use cases of energy derivatives first. ForceBalance would obviously benefit from having these, but the most useful cases are also the most difficult cases (e.g. a nonbonded interaction with PME).

Regarding energy components: I think this thread gave me enough new suggestions using force groups that I can test them for my applications. I also like Robert's alternative to energy components where we may set separate force groups for energy components within a single Force, e.g. NonbondedForce.setLennardJonesForceGroup or AmoebaMultipoleForce.setPolarizationForceGroup. There will not be many of these methods created, because most of the forces contribute to just one energy component.

I'm curious what other folks think - would extra force groups cover most of the needed use cases?

peastman commented 10 years ago

What would AmoebaMultipoleForce.setPolarizationForceGroup do? The polarization isn't really separable.

jchodera commented 10 years ago

I think @leeping is thinking of Tinker, where the polarizable contribution is accumulated separately.

This contribution can of course be obtained in OpenMM by postprocessing with and without the polarizability set to zero, and computing the difference in energy.

leeping commented 10 years ago

Right :) I was indeed thinking of Tinker.

openmm / openmm

Energy component analysis #387