openforcefield / openff-toolkit

The Open Forcefield Toolkit provides implementations of the SMIRNOFF format, parameterization engine, and other tools. Documentation available at http://open-forcefield-toolkit.readthedocs.io
http://openforcefield.org
MIT License
318 stars 92 forks source link

Container class for parameterized systems #310

Closed jchodera closed 2 years ago

jchodera commented 5 years ago

I've been discussing (with @andrrizzi @SimonBoothroyd @j-wags @proteneer) the potential for solving a number of technical debt issues we have accumulated with a new Python container class to represent a parameterized system.

To recap, we have a few major accumulated technical debt issues:

Considering all of these limitations, I propose we create a new object model for parameterized systems that aims to solve as many of these problems as is conceivable via a good, extensible design, and amortize the cost of its implementation over various subprojects that need different capabilities.

Our design should:

I find it helpful to kick around ideas of what using the class would look like. Here's some very rough ideas for inspiration.

Suppose first that it would be useful to create something like a ParameterSet object that would replace the current awkward ForceField syntax of

forcefield.get_handler('vdW').parameters['[#1:1]-[#7]']

with something like the more natural

forcefield['vdW']['[#1:1]-[#7]']

This would enable manipulations like

# Create an OFF force field object
from openforcefield.typing.engines.smirnoff import ForceField
forcefield = ForceField('openforcefield-1.0.offxml')
# Create an OFF parameterized system
system = forcefield.create_system(topology)
# Access the topology associated with that system
system.topology
# Access the forcefield parameter set associated with that system
system.forcefield['vdW']['[#1:1]-[#7]'].sigma
# Inspect the vdW sigma assigned to an particle 0
system.parameters['vdW'][0].sigma
# But we can still access the root parameter that controls all atoms of that type
system.parameters['vdW'][0].parameter.sigma
# and possibly change it
import pint
ureg = pint.UnitRegistry()
system.parameters['vdW'][0].parameter.sigma = 1.0 * ureg.angstrom
# NOTE: We have to figure out whether we should be allowed to change the parameters for individual particle or interactions in a way that would depart from the underlying parameter set, or just force that to change _all_ such instances in the set.
# Compute the potential energy for a given configuration
potential = system.potential_energy(positions)
# Create a method to compute the potential energy of just the vdW terms, using jax to achieve high performance, allowing parameter gradients
potential_energy_method = system.potential_factory(forces=['vdW'], parameter_gradients=['vdW'], method='jax')

None of this is intended to be a real suggestion for a final API, but just to give the flavor of what we could potentially implement in a manner that would let us easily solve many of the problems above.

jchodera commented 5 years ago

To clarify, after some discussion with @proteneer:

The object models described here are primarily to make it easy to

Fundamentally, parameterized systems are just multidimensional numerical arrays and indices of assigned parameters, and any energy computations or derivatives will ultimately need to work with multidimensional numerical arrays (e.g. numpy or tensorflow arrays). These object models provide a convenient interface for humans to these arrays, but ultimately, to make it easy to build numpy methods that can be JITed, there would need to be some way to efficiently render them to unless floating-point arrays.

But the arrays are useful: The JIT will use these arrays, the optimizer will use these arrays, and any Bayesian samplers will use these arrays, so it's useful to think about how to easily make the road between object model and arrays bidirectional and convenient.

proteneer commented 5 years ago

I think there's essentially two things here:

We can decide on the API later once we can agree on what the goals are here.

jchodera commented 5 years ago

It sounds like we're on the same page here.

The need to keep track of consistent units. Move simtk.unit -> pint.

We would likely want the API to present one data view where parameters are individually accessed using unit-bearing quantities--intended for programmatic access to individual parameters for constructing systems or having humans manipulate parameters---and another data view where parameters are exposed as flat unitless arrays for optimization or energy computation.

We can decide if we want the System builder to only accept numpy arrays with units.

Can you clarify what you mean by "System builder" in this context?

OpenMM's system class is unwieldy since it's written in C++. We also need to keep track of unique parameter origins (eg. CH4 has 4 sets of redunance C-H HarmonicBond parameters). We care about generating derivatives w.r.t. unique parameters.

We'd definitely want to keep track of parameter origins, as well as the current values of applied parameters.

j-wags commented 5 years ago

Overall, I think this is a really good idea. Some scattered thoughts, in no particular order:

proteneer commented 5 years ago

It sounds like we're on the same page here.

The need to keep track of consistent units. Move simtk.unit -> pint.

We would likely want the API to present one data view where parameters are individually accessed using unit-bearing quantities--intended for programmatic access to individual parameters for constructing systems or having humans manipulate parameters---and another data view where parameters are exposed as flat unitless arrays for optimization or energy computation.

We can decide if we want the System builder to only accept numpy arrays with units.

Can you clarify what you mean by "System builder" in this context?

Formally, a mapping that takes in an OFFXML, and a standard representation for a graph + 3D coordinates (eg. SDF/Mol2) and returns a parameterized system. This was a question of when units should come into play. For example, some functional forms are unit agnostic already (eg. HarmonicBond, HarmonicAngles, etc.). However, by convention, electrostatics must always include a ONE_4PI_EPS that has been hardcoded to be 138.935456 in standard MD units. Integrators/Thermostats also have them (BOLTZ/time step, etc.). If you want to fully consistent, better start adding units in all the potentials we use.

OpenMM's system class is unwieldy since it's written in C++. We also need to keep track of unique parameter origins (eg. CH4 has 4 sets of redunance C-H HarmonicBond parameters). We care about generating derivatives w.r.t. unique parameters.

We'd definitely want to keep track of parameter origins, as well as the current values of applied parameters.

+1. We need to decide what counts as trainable parameters though, take GBSA for example, there's the standard per atom charges, radii, scaled factors, but also the scalar parameters (alpha/beta/gamma OBCs).

andrrizzi commented 5 years ago

Apologies for the wall of text.

What do you think about a design similar to this? In particular, would this cover all the use cases we have in mind or is there too much/too little here?

Access and modify parameters

For now, I wouldn't implement the ability to modify the parameter of a single-particle independently to the other interactions assigned to the same parameter. The ForceField doesn't really support "exceptions" right now, and I'm worried about keeping it out of sync. Also, it sounds like this could be a use case of less importance.

One thing I noticed, is that the name of forces and parameter types might be different as there's not always a 1-to-1 correspondence (e.g., Electrostatics, LibraryCharges, and/or ChargeIncrement all enter a single force in the end). This feels a little weird, but I haven't thought of a clean solution so far.

# Access parameters for all particles by its ID.
ff_vdw_parameters = system.forcefield.parameters['VdW']
ff_vdw_parameters['[#1:1]-[#8]'].epsilon
ff_vdw_parameters['n12'].sigma
ff_vdw_parameters[2].sigma

# Or access by a tuple ID.
system.forcefield.parameters[('VdW', '[#1:1]-[#8]', 'sigma')]

# Modify parameter for all particles.
import pint
ureg = pint.UnitRegistry()
ff_vdw_parameters['n10'].sigma = 1.0 * ureg.angstrom

# Check the atoms, ID, and spring constant assigned to the 13th bond.
bond_force = system.forces['HarmonicBondForce']
bond_force[12].atoms
bond_force[12].parameter.id
bond_force[12].parameter.k

# Access charge information of single particle.
force_cls = system.forcefield.parameters['Electrostatics'].get_force_cls()  # E.g. PMEForce
# LibraryChargeType
system.forces[force_cls][2].parameter.charge  # Modifying this changes system.forcefield.
# ChargeIncrementType
system.forces[force_cls][120].parameter.charge  # Modifying this changes the charge_increment in system.forcefield.
system.forces['PMEForce'][120].parameter.charge_qm  # Read-only
system.forces['PMEForce'][120].parameter.charge_increment

# Check all the force field terms involving a specific particle.
for parameter_type, parameters in system.particles[3].parameters:
    print(parameter_type, parameters)  # For example "'VdW' [VdWParameter(sigma, epsilon), ...]"

Compute energy and and gradients

I like the idea of a potential energy function factory design, but I'm worried it's not going to cover all the way we'll want to extend it and use. For example, I believe both JAX and Dask require/are faster with their own implementation of a numpy array. Also, it will probably force us to work downstream with a pure numpy representation of a System, which removes some of the advantages of this new class. So I'd focus on making it very easy to implement backends to compute energies/gradients (which essentially will be these factories), while keeping in mind that we may want to implement the factory design alongside it if we want.

I think including positions and box vectors in System would make it easier evaluating potentials and exporting to other kinds of files, but I'm happy to hear your opinions about it.

# Compute potential energy of the system.
system.positions, system.box_vectors = positions, box_vectors
system.potential_energy

# Or for custom positions and a different backend (using a KernelBackendRegistry).
system.potential_energy(positions, box_vectors, kernel_backend='jax')

# More backend customization to control how each force is computed and distributed.
backend = MixedKernelBackend(
    force_backends={
        LJForce: JAXKernelBackend(),
        'PMEForce': OpenMMKernelBackend(),
        'HarmonicBondForce': DaskKernelBackend()
    },
    default=NumpyKernelBackend(),  # Any other force.
)
system.potential_energy(kernel_backend=backend)

In many instances, I had to work with a subset of a System (e.g. Monte Carlo, analyze positions and recompute energies for a few atoms and forces) so I'm thinking that a smart and robust SystemView that evaluates and create copies of the internal data lazily could be very helpful.

# Create a slice of a system as a SystemView and compute the potential of the subsystem.
# This creates a "detached" view, meaning that a change in one object doesn't affect the other.
system_view = system.slice(force=[LJForce, 'PMEForce'], particle=range(20))
system_view.potential_energy(kernel_backend=backend)

# Or pass it directly to the potential energy system as a shortcut for the same syntax?
system.potential_energy(**slice_kwargs)

# Create a slice of a system involving particles assigned to specific parameters.
# This creates a traditional view that allows manipulating the bigger system with the smaller one.
system_view = system.slice_view(parameters={
    ('VdW', 'n12'),
    ('Bonds', '[#6X4:1]-[#6X4:2]'),
})
# It can be detached later.
system_view.detach_view()

I'm not 100% sure on the following syntax to compute gradients. It's very intuitive to me, and I think in the end would be easy to implement with a JAX backend, but if there're catches I'm overlooking we can switch to something like system.get_parameter_gradient/hessian().

# Compute the gradient with respect to all the parameters (in the view).
# Essentially potential_energy_function() returns a callable Potential
# object that can be called and expose the gradient() method.
gradient_func = system_view.potential_energy_function().gradient(parameters=True)

# If the system encapsulate positions, and box_vectors, we can just go with.
gradient_vector = gradient_func()
gradient_vector = gradient_func(parameters=system_view.parameters)
# Or more generally
gradient_vector = gradient_func(positions[selection_of_particle_indices], box_vectors,
                                system_view.parameters)

# Access the gradient by index, parameter, or parameter ID.
gradient_vector[15]
gradient_vector[('VdW', 'n13', 'sigma')]

# Update forcefield.
system.forcefield[gradient_vector.parameters_ids] = system_view.parameters - 0.02*gradient_vector

# Compute forces.
gradient_vector = system_view.potential_energy_function().gradient(positions=True)()

# Or compute the Hessian for a subset of parameters.
hessian_func = system_view.potential_energy_function().gradient(parameters=True).gradient(parameters=True)
proteneer commented 5 years ago

@andrrizzi's comment that I wouldn't implement the ability to modify the parameter of a single-particle independently to the other interactions assigned to the same parameter is basically the crux of the problem. For some context, no MD engine out there has a way to keep track of the origin of the forcefield parameters, and designing good classes around this problem is basically uncharted territory at this point.

After having prototyped in OpenMM/timemachine for a good 6 months on just this exact problem, I've reached the conclusion that the system construction semantics needs to completely be separated from the potential energy implementation semantics. There are several practical considerations on why this needs to be done carefully:

  1. Forward-mode autodifferentiation scales O(P*S) where P is the number of parameters in your system and S is the complexity of a single forward pass O(N log N) for most MD packages. For reference, a standard forcefield has about 1000 trainable parameters for a single system, and about 15000+ duplicated parameters for an MD system. You cannot design classes in the absence of this consideration.
  2. Reverse mode has different complexity considerations and other types of overhead that I won't get into there.
  3. It's very important to carefully think about what is and isn't a trainable parameter. For example, phases and force constants are considered trainable in a PeriodicTorsion, but the period needs be be integers. Yet often times those three are lumped together so we need to avoid computing their derivatives.
  4. In terms of actual derivatives, there's 4.5 things that we'll ever need:
    • dE/dX (forces) we all need this
    • dE/dp @jchodera and @maxentile needs this
    • d2E/dx^2 "at someone every forcefield developers need the hessian"
    • d2E/dxdp mixed partials - I need this
    • dE/dbox - various barostats would love to have this
    • with the above five you can implement almost everything you need.

With the timemachine and all of its forcefield fitting routines, I've settled on the following function signature (for both JAX and custom-ops backends):

def potential(batched_conf, params, box):
    ...
    return energy

Concrete function implementations meet the required signature using either functors or functools.partial, copy and pasted from the timemachine code:

def harmonic_bond(conf, params, box, bond_idxs, param_idxs):
    ci = conf[bond_idxs[:, 0]]
    cj = conf[bond_idxs[:, 1]]
    dij = distance(ci, cj, box)
    kbs = params[param_idxs[:, 0]]
    r0s = params[param_idxs[:, 1]]
    energy = np.sum(kbs/2 * np.power(dij - r0s, 2.0))
    return energy

The gradients are implemented natively using jax

# bind the forcefield parameters via functools partial
harmonic_bond = functools.partial(bond_idxs=bond_idxs, param_idxs=param_idxs)
dE_dx = jax.grad(harmonic_bond, argnums=(0,))

The gradients of the system:

def total_potential(conf, params):
    total_e = harmonic_bond(conf, params) + harmonic_angle(conf, params) # they share a single params
    return total_e

total_dE_dx = jax.grad(total_potential, argnums=(0,))
total_dE_dp = jax.grad(total_potential, argnums=(1,))

And construction of higher order terms follow naturally. But the key here is that we must treat parameters as an opaque black box that we can then differentiate with respect to. It's currently implemented an opaque 1-dimensional vector of P total parameters that get indexed into via param_idxs of each potential energy class.

jchodera commented 5 years ago

Some quick comments for now---more later:

the name of forces and parameter types might be different as there's not always a 1-to-1 correspondence (e.g., Electrostatics, LibraryCharges, and/or ChargeIncrement all enter a single force in the end).

This may be something we want to fix in a future update. @j-wags : Thoughts?

I think including positions and box vectors in System would make it easier evaluating potentials and exporting to other kinds of files, but I'm happy to hear your opinions about it.

The original concept of System in OpenMM was to separate dynamical variables (positions, box vectors) from the description of how to compute the energies given the dynamical variables. I think it might still be useful to keep this separation, and instead have a single object represent the dynamical information (positions and box vectors).

I like the idea of a potential energy function factory design, but I'm worried it's not going to cover all the way we'll want to extend it and use. For example, I believe both JAX and Dask require/are faster with their own implementation of a numpy array. Also, it will probably force us to work downstream with a pure numpy representation of a System, which removes some of the advantages of this new class. So I'd focus on making it very easy to implement backends to compute energies/gradients (which essentially will be these factories), while keeping in mind that we may want to implement the factory design alongside it if we want.

Any concerns about which numpy implementation should be used could be easily addressed by one of the following mechanisms:

Mainly, we would want the implementation of potentials to keep in step with the parameter handlers, which would require we update the code in two places if we were to add a completely new parameter type. We might think about a design where the parameter handler would also include its own energy implementations (and, in future, possibly also its own information on how to translate this into OpenMM and other packages) allowing us to implement new potential terms in a totally modular fashion.

I also agree with pretty much everything @proteneer notes above, and want to emphasize that a primary goal here is to allow us to ultimately construction functions that expose 1D vector of unique mutable continuous parameters that we can use to compute energies and parameter gradients of the energy for the convenience of optimizers and samplers. We do also want to make it useful for manual inspection and interpretation, however. Because the same information is required for both purposes---just different views of it---I think it does make sense to either bundle them together into this class or to use objects of this class to construct the functions that map 1D parameter vectors to energies.

andrrizzi commented 5 years ago

Thank you both. I agree with essentially everything, and I made a few modifications to the last example in my post above that should make things less confusing.

To clarify some of the choices, here are the two extra constraints for the design that I'm trying to satisfy.

  1. I'd like the property calculator to use the same exact API for differentiation, irrespective of the backend used to generate the potential function.
  2. After the gradient is computed, I want to make it easy to interpret the results and write the update into the ForceField class.

For 1), what I'm trying to do above is essentially to return a callable Potential object instead of a static function that exposes a common API to be differentiated. Essentially, the sole purpose of the Backend classes would be the construction of this object.

For 2), working with pure numpy vectors is hard. You need to know somehow the exact order of the parameters in the 1D vector to inspect its gradient and update the ForceField accordingly. So I'm thinking of having our Potential.gradient()/fwdgrad()/revgrad() functions return a numpy array wrapper that makes it possible to address the array elements by parameter labels.

@proteneer

is basically the crux of the problem

Do you mean that you think this is fine (i.e. not implementing it) or that this is an interesting use case and we should find a way to support exceptions that are not assigned to a SMIRKS pattern?

the system construction semantics needs to completely be separated from the potential energy implementation semantics

Absolutely agree. I was thinking the latter would be entirely encapsulated in the Backend classes. I actually went through the (awesome) timemachine code and examples in the readme, and tried to imagine how a Backend/Potential object could encapsulate the lines of code you wrote above.

@jchodera

This may be something we want to fix in a future update. @j-wags : Thoughts?

First thing that comes to mind is that we might want to have the charge model tags being children tags of Electrostatics, which will essentially have 3 levels of indentation instead of 2.

But I think the problem is not limited to multiple tags defining one force. For exampl, in the future, the vdW tag could support multiple potential forms. We'll then have 1 very big vdW parameter handler (and force) class supporting multiple potentials and exposing different kinds of parameters. We should probably start a separate discussion about how we should handle this, however. ~Will open another issue later~ (done, see #343).

I think it might still be useful to keep this separation

Sounds great to me, and we can always extend System to this capability later in case, although I think it will make things harder when dealing with a slice of the System. I think, we'll also need to think about how we'll want to represent periodicity. Right now, I think we're "hacking" the presence/absence of box vectors in the Topology object.

An optional argument to system.create_potential_energy() could specify whether it should be rendered for, say, jax, numpy, or TensorFlow.

I agree. That is essentially my design, only get_potential_energy(backend='jax') would return a differentiable object instead of returning a function. I see now that the name is confusing and we should have two different methods, one returning only a Quantity and the other returning the potential function (whether it's a python function or an object). Less surprising. I've changed my example above to reflect this.

We could specify a dict of imports for numpy and friends to make it more extensible. I wonder if it's possible to rewrite a function line-by-line to change the origin of imports using introspection tools

If I understand this correctly, this might be tricker. For example, JAX wants you to use functional numpy programming while Dask uses the array's methods so in general I don't think we can have force objects give you 1 numpy implementation that we can reuse for accelerating/differentiating the code with different libraries. I think the backends should be able to import what they need internally anyway.

j-wags commented 5 years ago

I've been quiet in this thread, but I wanted to chime in and say that I'm mostly following along here and all of this seems reasonable -- I think you all have a better understanding of the constraints and requirements here.

With regard to "many-to-one and one-to-many relationships between SMIRNOFF sections and Force/Potential objects", I had a good discussion with @andrrizzi this morning that included this among other topics. It's not all relevant, but my full thoughts are linked in the 0.4.0 releasenotes reference above. comment link here

tl;dr -- the relevant part is that enclosing the charge-generating sections in the Electrostatics tag makes sense at first glance. However, it leaves the loose ends of "what if people want to include multiple potentials that are based on charges?" and "then where do we put the VirtualSites section, which represent particles that have both electrostatics and vdW interactions?". I think there are some parallels between this decision and section-bundling that make sense to consider at the same time.

andrrizzi commented 5 years ago

Thanks for your thoughts, Jeff! I've cross-posted that in #343 that I've split from this to discuss that issue.

davidlmobley commented 2 years ago

@mattwthompson I think this one should get closed; I think our plans in this direction are described by the Interchange effort and if we do anything else it wouldn't be for far longer.

mattwthompson commented 2 years ago

Thanks!