openforcefield / open-forcefield-tools

Tools for open forcefield development
MIT License
8 stars 6 forks source link

Proposed API for manipulating SMIRNOFF parameter sets #24

Open jchodera opened 7 years ago

jchodera commented 7 years ago

@maxentile and @bmanubay :

Do we need some more Pythonic way of manipulating SMIRNOFF parameter sets to facilitate MCMC sampling from the posterior? Currently, the API presumes you have fully-formed XML files (or some representation of them, such as lxml.etree.ElementTree). If you'd rather we find some way to bolt on a more Python interface so you don't have to manipulate XML elements directly when sampling parameters, let's discuss that here.

davidlmobley commented 7 years ago

More pythonic -- you mean like this? https://github.com/open-forcefield-group/openforcefield/blob/master/openforcefield/typing/engines/smirnoff/forcefield.py#L694 and https://github.com/open-forcefield-group/openforcefield/blob/master/openforcefield/typing/engines/smirnoff/forcefield.py#L644 ? (getParameter and setParameter)?

davidlmobley commented 7 years ago

(It is still working with the underlying XML, of course, but it IS a python interface.)

jchodera commented 7 years ago

That's more Pythonic, but it may not be what they need. For example, if they want to expose a view of a subset of parameters as an object that acts like a Python list for the purposes of feeding it to some MCMC sampling scheme, that's something we could engineer that would make their lives easier.

bmanubay commented 7 years ago

I've been using what David and his lab put into smarty/openforcefield.typing.engines for altering force field parameters for quite some time and it's been working out well for me thus far! I'll defer to @maxentile to see if he might want something different though.

What I've been doing is feeding OrderDict 's of SMIRKS types, parameters and proposals to David's tools in order to do what I need to do, but that could be made easier for sure.

jchodera commented 7 years ago

What I've been doing is feeding OrderDict 's of SMIRKS types, parameters and proposals to David's tools in order to do what I need to do, but that could be made easier for sure.

Could you point to an example of this?

I'm thinking of something like, for example:

# Create initial parameter set
from openforcefield.typing.engines.smirnoff import ForceField
forcefield = ForceField('initial.ffxml')
# Find all the Lennard-Jones parameters by SMIRKS string
# There are likely other ways to make this Pythonic
parameter_ids = [ parameter.id for parameter in forcefield.parameters['NonbondedForce'] ]
# Define objective for optimization
def objective(x):
    relative_error = 0.0
    forcefield.setParameters(parameters, x)
    # Compute error
    computed_properties = estimator.computeProperties(dataset, forcefield)
    for (computed, measured) in (computed_properties, dataset):
        relative_error += ((computed.value - measured.value) / measured.value)**2
    return relative_error
# Optimize
from scipy.optimize import minimize
x0 = parameters.as_numpy(parameter_ids) # copy out values as a numpy array
result = minimize(objective, x0)
optimal_parameters = result.x

We could potentially even allow direct access via [], like

# Create an array-like view by slicing parameter_ids
x = forcefield[parameter_ids]
# Update parameters from numpy values
x[0] += delta
forcefield[parameter_ids] = x
bmanubay commented 7 years ago

Ooo, I like that second one a lot! A sliceable array would be really nice.

jchodera commented 7 years ago

Would forcefield[parameter_ids] it be a unit-bearing simtk.unit.Quantity? Or would it be unitless (in openmm default units? in some other unit system?)

jchodera commented 7 years ago

Also, would you want to access by parameter ID(s)? By smirks? By parameter index?

bmanubay commented 7 years ago

Would forcefield[parameter_ids] it be a unit-bearing simtk.unit.Quantity? Or would it be unitless (in openmm default units? in some other unit system?)

I guess it would just be in openmm default units? I don't see a reason to work in other units per se.

Also, would you want to access by parameter ID(s)? By smirks? By parameter index?

Could we access by either ID or SMIRKS (if it wouldn't be too much trouble to support both)? Thereby if we sliced by SMIRKS or ID it would return some kind of array object of the parameters for that SMIRKS string. Something like:

# Return array like object of parameters associated with given parameter ID or SMIRKS string
x = forcefield[ID or SMIRKS]
# Update parameters from numpy values
x[0] += delta
forcefield[parameter_ids] = x
davidlmobley commented 7 years ago

I can comment in more detail tomorrow, but I want to chime in in favor of accessing both by SMIRKS and by ID.

This is obviously not as important when doing things automatically by MC, but if a human is doing anything with these, one wants to be able to access the data both ways. It's cumbersome typing SMIRKS patterns in if you want to access a single parameter where you know both the SMIRKS and parameter ID, and parameter IDs are much more concise if you are doing things like looking at which parameters are used in a particular molecule or molecule set, etc. (They make better axis labels, for example!) At the same time, you also want to be able to access by SMIRKS, because the parameter IDs are arbitrary and carry no information content in their own right; they're just labels. So, depending on what/how you're doing things, you end up wanting to be able to access in both ways.

davidlmobley commented 7 years ago

Would forcefield[parameter_ids] it be a unit-bearing simtk.unit.Quantity? Or would it be unitless (in openmm default units? in some other unit system?)

Naively it seems to me like we would want these to be unit bearing (simtk.unit.Quantity); otherwise, having the XML format bear units itself will seem pointless if the API for working with it does not bear units. If we pick a unit system here then we lose the benefits of having the XML format flexible with respect to units, I think.

Could we access by either ID or SMIRKS (if it wouldn't be too much trouble to support both)? Thereby if we sliced by SMIRKS or ID it would return some kind of array object of the parameters for that SMIRKS string. Something like:

# Return array like object of parameters associated with given parameter ID or SMIRKS string
x = forcefield[ID or SMIRKS]

I do like this, though I don't know offhand if that particular way of slicing is possible (seems like one would have to check the thing you're slicing for to find out of it is a parameter ID or a SMIRKS, which is somewhat un-pythonic and cumbersome). Maybe you could allow direct access via parameter ID and slightly less-direct access by SMIRKS, such as:

# Return array like object of parameters associated with given parameter ID or SMIRKS string
x = forcefield[parameter_ids]
y = forcefield.by_smirks(smirkslist)
jchodera commented 7 years ago

I guess it would just be in openmm default units? I don't see a reason to work in other units per se.

SMIRNOFF permits properties in each block to be specified in any desired units. They're automatically converted to OpenMM natural units internally, though.

Could we access by either ID or SMIRKS (if it wouldn't be too much trouble to support both)? Thereby if we sliced by SMIRKS or ID it would return some kind of array object of the parameters for that SMIRKS string.

That's what I was thinking. As long as IDs and SMIRKS strings are guaranteed to be unique, I think that should be OK.

Naively it seems to me like we would want these to be unit bearing (simtk.unit.Quantity); otherwise, having the XML format bear units itself will seem pointless if the API for working with it does not bear units. If we pick a unit system here then we lose the benefits of having the XML format flexible with respect to units, I think.

There may be a way to have it both ways: forcefield[ID or SMIRKS] could work with unit-bearing quantities, while forcefield.unitless_view(parameters) could return a unitless view that behaves like a numpy array but where changes to the array cause changes to the underlying parameters.

I do like this, though I don't know offhand if that particular way of slicing is possible (seems like one would have to check the thing you're slicing for to find out of it is a parameter ID or a SMIRKS, which is somewhat un-pythonic and cumbersome).

All that would happen under the hood: For each element in the iterable thing put inside the brackets, we would see if we could find a unique parameter that it corresponds to (either SMIRKS or ID), and if so, add that to the list we return. Otherwise, we return an Exception.

davidlmobley commented 7 years ago

OK, this all sounds great to me. I like the unitless_view(parameters) option for having units both ways.