openforcefield / open-forcefield-tools

Tools for open forcefield development
MIT License
8 stars 6 forks source link

Draft API/usage example for full example workflow for Bayesian sampling in parameter space #6

Open davidlmobley opened 8 years ago

davidlmobley commented 8 years ago

We need to follow in the footsteps of #3 and generate what the API and a usage example for our full workflow for the immediate parameter sampling problem we are heading towards. This involves, as I understand it, taking a specified set of data, a starting set of parameters, and a prior, and generating proposed modulations to those parameters, computing properties, comparing with experiment, updating the posterior, etc.

Presumably this would start by generating some sample code around that from https://github.com/jchodera/open-forcefield-tools/blob/master/README.md to expand the example by doing things like:

So for example, how would we expand this code below to do the above?

thermoml_keys = ['10.1016/j.jct.2005.03.012', ...]
dataset = ThermoMLDataset(thermoml_keys)
# Filter the dataset to include only molar heat capacities measured between 280-350 K
dataset.filter(ePropName='Excess molar enthalpy (molar enthalpy of mixing), kJ/mol') # filter to retain only this property name
dataset.filter(VariableType='eTemperature', min=280*unit.kelvin, max=350*kelvin) # retain only measurements with `eTemperature` in specified range
# Load an initial parameter set
parameter_set = [ SMIRFFParameterSet('smarty-initial.xml') ]
# Compute physical properties for these measurements
estimator = PropertyEstimator(nworkers=10) # NOTE: multiple backends will be supported in the future
computed_properties = estimator.computeProperties(dataset, parameter_set)
# Write out statistics about errors in computed properties
for (computed, measured) in (computed_properties, dataset):
   property_unit = measured.value.unit
   print('%24s : experiment %8.3f (%.3f) | calculated %8.3f (%.3f) %s' % (measured.value / property_unit, measured.uncertainty / property_unit, computed.value / property_unit, computed.uncertainty / property_unit, str(property_unit))
davidlmobley commented 8 years ago

In today's call, @jchodera said @maxentile, @pgrinaway and @jchodera would draft this up and I should create an issue for it. :)

davidlmobley commented 8 years ago

@mrshirts - you might have input on this as well... ?

mrshirts commented 8 years ago

We probably want to handle the units more transparently than in the print line given at the end. Though I units should be able to handle that anyway IIRC.

Other than starting with density as they example, this seems reasonable.

jchodera commented 8 years ago

We probably want to handle the units more transparently than in the print line given at the end. Though I units should be able to handle that anyway IIRC.

Can you elaborate?

I was planning on using simtk.unit.Quantity() objects to represent measurements. These have a value and a unit attached, and take care of standard unit conversions for you.

These lines

   property_unit = measured.value.unit
   print('%24s : experiment %8.3f (%.3f) | calculated %8.3f (%.3f) %s' % (measured.value / property_unit, measured.uncertainty / property_unit, computed.value / property_unit, computed.uncertainty / property_unit, str(property_unit))

are just a convenient idiom for printing the measured and computed values in the same units.

You can just as easily use print(measured) and it will give you something like 4.73242 kilojoules/mole, but there is no guarantee about how many digits it will print or whether print(computed) will give you the same units.

jchodera commented 8 years ago

Other than starting with density as they example,

Do you mean "we should use density as the simplest example"?

mrshirts commented 8 years ago

Yes, that should have been "use density as the example" -- and yes, the first one.

mrshirts commented 8 years ago

Potentially a print wrapper that would take care of the unit formatting. Maybe that's too much of a pain. I was thinking of rather than dividing, using built in functions like

measured.value._value (for short) or

measured.value.value_in_units(unit(measured.value))

to be more explicit about what the conversion was doing. But all pretty minor things.

(measured.value / property_unit, measured.uncertainty / property_unit, computed.value / property_unit, computed.uncertainty / property_unit, str(property_unit))

jchodera commented 8 years ago

Potentially a print wrapper that would take care of the unit formatting. Maybe that's too much of a pain. I was thinking of rather than dividing, using built in functions like.. to be more explicit about what the conversion was doing. But all pretty minor things.

The simtk.unit package is mature and contains several thousand of lines of code. I'm not going to write or rewrite it---we're just going to use it here.

The alternative is to use pint, but it isn't as readily compatible with OpenMM, which we need right now.

The API example wasn't meant to show off any sort of pretty-printing. It's just to show you how you would access data.

davidlmobley commented 8 years ago

Yes, @mrshirts - have you used simtk.unit? I'm fairly sure it solves the problem you're talking about - or if not, you're going to have to be a good deal more explicit about the problem.

mrshirts commented 8 years ago

Yes, @mrshirts - have you used simtk.unit? I'm fairly sure it solves the problem you're talking about

Well, I've used it extensively throughout InterMol - I'm just pointing out that dividing by units is messier and less obvious than just using the existing functionality already in simtk.unit, and gave examples how it would be done using the existing functionality.

jchodera commented 8 years ago

You're right, we could use the value_in_units_of() functionality. That is probably clearer. I can update the docs.

davidlmobley commented 8 years ago

Thanks. @mrshirts, I think we both misunderstood what you were asking for.

On Mon, Jun 27, 2016, 10:31 PM John Chodera notifications@github.com wrote:

You're right, we could use the value_in_units_of() functionality. That is probably clearer. I can update the docs.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/open-forcefield-group/open-forcefield-tools/issues/6#issuecomment-228952330, or mute the thread https://github.com/notifications/unsubscribe/AGzUYdRVTDZJ0_UWkruQj0p7cuI2gOd4ks5qQLGrgaJpZM4I7RmB .

davidlmobley commented 8 years ago

As noted in today's call, we would be doing things like this:

Our initial tests will be on modification of bond, angle, and torsional parameters, because of the gas phase data we'll be testing on. We'll begin with bonds.

The procedure as JDC described will be something like this:

Initially we will have to work on the SMIRFF XML representation of the parameters in order to do this, until we sort out a Python representation of the parameters (see https://github.com/open-forcefield-group/smarty/issues/68).

davidlmobley commented 7 years ago

@maxentile - hopefully congratulations are in order. @jchodera was saying you would have advanced a couple days ago and would be wanting to get started on this and related issues right after advancement? I just wanted to bring it to the surface again.