openforcefield / open-forcefield-tools

Tools for open forcefield development
MIT License

[WIP] Add property calculation best practices information to the repo; comments appreciated. #2

Closed bmanubay closed 8 years ago

bmanubay commented 8 years ago

These are my initial ideas for best practices of property calculation from simulation data. Main things that need to be sussed out still are:

  1. Finishing converting the isochoric pressure coefficient (needed in the speed of sound calculation) into a directly calculable quantity (we could potentially just calculate (∂P/∂T)_V directly by simple finite differences)
  2. The activity coefficient makes use of a reference chemical potential (μ_i^0) in its theoretical definition, and I'm not sure exactly what to do about translating that into something calculable in MD. Any ideas would be appreciated!
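The simple finite-difference route for item 1 could be sketched as below. This is only an illustration: `pressure_at` is a hypothetical stand-in for whatever estimator (e.g. MBAR reweighting) returns the average pressure at a given temperature, and the linear toy model is not real data.

```python
# Hypothetical sketch: central finite difference for the isochoric
# pressure coefficient (dP/dT)_V.  pressure_at is a stand-in for
# whatever estimator (e.g. MBAR reweighting) returns the average
# pressure at temperature T; here it is a toy linear model.

def pressure_at(temperature, a=0.05, b=-10.0):
    # Toy model P(T) = a*T + b, purely illustrative.
    return a * temperature + b

def isochoric_pressure_coefficient(temperature, dT=0.5):
    """Central-difference estimate of (dP/dT)_V."""
    return (pressure_at(temperature + dT)
            - pressure_at(temperature - dT)) / (2.0 * dT)

print(isochoric_pressure_coefficient(300.0))  # recovers a = 0.05 for the linear toy model
```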
jchodera commented 8 years ago

Thanks! Will review this.

Two quick thoughts:

bmanubay commented 8 years ago

@jchodera Thanks John!

jchodera commented 8 years ago

@davidlmobley has been computing activity coefficients and can chime in on the computability issue.

jchodera commented 8 years ago

Also, note that I accidentally pushed to the main repo here instead of to my fork. I am going to roll that back so I can open a PR to discuss the API README, so don't do a merge from master yet!

jchodera commented 8 years ago

Some comments on this:

davidlmobley commented 8 years ago

@cfennell may also be able to provide useful input on this, as he's recently been calculating a variety of properties including dielectric constants. Chris?

davidlmobley commented 8 years ago

Re activity coefficients: We were actually originally computing relative activity coefficient (same solute, different solvents) which is easier - basically just solvation free energy calculations. So, it's not trivial, but it's pretty straightforward. I think computing actual activity coefficients rather than ratios will be a bit harder but not hard, it will just require someone ironing out a robust procedure that takes things to a good reference state. So it's a research problem, but a small one, I think.

jchodera commented 8 years ago

We should also run this document by Bill Swope (IBM Almaden) after it matures a bit.

bmanubay commented 8 years ago

Great! Thanks for all the feedback folks!

cfennell commented 8 years ago

Thanks for the ping on this, David. Some thoughts to build on John's points, though I'm certain I'm behind in knowing the details of what you want or what is important here. There are a lot of properties that one could calculate, but not all of them are terribly unique and helpful when it comes to optimizing a force field. The Caleman et al. (dx.doi.org/10.1021/ct200731v) study seemed to zero in on the ones John listed, with the addition of surface tension and the coefficient of thermal expansion. I might consider viscosity a helpful one for a lot of things as well.

For standard practices in calculating some of these quantities, you can look to the Horn et al. TIP4P-Ew paper (10.1063/1.1683075) or Steve Rick's TIP5P-E paper (10.1063/1.1652434). I fretted a lot about the static dielectric constant as John suggests, going back to the older Neumann papers, but it really isn't that bad. If you use the Ewald sum with conducting boundary conditions (don't apply any non-standard surface dipole correction), it's consistent regardless of your chosen system size. Note that the more polar the molecule, the longer it will take the static dielectric constant to converge. I tend to go around 20 ns sampling runs for things near the polarity of water. You will often need to run longer trajectories for things more polar than water.
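For reference, the conducting-boundary fluctuation formula described above can be sketched as follows. This is a minimal sketch, with units and array shapes as assumptions; a real calculation also needs the long sampling runs noted above and a properly computed total box dipole.

```python
import numpy as np

KB = 1.380649e-23        # Boltzmann constant, J/K
EPS0 = 8.8541878128e-12  # vacuum permittivity, F/m

def dielectric_constant(box_dipoles, volume, temperature):
    """Fluctuation estimate of the static dielectric constant under
    conducting ("tin-foil") Ewald boundary conditions:

        eps = 1 + (<M.M> - <M>.<M>) / (3 eps0 V kB T)

    box_dipoles: (n_frames, 3) array of total box dipoles M in C*m;
    volume: average box volume in m^3; temperature in K.
    """
    M = np.asarray(box_dipoles, dtype=float)
    mean_M = M.mean(axis=0)
    fluct = np.mean(np.sum(M * M, axis=1)) - np.dot(mean_M, mean_M)
    return 1.0 + fluct / (3.0 * EPS0 * volume * KB * temperature)
```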

You guys may find interesting a discussion I had with Vijay and Lee-Ping Wang on a "minimal # of experimental properties to get a transferable force field". I posed it as a problem for ForceBalance in hopes of getting down to what really matters. My position was that some properties are more dependent upon certain parameters than others, and focusing on the properties that matter most can give you others "for free". For example, the density is highly dependent on the LJ sigma parameter (obviously) while another property like ∆H_vap is not strongly dependent on LJ sigma. If the only parameter in your force field is LJ sigma, getting density correct has a lot more value for eventual transferability than getting ∆H_vap correct. Lee-Ping went with the not too adventuresome choice of "more than 2". My thought was to just use the most highly correlated property for each parameter, and that is all that should be necessary. In my case at the time, it was density (for LJ sigma), ∆H_vap (for LJ epsilon), and static dielectric constant (for charge magnitude). In the proposed Bayesian framework, maybe this isn't as interesting an issue as relative importance of properties should get sorted automagically?

davidlmobley commented 8 years ago

@cfennell - we're focusing initially on properties which are available to us extremely easily via ThermoML, with some prioritization based on what we think will be important. Our tentative list/prioritization is here: https://github.com/open-forcefield-group/open-forcefield-tools/wiki/Physical-Properties-for-Calculation

Here, I think what would be probably most helpful would be your input on the ease/difficulty of calculating the various properties Bryce has described calculating, and whether (if you have any experience) they should be calculated in different ways than he's described (along the lines of John's comments).

We also welcome feedback on what properties to calculate, though INITIALLY that will be a little more dictated by the data we have easily available - i.e., this summer, we're basically trying to just pick one or two properties (starting with density) and "turn the Bayesian crank" to show how those change parameters, then repeat and include a different property. After that we can worry more about what types of properties provide orthogonal versus overlapping information.

mrshirts commented 8 years ago

Starting to go through the comments here. Apologies for the delay (forgot to turn notifications on + limited work time since quite under the weather)

We don't need to rely on finite differences, since we can compute analytical derivatives instead (which will be numerically more stable and easier to compute uncertainties for)

I am imagining finite difference in P and T using MBAR, not running multiple simulations. In my experience with MBAR, finite differences end up giving something almost identical to the analytical formula, including uncertainties. Will follow up on this. Generally am not advocating doing finite difference with multiple simulations (propagated error is too large when using small differences).

We don't have atomic or molecular virials in OpenMM for the computation of pressures, so avoiding virial-based pressure computation is ideal

Good to know, we'll keep that in mind.

mrshirts commented 8 years ago

A.1.1 Density: Why would we want to estimate the density via Eq. 5, using a finite difference of the Gibbs free energy reweighted to different pressures when it is much more direct to use MBAR to estimate the density directly? In order to compute the Gibbs free energy at different pressures, we would need to store both the potential energies and volumes frequently---so we would have all of the data to estimate the density at arbitrary thermodynamic states using MBAR directly? I don't see how Eq. 5 would reduce the variance beyond the MBAR-estimated density---it would only serve to introduce both variance (from estimating two related quantities) and bias (from the use of finite difference).

Good question. One reason to consider using a finite difference of the free energy for the volume is that it doesn't require two steps (first calculate free energies, then calculate observables), since the extra terms are calculated at the same time as the free energies. If the number of states is large, the extra observable calculation can be expensive; if small, the extra cost is negligible. Any advantage is likely to be slight, so I just wanted it there for completeness. We wanted to lay out all the things we could do, and then choose between them.
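For illustration, estimating an observable directly by reweighting can be sketched in its single-state limit (MBAR generalizes this to combine samples from many states; the names here are hypothetical, not the pymbar API):

```python
import numpy as np

def reweighted_average(observable, enthalpy, beta0, beta1):
    """Single-state exponential reweighting of a per-frame observable
    sampled at inverse temperature beta0 to a nearby beta1, using the
    per-frame NPT enthalpy H = U + P*V:

        <O>_beta1 = <O * w> / <w>,  w = exp(-(beta1 - beta0) * H)

    This is just the one-state limit of MBAR, for illustration.
    """
    obs = np.asarray(observable, dtype=float)
    H = np.asarray(enthalpy, dtype=float)
    log_w = -(beta1 - beta0) * H
    log_w -= log_w.max()          # stabilize the exponentials
    w = np.exp(log_w)
    return np.sum(w * obs) / np.sum(w)
```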

A.1.2 Dielectric constant: There are multiple methods for estimating the dielectric constant, and I fear that there are some issues (such as whether PME is used, and which PME boundary conditions are acceptable) that impact the computation. The total dipole moment M is also not invariant with translation of the box center, is it? If you could provide more guidance here, this would be useful! For reference, the code used in the previous ThermoML benchmark is here.

We'll have to investigate further -- looks like Chris Fennell has commented here; we'll follow up.

A.1.3 Isothermal compressibility: We won't have access to dV/dP analytically, though it could potentially be accessed via finite-difference in P. The fluctuation in V at constant NTP sounds like our best bet.

We'll try both the fluctuation and MBAR-based finite-difference approaches. From playing around with T-derivatives, I expect they will be almost the same (though the uncertainty with finite-difference MBAR is easier to calculate, since the uncertainty in the uncertainty is hard to get right).
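The NPT volume-fluctuation estimator mentioned here is simple enough to sketch directly (SI units assumed; a real workflow would also need decorrelated samples and uncertainty estimates):

```python
import numpy as np

KB = 1.380649e-23  # Boltzmann constant, J/K

def isothermal_compressibility(volumes, temperature):
    """NPT volume-fluctuation estimate of the isothermal compressibility:

        kappa_T = (<V^2> - <V>^2) / (kB * T * <V>)

    volumes: per-frame box volumes in m^3; returns kappa_T in 1/Pa.
    """
    V = np.asarray(volumes, dtype=float)
    return V.var() / (KB * temperature * V.mean())
```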

A.1.4 Molar enthalpy: We should be able to compute dG/d\beta analytically via MBAR, but it's important to remember that computing derivatives of the dimensionless quantity g = (\beta G) is usually harder to screw up, so it's best to talk about explicit derivatives of the dimensionless quantities.

There's a lot of screwy things involved with temperature derivatives of free energies. I will try to get this updated with some of that detail.

A.1.5 Heat capacity: There are also fluctuation forms of this. I'm not sure if there will be any major difference between those based on var(H) or those based on dH/dT, but there could be added complexity in second derivatives via MBAR.

Yes, we'll get the fluctuation forms in -- ideally, I'd like to get both finite-difference and fluctuation forms in for most cases, until we know how different they are (because of the exponential in the Boltzmann factor, there are almost always derivative versions and fluctuation versions of every formula). I have some data I'll coordinate with Bryce to talk about.

mrshirts commented 8 years ago

My thought was to just use the most highly correlated property for each parameter, and that is all that should be necessary. In my case at the time, it was density (for LJ sigma), ∆H_vap (for LJ epsilon), and static dielectric constant (for charge magnitude). In the proposed Bayesian framework, maybe this isn't as interesting an issue as relative importance of properties should get sorted automagically?

Once we have the parameter sensitivities, we should be able to do PCA and figure out which combinations will give us maximal information.

We would like to avoid (for now) surface tension (because it's a mixed-phase property, depending on both liquid and vapor) and viscosity, because it's a transport quantity and will change depending on what thermostat one uses (the dependence can be minimized with the right choice, but it adds an extra degree of complexity). As noted above, we have the list of data that is relatively easy to obtain experimentally.
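The PCA idea could be sketched roughly as follows; the sensitivity matrix here is a made-up placeholder for estimated d(property)/d(parameter) values, not real numbers:

```python
import numpy as np

# Hypothetical sensitivity matrix S[i, j] = d(property_i)/d(parameter_j);
# rows: density, dHvap, dielectric; columns: LJ sigma, LJ epsilon, charge.
# The numbers are placeholders, not real sensitivities.
S = np.array([[0.9, 0.2, 0.1],
              [0.1, 0.8, 0.2],
              [0.1, 0.1, 0.9]])

# PCA via SVD of the column-centered sensitivity matrix: the leading
# right singular vectors are the parameter combinations that the
# measured properties constrain most strongly.
S_centered = S - S.mean(axis=0)
U, singular_values, Vt = np.linalg.svd(S_centered, full_matrices=False)
explained = singular_values ** 2 / np.sum(singular_values ** 2)
print(explained)  # fraction of sensitivity variance captured by each component
```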

jchodera commented 8 years ago

I am imagining finite difference in P and T using MBAR, not running multiple simulations. In my experience with MBAR, finite differences end up giving something almost identical to the analytical formula, including uncertainties. Will follow up on this. Generally am not advocating doing finite difference with multiple simulations (propagated error is too large when using small differences).

I think I'm just missing the point of WHY you are advocating the use of finite differences for MBAR when analytical derivatives could be computed. Is it just to avoid the infrastructure overhead for computing parameter derivatives in potential energy or computing MBAR derivatives?

If we did finite difference, how can we automatically select difference intervals to avoid numerical error?

mrshirts commented 8 years ago

I think I'm just missing the point of WHY you are advocating the use of finite differences for MBAR when analytical derivatives could be computed.

So, I just want to lay out what could be done, not necessarily say in all cases what should be done.

The basic idea is to avoid an extra call to MBAR for observables, which we have found can sometimes be slow if evaluating a lot of them. But that's very situation-dependent -- usually, the observable calculation will be fine.

Is it just to avoid the infrastructure overhead for computing parameter derivatives in potential energy or computing MBAR derivatives?

No, this is for T and P derivatives, which can be done analytically (i.e. via the thermodynamic relationships described in the sheet) without any need to change the potential energy function.

mrshirts commented 8 years ago

If we did finite difference, how can we automatically select difference intervals to avoid numerical error?

With MBAR, you can just go down as finely as you want in relative error. We've played around with this before.

Analytical derivatives of the parameter are a different topic, of course.
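One simple interval-selection scheme in the "go down as finely as you want" spirit is to shrink the step until successive central-difference estimates agree to a chosen relative tolerance. A generic sketch (not the MBAR-specific code; the tolerance and starting step are illustrative):

```python
import math

def central_diff(f, x, h):
    # Standard second-order central difference.
    return (f(x + h) - f(x - h)) / (2.0 * h)

def adaptive_derivative(f, x, h0=1.0, rtol=1e-6, max_halvings=40):
    """Halve the step until successive central-difference estimates
    agree to the requested relative tolerance."""
    h = h0
    prev = central_diff(f, x, h)
    for _ in range(max_halvings):
        h *= 0.5
        cur = central_diff(f, x, h)
        if abs(cur - prev) <= rtol * max(1.0, abs(cur)):
            return cur
        prev = cur
    return prev  # best effort if the tolerance was never reached

print(adaptive_derivative(math.exp, 1.0))  # close to e = 2.71828...
```

Note that with noisy (statistically estimated) function values, shrinking the step too far amplifies the noise, which is exactly the concern raised below.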

mrshirts commented 8 years ago

I've posted some changes in the fork https://github.com/mrshirts/open-forcefield-tools/ for Bryce to incorporate and merge in -- I wasn't sure how to add a pull request to a pull request . . . Still a few things to take care of.

Note that the examples directory in pymbar has a heat capacity example that uses the fluctuation formula, the derivative of enthalpy formula, and the second derivative of free energy formula. Bryce, you can take a look at that to check your derivations on the heat capacity (which I think still needed the fluctuation formula added).

jchodera commented 8 years ago

I wasn't sure how to add a pull request to a pull request

This is pretty easy---you just select different source and destination forks when opening the PR on GitHub.

jchodera commented 8 years ago

With MBAR, you can just go down as finely as you want in relative error. We've played around with this before.

There are well-known numerical issues with finite-difference derivatives, even with double precision. It's essential to have numerical guidelines for achieving robust derivatives.

jchodera commented 8 years ago

So, I just want to lay out what could be done, not necessarily say in all cases what should be done.

This is meant to be the "best practices for computing observables", not "some random practices that could be used to compute observables"!

The basic idea is to avoid an extra call to MBAR for observables, which we have found can sometimes be slow if evaluating a lot of them. But that's very situation dependent -- usually, you the observable will be fine.

By "extra call to MBAR for observables", I'm very confused. The free energies are already converged---we just need to compute the derivative of some observable, and no self-consistent iteration is required. In fact, only a single call is required to compute observable derivatives if analytical derivatives are used, while two calls are required to compute finite-difference derivatives.

I'm still completely confused why you guys are advocating finite-difference derivatives over analytical derivatives. Can you explain in some useful detail why? I can think of some reasons, but I am totally not clear on what your scientific motivation is here.

mrshirts commented 8 years ago

This is meant to be the "best practices for computing observables", not "some random practices that could be used to compute observables"!

I don't think the best practices are yet known. Determining which ones are best requires listing out the options and investigating them. Also, implementing multiple methods adds redundancy.

I'm still completely confused why you guys are advocating finite-difference derivatives over analytical derivatives

Perhaps because we are not actually advocating for them yet? A student (now no longer in the group) played around with this before, and we have some evidence, but I need to revive it first. And I've got some evidence from entropy calculations that I need to dig out as well (last worked on a couple of years ago, so it needs to be reviewed). My conclusion was that they were all about the same, but that because of the way MBAR was implemented, it was faster to do finite differences if you had to do hundreds of them. FOR NOW, any of these methods will work well enough, so if you want to ignore finite differences until there's a bit more evidence, that's totally fine.

By the end of the day, I'll at least clean up the heat capacity example in pymbar/examples -- that shows some of the features (not the speed, since there's not many calculations, but the accuracy). Other examples might take longer.

I don't really feel like arguing about it more until I dig out the files and make sure they are shareable (because if they aren't shareable, people will then complain that the files are a mess). Maybe I misremembered? Until then, I'm going to lay out the possibilities so they are documented. Pick any one you want (for most properties); it should work fine at small scales. And if it doesn't, there are other ones.

One more thing: can we clarify some terminology that might be leading to misunderstanding? When you say analytical derivatives, with respect to which variables are you taking the derivatives analytically? There are derivatives with respect to thermodynamic variables, like T and P, which lead to different formulas (like the fluctuation formulas), and there are parameter derivatives, which I'm not getting into yet at all. Which are you referring to?


jchodera commented 8 years ago

Thanks for the clarification, @mrshirts! I hadn't realized that you guys were just compiling some possible ways we could compute properties, rather than a list of what you were endorsing as "best practices".

I think we can learn a lot from the work of Chris Fennell @cfennell, Lee-Ping Wang @leeping, and Bill Swope and the IBM Almaden Research folks, since they have invested a great deal of time in identifying the best practices here. Those approaches (e.g. those used for TIP4P-Ew and adopted by @leeping) should probably be our first-line approaches, substituted for another approach when we run into a roadblock (such as the lack of virials in OpenMM). We can certainly compile alternative approaches, but maybe we can list them as "Alternative approaches" and designate one approach (and a citation where it was used) as a tentative "best practice".

Thanks again for working on compiling these!

mrshirts commented 8 years ago

I just did some updating to the pymbar/examples/heat-capacity example; it was purely to try to clean up the code a bit and add a bit more commenting, though there's still a ways to go. Numerically it was unchanged from before.

It calculates the heat capacity three ways:

  1. Fluctuation result
  2. First derivative of energy
  3. 2nd derivative of free energy

With an initial range of 85 degrees and a dT of 85/200, we get (at the C_V peak, to be specific):

| T | Cv +/- dCv (var) | Cv +/- dCv (dT) | Cv +/- dCv (ddT) |
|---|---|---|---|
| 318.015 | 19.1742 +/- 0.2297 | 19.1527 +/- 0.2291 | 19.1634 +/- 0.2294 |

This is a potential bias of about 0.1%. Note the errors above are bootstrap - the analytical error estimates are within 10-15% for first derivatives, but we haven't got good formulas yet for the variance and the 2nd derivative uncertainties. Exactly the same data was used for all calculations.

With a spacing of 85/1000, at the closest temperature (there's an offset because of the different spacing), we get (omitting the bootstrap errors since I didn't calculate them here):

| T | Cv (var) | Cv (dT) | Cv (ddT) |
|---|---|---|---|
| 318.033 | 19.1744 | 19.1736 | 19.1740 |

That's about as good agreement as one can get. In this case, the variance-based method is the one in best agreement as the interval drops.

Looking at how this compares over a range of temperatures and for different properties is probably going to have to fall to Bryce!

jchodera commented 8 years ago

What is "var" vs "dT" vs "ddT"?

Numerically, these results are all different. While they are within statistical error for this example, IS THIS ALWAYS GUARANTEED?

I have had a great deal of difficulty with the numerical stability of finite difference derivatives in validating forces for molecular mechanics potentials. It is not straightforward to select an appropriate Delta for numerical derivatives that is numerically robust. Without some guidance on the numerics of selecting an appropriate Delta, I am hesitant to use finite difference methods when analytical approaches are available.

If the derivatives are with respect to thermodynamic parameters where we don't need potential energy derivatives, I don't see the additional value in finite difference. On the other hand, it would be a huge engineering effort to, say, compute the derivative of the potential with respect to volume, so cases where it is unavoidable make sense.

mrshirts commented 8 years ago

The descriptions of the formula are in the script.

"var" - variance( i.e. fluctuation based) formula.

"dT" - first derivative of the energy

"ddT" - second derivative of the free energy (with some other factors).

I have had a great deal of difficulty with the numerical stability of finite difference derivatives in validating forces for molecular mechanics potentials. It is not straightforward to select an appropriate Delta for numerical derivatives that is numerically robust.

Forces are significantly different than observables, since forces have things like r^-12 dependence. Observables usually change much more smoothly, usually with low-order polynomial dependence (except at a phase change, which we won't really have at microscopic extent, where phase changes are still continuous). The heat capacity example is actually looking at the assembly transition of a number of peptides (in implicit solvent), so there is quite a large spike in heat capacity.

Without some guidance on the numerics of selecting an appropriate Delta, I am hesitant to use finite difference methods when analytical approaches are available.

And by analytical, you mean the variance formula? That's the form analytical approaches take when looking at derivatives with respect to thermodynamic variables. Or sometimes you get other thermodynamic variables out.

If the derivatives are respect to thermodynamic parameters where we don't need potential energy derivatives, I don't see the additional value in finite difference. On the other hand, it would be a huge engineering effort to, say, compute the derivative of the the potential wrt volume, so cases where it is unavoidable make sense.

I always like to calculate things multiple ways, so I know when I've screwed up :) And it's easier to get an analytical estimate of the uncertainty for the finite difference versions than the fluctuation version (compare the analytical error estimate to bootstrap for the four estimators), though that could potentially be improved with some work.


davidlmobley commented 8 years ago

I'd highly suggest @bmanubay and @mrshirts set up a call with Bill Swope to talk through this in general - with a particular focus on things we are interested in the relatively near term, but then if you have time you can get to the longer-term stuff as well.

Depending on scheduling, I might be interested in/able to join. But Bill has done so many different things SO CAREFULLY that you really want to get his input on this.

davidlmobley commented 8 years ago

Reading his papers is no substitute for talking to him, as many things are understated in his papers, and many more never make it to print.

jchodera commented 8 years ago

+100 on @davidlmobley's suggestion.

Would love to join if possible.

bmanubay commented 8 years ago

Sounds like a good idea!! I'm not sure how this would work into @mrshirts schedule since he's traveling soon, but I'd be happy to take part if I'm able.

mrshirts commented 8 years ago

Definitely a good idea. For testing things (say, density and heat capacity), it won't matter so much, since heat capacity corrections are pretty small, BUT by the time we are doing anything seriously (i.e., anytime beyond just getting something up and working in the next few weeks), we'll want to pick his brain.

One potential time for a call is the second day in UCI (June 12th), or if not, sometime in the week or so after. I can email him to set that up (David, perhaps you can link me on any current email chain?)

davidlmobley commented 8 years ago

I think this is in the "you guys should schedule a call" category, @bmanubay , but it is more of a long-term issue since the first place we're going is density and we thankfully know how to calculate that efficiently. :)

davidlmobley commented 8 years ago

@mrshirts :

Definitely a good idea. For testing things (say, density and heat capacity), it won't matter so much, since heat capacity corrections are pretty small, BUT by the time we are doing anything seriously (i.e. anytime beyond just get something up and working the next few weeks), we'll want to pick his brain.

I'm not just talking about corrections. I'm imagining he's calculated more thermodynamic properties with or without corrections in his career than the rest of us, combined, have ever even thought of calculating. ;)

One potential time for a call is the second day in UCI (June 12th), or if not, sometime in the week or so after. I can email him to set that up (David, perhaps you can link me on any current email chain?)

I don't have a suitable current e-mail chain, easiest to just start a fresh one to discuss this.

mrshirts commented 8 years ago

I'm not just talking about corrections. I'm imagining he's calculated more thermodynamic properties with or without corrections in his career than the rest of us, combined, have ever even thought of calculating.

Definitely. I'm just saying that we can be pretty confident we can get a reasonable estimate of the heat capacity before we talk to him.

davidlmobley commented 8 years ago

I'm just saying that we can be pretty confident we can get a reasonable estimate of the heat capacity before we talk to him.

Yes, hence my "long term" comment above a moment ago.

davidlmobley commented 8 years ago

I read the document and passed on to @bmanubay a bunch of minor tweaks that aren't worth burdening the whole group with via Slack.

@bmanubay : The one other thing I think this document would benefit a lot from is some commentary on/insight into which properties constrain what parameters and to what degree, even if it's just speculative/based on physical intuition. This probably is something you can even just pick people's brains about, i.e. along the lines of what Chris Fennell wrote:

You guys may find interesting a discussion I had with Vijay and Lee-Ping Wang on a "minimal # of experimental properties to get a transferable force field". I posed it as a problem for ForceBalance in hopes of getting down to what really matters. My position was that some properties are more dependent upon certain parameters than others, and focusing on the properties that matter most can give you others "for free". For example, the density is highly dependent on the LJ sigma parameter (obviously) while another property like ∆H_vap is not strongly dependent on LJ sigma. If the only parameter in your force field is LJ sigma, getting density correct has a lot more value for eventual transferability than getting ∆H_vap correct. Lee-Ping went with the not too adventuresome choice of "more than 2". My thought was to just use the most highly correlated property for each parameter, and that is all that should be necessary. In my case at the time, it was density (for LJ sigma), ∆H_vap (for LJ epsilon), and static dielectric constant (for charge magnitude). In the proposed Bayesian framework, maybe this isn't as interesting an issue as relative importance of properties should get sorted automagically?

So, maybe take a stab at that, then perhaps even have a really brief call with, say, Chris to talk through it? (He IS interested in being involved at least at a high level, so I would expect he'd be willing to talk.) I'm happy to join if you guys ( @mrshirts and @bmanubay) are scheduling one.

cfennell commented 8 years ago

Indeed, I would be interested in chatting about this and other things. Keep me posted on a time and I will see if I can join in. I still need to sort out traveling to California. I could possibly be there when something gets scheduled.


leeping commented 8 years ago

This hypothesis is definitely something you could test in ForceBalance - you could zero out the weights for properties you don't want to match, and also switch specific parameters on or off. From my experience with developing water models, I think there's a fair amount of coupling between parameters and properties in this problem.

jchodera commented 8 years ago

Thanks!

Can we remove the LaTeX intermediate files, like these?

Property-Calculation-Best-Practices/prop_calc_best_practices.log
Property-Calculation-Best-Practices/prop_calc_best_practices.aux

mrshirts commented 8 years ago

The response function d(property)/d(parameter) is probably what we would want to look at, which can be done very efficiently with finite differences (if comparing magnitudes, the approximation error resulting from finite differences is negligible).

bmanubay commented 8 years ago

Took care of the intermediate files @jchodera!

jchodera commented 8 years ago

Thanks, @bmanubay!

jchodera commented 8 years ago

Will merge this now. Let me make a few edits today, and you can carry on in a separate PR tomorrow.