openforcefield / open-forcefield-tools

Tools for open forcefield development
MIT License

Implement Bayly desired tools for parameter assessment on molecule set/trajectories of molecule set #5

Closed davidlmobley closed 6 years ago

davidlmobley commented 8 years ago

@camizanette - as I mentioned, Christopher Bayly ( @cbayly13 ) suggested some tools for examining how often we are using particular parameters (specified by smarts/smirks) in our molecule test set, or (once we run gas phase simulations of the molecules in the set or otherwise obtain energies/bond lengths/etc for them) examining the energy contributions from those parameters. Here's what he said he wants:

The idea here is to find out what is happening for a particular parameter, which I think would be represented by a smarts/smirks and a parameter type, for example ('[OX2][CX4:2]([#1:1])[OX2:3][!#1]', 'parAng') for a H-Csp3-Oether bond angle where the Csp3 has another divalent oxygen bonded to it as well. That would be one input. Another input would be a (potentially multi-conformer) OEMol. Now there are different cases for what we would like to get back:

Case A: A list of tuples of the form [ ((idx1,idx2,idx3),value), ((idx1,idx2,idx3),value) ], giving the atom idx's of the first occurrence with its value of the bond angle, then the second occurrence, etc. If there are no occurrences, an empty list is returned.
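A minimal sketch of the Case A return shape, in plain Python. It assumes the SMIRKS matching has already been done by some toolkit and has produced a list of atom-index triples; `angle_occurrences`, `angle_degrees`, and the coordinates are hypothetical names and values used only for illustration, not part of any existing API.

```python
import math

def angle_degrees(a, b, c):
    """Angle a-b-c in degrees, with b as the central atom."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.sqrt(sum(x * x for x in v1)) * math.sqrt(sum(x * x for x in v2))
    # Clamp to guard against round-off pushing the cosine outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

def angle_occurrences(coords, matches):
    """Return [((idx1, idx2, idx3), value), ...]; an empty list if no matches.

    coords  : list of (x, y, z) tuples for one conformer
    matches : list of (idx1, idx2, idx3) triples, assumed to come from a
              separate SMIRKS matching step (not shown here)
    """
    return [((i, j, k), angle_degrees(coords[i], coords[j], coords[k]))
            for (i, j, k) in matches]

# Hypothetical coordinates: atom 1 at the origin, atoms 0 and 2 along x and y.
coords = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
result = angle_occurrences(coords, [(0, 1, 2)])
empty = angle_occurrences(coords, [])
```

Keeping the matching step separate from the geometry step means the same function could be fed indices from any source.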

Case B: With an additional input of a parameter file (or a parameterized object for that molecule...faster), we would get back a list of tuples of the form [ ((idx1,idx2,idx3), value, energy, 1stDeriv), ((idx1,idx2,idx3), value, energy, 1stDeriv) ], giving the atom idx's of the first occurrence, the value of the bond angle, the energy, and the 1st deriv with respect to the angle, then the second occurrence, etc. If there are no occurrences, an empty list is returned.
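As a rough illustration of the Case B tuple, here is a sketch that assumes a harmonic angle term, E = 0.5 * k * (theta - theta0)^2 with theta in radians, for which the first derivative is simply k * (theta - theta0). The functional form, the function name `harmonic_angle_terms`, and the numeric parameters are all assumptions for the example; a real force field may use a different form or a different convention for the factor of 0.5.

```python
import math

def harmonic_angle_terms(occurrences, force_k, theta0_deg):
    """Extend Case A style output with energy and dE/dtheta.

    occurrences : [((idx1, idx2, idx3), angle_in_degrees), ...]
    force_k     : force constant in energy / radian**2 (assumed harmonic form)
    theta0_deg  : equilibrium angle in degrees
    Returns [((idx1, idx2, idx3), value, energy, first_deriv), ...].
    """
    out = []
    for idxs, value_deg in occurrences:
        delta = math.radians(value_deg - theta0_deg)  # displacement in radians
        energy = 0.5 * force_k * delta ** 2           # assumed E(theta)
        first_deriv = force_k * delta                 # dE/dtheta for that form
        out.append((idxs, value_deg, energy, first_deriv))
    return out

# Hypothetical values: one occurrence at 112.0 degrees, theta0 = 109.5 degrees.
terms = harmonic_angle_terms([((0, 1, 2), 112.0)], force_k=100.0, theta0_deg=109.5)
no_terms = harmonic_angle_terms([], force_k=100.0, theta0_deg=109.5)
```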

I think this would give us maximum bang for buck, and have lots of future utility; it would be great if your group wanted to do it. With these two functions, we can then accrete the information we need to quickly build the composite metric we are looking for, eg:

  1. compare particular parameter values, angles, 1stDerivs between two molecules (eg query minima with canonical minima). The residual could be part of the objective function for the fit.
  2. histogram particular parameter values, angles, 1stDerivs over the whole set of minima (even over a set of molecules). From the 1stDerivs vs value distributions we might find one population (no parameter split indicated), two (parameter split indicated), or more than two (Aaaagh!).
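The two uses listed above could be sketched roughly as follows. This assumes the inputs are Case A style lists of (index tuple, value) pairs, and that the two structures being compared share the same atom ordering so that index tuples can be matched directly; `residuals`, `histogram`, and the sample values are hypothetical.

```python
import math
from collections import Counter

def residuals(query, reference):
    """Per-occurrence value differences, matched on the atom-index tuple.

    Assumes both lists use the same atom ordering (an assumption of this sketch).
    """
    ref = dict(reference)
    return {idxs: value - ref[idxs] for idxs, value in query if idxs in ref}

def histogram(values, bin_width):
    """Count values per bin; keys are the lower edge of each bin."""
    counts = Counter(math.floor(v / bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

# Hypothetical data: one angle compared between a query and a reference minimum.
diff = residuals([((0, 1, 2), 111.0)], [((0, 1, 2), 109.5)])

# Hypothetical angle values pooled over a molecule set; two separated clusters
# of bins would be the kind of pattern that hints at a parameter split.
hist = histogram([108.2, 109.1, 109.9, 121.3, 122.8], bin_width=5.0)
```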

Additionally we would need the OEDepict function to take a tuple of (idx1,idx2,idx3) such as those output from the function and highlight it in the molecule depiction in an IPython notebook. If that is done I could quickly bake it into a 20-molecules-per-page pdf for faster review. This shows us which occurrence in which structures is giving the weird values we see in the distribution.

I don't actually know how to obtain derivatives of energies with respect to the angle at this point, so we can put that on hold for the moment. But, @camizanette , you should be able to go ahead and do Case A, which is just to figure out where a particular parameter (e.g. an angle parameter) is occurring in a specified molecule, and what the value is of the observable (i.e. the bond distance, angle, or torsion). As noted, you would want this to handle multi-conformer molecules if needed (we can start with OEMols but would probably extend/modify to handle trajectories from OpenMM such as in netCDF, perhaps via MDTraj).
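One way the observable could be generalized to bonds, angles, and torsions over several conformers or trajectory frames is sketched below. It is toolkit-independent: it assumes each frame is just a list of (x, y, z) tuples, and `measure` and `measure_frames` are hypothetical names. The sign convention of the torsion here is whatever falls out of the atan2 formula and may differ from a given toolkit's.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _norm(a):
    return math.sqrt(_dot(a, a))

def measure(coords, idxs):
    """Distance (2 atoms), angle (3) or torsion (4); angles in degrees."""
    pts = [coords[i] for i in idxs]
    if len(pts) == 2:
        return _norm(_sub(pts[1], pts[0]))
    if len(pts) == 3:
        v1, v2 = _sub(pts[0], pts[1]), _sub(pts[2], pts[1])
        cos = _dot(v1, v2) / (_norm(v1) * _norm(v2))
        return math.degrees(math.acos(max(-1.0, min(1.0, cos))))
    if len(pts) == 4:
        b1, b2, b3 = (_sub(pts[1], pts[0]), _sub(pts[2], pts[1]),
                      _sub(pts[3], pts[2]))
        n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
        x = _dot(n1, n2)
        y = _dot(_cross(n1, n2), b2) / _norm(b2)
        return math.degrees(math.atan2(y, x))
    raise ValueError("expected 2, 3 or 4 atom indices")

def measure_frames(frames, idxs):
    """Apply measure() to every conformer/frame; returns one value per frame."""
    return [measure(coords, idxs) for coords in frames]

# Hypothetical two-frame "trajectory" of four atoms.
frame0 = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 1.0, 1.0)]
frame1 = [(2.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 1.0, 1.0)]
frames = [frame0, frame1]
distances = measure_frames(frames, (0, 1))
angles = measure_frames(frames, (0, 1, 2))
torsions = measure_frames(frames, (0, 1, 2, 3))
```

Dispatching on the length of the index tuple keeps a single entry point for all three observables, so the per-frame loop does not need to know the parameter type.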

Information you may need:

Things you'll need to resolve before implementing (that I know of) -- generate a proposal and then come back with it:

There is also a request for a very related (but not identical) tool in a separate issue, so you may want to look at that one as well: https://github.com/open-forcefield-group/open-forcefield-tools/issues/4

Please ask questions as needed.

davidlmobley commented 6 years ago

We got tools that Chris needed here into openforcefield; closing.