openforcefield / open-forcefield-tools

Tools for open forcefield development
MIT License

Associative SMIRFF tools we need #4

Closed: davidlmobley closed this issue 8 years ago

davidlmobley commented 8 years ago

From @cbayly13 via e-mail, relating somewhat to https://github.com/open-forcefield-group/open-forcefield-data/issues/9#issuecomment-224354917 :

It has occurred to me that there are two kinds of association-map tools that will probably be pretty important, and a third tool that may also be important.

The two association-map tools (or single tool with two outputs) would take two inputs:

  • a list of molecules (only the graph would be used) and
  • a SMIRFF which has a unique parameter ID for each parameter.

The two kinds of associative outputs are:

  1. For each molecule, a list of the parameter IDs for all the parameters used by the molecule.
  2. For each parameter ID, a list of all the molecules that use that parameter.
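Here is a minimal Python sketch of the two outputs. It assumes the SMIRKS-matching step is supplied as a callable (`match_parameters`, a hypothetical stand-in for whatever labeling code is used) that returns the parameter IDs a molecule uses. Under that assumption, output 2 is simply the inversion of output 1. The molecule names and IDs in the usage lines are made up for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple


def build_association_maps(
    molecules: Dict[str, object],
    match_parameters: Callable[[object], Iterable[str]],
) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Return (molecule -> parameter IDs, parameter ID -> molecules)."""
    mol_to_params: Dict[str, List[str]] = {}
    param_to_mols: Dict[str, List[str]] = defaultdict(list)
    for name, mol in molecules.items():
        # A molecule may use one parameter many times (e.g. every C-H bond);
        # keep each ID once, in first-seen order.
        ids = list(dict.fromkeys(match_parameters(mol)))
        mol_to_params[name] = ids
        # Output 2 is the inverse of output 1.
        for pid in ids:
            param_to_mols[pid].append(name)
    return mol_to_params, dict(param_to_mols)


# Hypothetical usage: each "molecule" is already a list of matched IDs,
# so the matcher is just the identity function.
toy_labels = {
    "ethane": ["b0001", "b0002", "b0001", "a0001"],
    "ethanol": ["b0001", "b0003", "a0001", "a0002"],
}
mol_to_params, param_to_mols = build_association_maps(toy_labels, lambda labels: labels)
```

Building both maps in one pass keeps them consistent with each other, which fits the "single tool with two outputs" option.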

The third tool would address a "hierarchy" issue: in our earlier discussions, we thought we would keep track of the parent parameter when we generate a child parameter, so we could know the provenance of a parameter. I think this will become important. So what happens when we destroy a parent parameter? Will all the children parameters become orphans with a lost provenance? I am thinking we will need a harmonizing tool so that, imagining a grandparent->parent->child parameter set, if the parent gets destroyed the grandparent becomes the new parent of the child. In chemistry parlance, it would be as if the destroyed parent was simply an unstable intermediate between the grandparent and the child.
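The "harmonizing" step can be sketched in Python, assuming provenance is stored as a simple child-to-parent mapping (with `None` for a root parameter). This representation and the hierarchical IDs used below are hypothetical choices for illustration. Deleting a parameter reattaches each of its children to its own parent, so the chain of provenance stays unbroken.

```python
from typing import Dict, List, Optional

# Hypothetical representation: parameter ID -> parent parameter ID (None = root).
Parents = Dict[str, Optional[str]]


def delete_parameter(parents: Parents, pid: str) -> None:
    """Remove pid, making its parent the new parent of pid's children."""
    grandparent = parents.pop(pid)  # raises KeyError if pid is unknown
    for child, parent in list(parents.items()):
        if parent == pid:
            # The deleted parameter acts like an "unstable intermediate".
            parents[child] = grandparent


def lineage(parents: Parents, pid: str) -> List[str]:
    """Return the provenance chain from pid back to its root parameter."""
    chain = [pid]
    while parents[chain[-1]] is not None:
        chain.append(parents[chain[-1]])
    return chain


# grandparent -> parent -> two children (IDs are made up)
parents: Parents = {"b1": None, "b1.1": "b1", "b1.1.1": "b1.1", "b1.1.2": "b1.1"}
delete_parameter(parents, "b1.1")
```

After the deletion, both children point directly at `b1`, and `lineage(parents, "b1.1.1")` walks straight from the child to the grandparent.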

I am thinking that it is obvious why we would need these tools, but if it isn't, I could expand on it (though it would take a while).

One week to go! I am really looking forward to this...

davidlmobley commented 8 years ago

I responded:

The two association-map tools (or single tool with two outputs) would take two inputs:

  • a list of molecules (only the graph would be used) and
  • a SMIRFF which has a unique parameter ID for each parameter.

The two kinds of associative outputs are:

  1. For each molecule, a list of the parameter IDs for all the parameters used by the molecule.
  2. For each parameter ID, a list of all the molecules that use that parameter.

These indeed sound useful and important.

My group can probably code these up easily enough, though we have to decide what the right input format would be. I would think it would be the SMIRFF XML files and a set of molecules in SDF format (since we had decided on SDF as the canonical format here). Does that sound right?
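On the SMIRFF XML side, the tools first need the set of parameter IDs and the requirement that each ID be unique. Here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The XML snippet is a hypothetical toy, and the sketch assumes only that each parameter element carries `smirks` and `id` attributes. It does not rely on any particular element names.

```python
import xml.etree.ElementTree as ET

# Hypothetical SMIRFF-like snippet; element names are illustrative only.
TOY_SMIRFF = """<SMIRFF>
  <HarmonicBondForce>
    <Bond smirks="[#6X4:1]-[#6X4:2]" id="b0001"/>
    <Bond smirks="[#6X4:1]-[#1:2]" id="b0002"/>
  </HarmonicBondForce>
  <HarmonicAngleForce>
    <Angle smirks="[*:1]~[#6X4:2]-[*:3]" id="a0001"/>
  </HarmonicAngleForce>
</SMIRFF>"""


def parameter_ids(xml_text: str) -> dict:
    """Map each parameter ID to (element tag, SMIRKS), enforcing unique IDs."""
    root = ET.fromstring(xml_text)
    ids = {}
    for elem in root.iter():  # walks all elements in document order
        smirks = elem.get("smirks")
        if smirks is None:
            continue  # not a parameter element (e.g. a force container)
        pid = elem.get("id")
        if pid is None:
            raise ValueError(f"parameter {smirks!r} has no id")
        if pid in ids:
            raise ValueError(f"duplicate parameter id {pid!r}")
        ids[pid] = (elem.tag, smirks)
    return ids
```

Failing loudly on a missing or duplicated ID means the association maps above can safely use the ID as a key.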

The third tool would address a "hierarchy" issue: in our earlier discussions, we thought we would keep track of the parent parameter when we generate a child parameter, so we could know the provenance of a parameter. I think this will become important. So what happens when we destroy a parent parameter? Will all the children parameters become orphans with a lost provenance? I am thinking we will need a harmonizing tool so that, imagining a grandparent->parent->child parameter set, if the parent gets destroyed the grandparent becomes the new parent of the child. In chemistry parlance, it would be as if the destroyed parent was simply an unstable intermediate between the grandparent and the child.

I think this makes sense as well. John may have thoughts from a tools/API perspective, but it seems like this is going to be vital for you to figure out how to do sensible moves and to understand what is happening with the moves you're doing, etc.

I'm not sure this is really a tool, though - maybe this is more of a "what info we need to track in the Bayesian chemical perception framework" than a separate tool.

davidlmobley commented 8 years ago

Resolved by https://github.com/open-forcefield-group/smarty/pull/82; see smarty/examples/label_molecule/get_parameter_statistics.py.