Have ToolkitRegistry report which toolkit ran operations

j-wags commented 4 years ago

Is your feature request related to a problem? Please describe.

In discussing desired behaviors for the QCArchive submission framework, @jthorton and I determined that we'll want to record the provenance of which toolkit(s) performed each operation. This will need to be added to the Open Force Field toolkit, since we'll be using it for operations like conformer generation and CMILES generation.

Describe the solution you'd like

We should add an option for ToolkitRegistry to report which toolkit and version performed an operation. We'll also need to add a method to each ToolkitWrapper to report its backend toolkit version in a standard way.

One difficulty here is that ToolkitRegistry.call loops over multiple ToolkitWrappers, trying the requested operation with each in turn, and will generally ignore any ToolkitWrapper that raises an error (as long as one of them returns successfully). This means that the logic in resolve (report the first ToolkitWrapper that provides a correctly-named function) may not always report the toolkit that actually succeeds in performing the requested operation.
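To make the mismatch concrete, here is a minimal, self-contained sketch of the fallback behavior described above. `FailingWrapper`, `WorkingWrapper`, and `SketchRegistry` are hypothetical stand-ins written for this illustration, not the toolkit's actual code; only the method names `resolve` and `call` are taken from the description.

```python
class FailingWrapper:
    """Hypothetical wrapper that provides the method but fails at run time."""

    def generate_conformers(self, molecule, n_conformers=1):
        raise RuntimeError("backend could not process this molecule")


class WorkingWrapper:
    """Hypothetical wrapper whose backend succeeds."""

    def generate_conformers(self, molecule, n_conformers=1):
        return [f"{molecule}-conf{i}" for i in range(n_conformers)]


class SketchRegistry:
    """Stand-in for ToolkitRegistry; it only illustrates the fallback logic."""

    def __init__(self, wrappers):
        self.wrappers = list(wrappers)

    def resolve(self, method_name):
        # Return the first wrapper that *provides* the method,
        # without checking whether calling it would succeed.
        for wrapper in self.wrappers:
            if hasattr(wrapper, method_name):
                return wrapper
        raise NotImplementedError(method_name)

    def call(self, method_name, *args, **kwargs):
        # Try each wrapper in turn and skip the ones that raise.
        for wrapper in self.wrappers:
            method = getattr(wrapper, method_name, None)
            if method is None:
                continue
            try:
                return method(*args, **kwargs)
            except Exception:
                continue
        raise ValueError(f"No wrapper could run {method_name}")


registry = SketchRegistry([FailingWrapper(), WorkingWrapper()])
reported = type(registry.resolve("generate_conformers")).__name__
conformers = registry.call("generate_conformers", "CC", n_conformers=2)
print(reported)    # the wrapper that resolve would name
print(conformers)  # produced by the other wrapper
```

In this sketch `resolve` names `FailingWrapper`, while the conformers can only have come from `WorkingWrapper`, so provenance has to be captured inside the loop in `call` rather than inferred from `resolve`.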

The API for this new functionality is up for debate. Some options I can think of would be:

offmol.generate_conformers(n_conformers=10, provenance=True)
- Pro: easy and obvious to use
- Con: Changes return signature of generate_conformers depending on provenance kwarg (returns either one or two objects)
- Con: Either every ToolkitWrapper method needs to start supporting provenance kwarg, or ToolkitRegistry needs to skim provenance out of the list of kwargs

ToolkitRegistry.call_with_provenance(method_name, offmol, n_conformers=10)
- Pro: Unambiguous return signature
- Con: Clunky API call

Between these two options, I'm partial to the first (though it's not clear that these are exclusive). I'm open to other ideas as well!
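The two call styles can be compared side by side in a small sketch. Everything here is an assumption made for illustration: the wrapper classes, the `toolkit_version` attribute, the module-level `WRAPPERS` list, and the placeholder version strings are hypothetical, and the functions are free functions rather than methods on Molecule or ToolkitRegistry.

```python
class BrokenWrapper:
    """Hypothetical wrapper that always fails."""

    toolkit_version = "0.0.0-example"  # placeholder, not a real version

    def generate_conformers(self, molecule, n_conformers=1):
        raise RuntimeError("backend unavailable")


class GoodWrapper:
    """Hypothetical wrapper that succeeds."""

    toolkit_version = "1.0.0-example"  # placeholder, not a real version

    def generate_conformers(self, molecule, n_conformers=1):
        return [f"{molecule}-conf{i}" for i in range(n_conformers)]


WRAPPERS = [BrokenWrapper(), GoodWrapper()]


def call_with_provenance(method_name, *args, **kwargs):
    """Second option: always return a (result, provenance) pair."""
    for wrapper in WRAPPERS:
        method = getattr(wrapper, method_name, None)
        if method is None:
            continue
        try:
            result = method(*args, **kwargs)
        except Exception:
            continue
        # Record the wrapper that actually succeeded.
        provenance = {
            "toolkit": type(wrapper).__name__,
            "version": wrapper.toolkit_version,
            "method": method_name,
        }
        return result, provenance
    raise ValueError(f"No wrapper could run {method_name}")


def generate_conformers(molecule, n_conformers=1, provenance=False):
    """First option: the kwarg is consumed here, never passed to a wrapper."""
    result, record = call_with_provenance(
        "generate_conformers", molecule, n_conformers=n_conformers
    )
    # The return shape depends on the kwarg, which is the stated drawback.
    return (result, record) if provenance else result


plain = generate_conformers("CC", n_conformers=2)
with_record = generate_conformers("CC", n_conformers=2, provenance=True)
```

Layering the first style on top of the second, as done here, is one way the two options could coexist; the wrappers themselves never see the `provenance` kwarg.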

jthorton commented 4 years ago

Thanks for starting this issue @j-wags. I have been thinking about this some more, and if this is awkward to implement in the toolkit, it could instead be put into each component in the submission workflow. This would require the conformer generation and CMILES generation modules (and any others that require the OFFTK) to take a toolkit argument, which would then be passed on to the OFFTK. This would make it easier to gather the information on which toolkit was called, and should help in reproducing workflows, as we would have control over which backend toolkit is used.

If this does go into the toolkit, maybe it makes more sense not to change the return signature of the method, but instead to have each toolkit wrapper or the global toolkit registry keep a history of the toolkits and methods it calls and whether they were successful. Then, at the end of each component in the workflow, we can just grab the most recent item in the history for details on the function calls. Something like:

mol = Molecule.from_smiles('CC')
# clear the toolkit history
GLOBAL_TOOLKIT_REGISTRY.clear_history()
mol.to_smiles()
# check what function was called
GLOBAL_TOOLKIT_REGISTRY.history()

[{'toolkit': 'OpenEyeToolkitWrapper', 'version': '2019.Oct.2', 'method': 'to_smiles', 'error': None}]
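A registry that keeps such a history could look roughly like the sketch below. `clear_history` and `history` follow the names proposed above and do not exist in the toolkit; the wrapper classes, the `toolkit_version` attribute, and the second version string are hypothetical placeholders. Unlike the single-entry output above, this sketch also records failed attempts.

```python
class OpenEyeLikeWrapper:
    """Hypothetical wrapper whose backend is unavailable in this example."""

    toolkit_version = "2019.Oct.2"

    def to_smiles(self, molecule):
        raise RuntimeError("license not found")


class RDKitLikeWrapper:
    """Hypothetical wrapper that succeeds."""

    toolkit_version = "0.0.0-example"  # placeholder, not a real version

    def to_smiles(self, molecule):
        return molecule


class HistoryRegistry:
    """Stand-in registry that logs every attempted call."""

    def __init__(self, wrappers):
        self.wrappers = list(wrappers)
        self._history = []

    def clear_history(self):
        self._history.clear()

    def history(self):
        # Return a copy so callers cannot mutate the internal list.
        return list(self._history)

    def call(self, method_name, *args, **kwargs):
        for wrapper in self.wrappers:
            method = getattr(wrapper, method_name, None)
            if method is None:
                continue
            entry = {
                "toolkit": type(wrapper).__name__,
                "version": wrapper.toolkit_version,
                "method": method_name,
                "error": None,
            }
            try:
                result = method(*args, **kwargs)
            except Exception as error:
                # Keep the failure in the history, then try the next wrapper.
                entry["error"] = repr(error)
                self._history.append(entry)
                continue
            self._history.append(entry)
            return result
        raise ValueError(f"No wrapper could run {method_name}")


registry = HistoryRegistry([OpenEyeLikeWrapper(), RDKitLikeWrapper()])
registry.clear_history()
smiles = registry.call("to_smiles", "CC")
entries = registry.history()
```

With failures recorded, the most recent entry is the one that succeeded, and earlier entries show which wrappers were tried and skipped.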

mattwthompson commented 4 years ago

My +1 is for the first approach, tracking the history of called methods in a way that's attached to the molecule. Perhaps there is a way to log the history as its own data structure and provide two views of it? One for "what happened to this OFFMol?" and another for "what did this wrapper do?"
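One way to picture a single log with two views is the sketch below. `CallRecord`, `CallLog`, the molecule identifiers, and the wrapper names are all hypothetical; how a molecule would actually be identified is left open here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CallRecord:
    """One attempted toolkit call (hypothetical record layout)."""

    molecule_id: str
    toolkit: str
    method: str
    error: Optional[str] = None


class CallLog:
    """A single store of records, queried in two different ways."""

    def __init__(self):
        self._records: List[CallRecord] = []

    def record(self, molecule_id, toolkit, method, error=None):
        self._records.append(CallRecord(molecule_id, toolkit, method, error))

    def for_molecule(self, molecule_id):
        # View 1: what happened to this molecule?
        return [r for r in self._records if r.molecule_id == molecule_id]

    def for_toolkit(self, toolkit):
        # View 2: what did this wrapper do?
        return [r for r in self._records if r.toolkit == toolkit]


log = CallLog()
log.record("ethane", "WrapperA", "generate_conformers")
log.record("ethane", "WrapperB", "assign_partial_charges")
log.record("water", "WrapperA", "to_smiles")

ethane_methods = [r.method for r in log.for_molecule("ethane")]
wrapper_a_molecules = [r.molecule_id for r in log.for_toolkit("WrapperA")]
```

Both views filter the same list, so the per-molecule and per-wrapper histories cannot drift apart.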

Should we first consider refactoring resolve to be more careful about how it calls methods? My guess is that we want to (and my opinion is that we probably should), but it may not be tractable to parse all the possible exceptions raised by combinations of installed wrappers and available methods. We should avoid the bare except in the middle of that function if at all possible.

mrshirts commented 1 year ago

I would really like this. For example, a student parameterized a lipid molecule, and we were unable to tell afterwards where the charges came from. At a minimum, a log file recording which decision was made would help, even if it's not stored internally . . .
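A logging-only version of this could be sketched with Python's standard `logging` module, as below. The wrapper classes, the `toolkit_version` attribute, the logger name, and the `call_and_log` helper are hypothetical; the example writes to an in-memory stream so it runs without touching the file system.

```python
import io
import logging

logger = logging.getLogger("toolkit_provenance_sketch")


class ChargeWrapperA:
    """Hypothetical wrapper whose charge model is unavailable."""

    toolkit_version = "0.0.0-example"  # placeholder, not a real version

    def assign_partial_charges(self, molecule):
        raise RuntimeError("charge model unavailable")


class ChargeWrapperB:
    """Hypothetical wrapper that succeeds."""

    toolkit_version = "0.0.0-example"  # placeholder, not a real version

    def assign_partial_charges(self, molecule):
        return {"molecule": molecule, "charges": [0.0, 0.0]}


def call_and_log(wrappers, method_name, *args, **kwargs):
    """Try each wrapper in turn, logging every failure and the final success."""
    for wrapper in wrappers:
        name = type(wrapper).__name__
        method = getattr(wrapper, method_name, None)
        if method is None:
            continue
        try:
            result = method(*args, **kwargs)
        except Exception as error:
            logger.warning("%s failed in %s: %s", method_name, name, error)
            continue
        logger.info(
            "%s ran with %s (version %s)",
            method_name, name, wrapper.toolkit_version,
        )
        return result
    raise ValueError(f"No wrapper could run {method_name}")


# Capture the log in memory for this example.
stream = io.StringIO()
logger.addHandler(logging.StreamHandler(stream))
logger.setLevel(logging.INFO)

charges = call_and_log(
    [ChargeWrapperA(), ChargeWrapperB()], "assign_partial_charges", "lipid"
)
log_text = stream.getvalue()
```

Replacing the `StreamHandler` with a `logging.FileHandler` would send the same messages to a log file.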

openforcefield / openff-toolkit

Have ToolkitRegistry report which toolkit ran operations #540