j-wags opened this issue 4 years ago.
Thanks for starting this issue @j-wags. I have been thinking about this some more, and if this is awkward to implement in the toolkit, it could instead be handled by each component of the submission workflow. This would require the conformer generation and CMILES generation modules (and any others that use the OFFTK) to take a toolkit argument, which would then be passed through to the OFFTK. This would make it easier to record which toolkit was called, and it should help with reproducing workflows, since we would control which backend toolkit is used.
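A minimal sketch of the per-component approach, assuming each workflow component accepts the toolkit wrapper explicitly and passes it straight through. `FakeToolkitWrapper`, its `name`/`version` attributes, and `conformer_component` are hypothetical placeholders, not OFFTK API.

```python
from typing import Any, Dict, List, Tuple


class FakeToolkitWrapper:
    """Hypothetical stand-in for an OFFTK ToolkitWrapper; name and version are placeholders."""

    name = "FakeToolkitWrapper"
    version = "0.0.0"

    def generate_conformers(self, smiles: str, n_conformers: int) -> List[str]:
        # Pretend each conformer is just a labelled string.
        return [f"{smiles}-conf{i}" for i in range(n_conformers)]


def conformer_component(
    smiles: str, n_conformers: int, toolkit: Any
) -> Tuple[List[str], Dict[str, str]]:
    """Hypothetical workflow component. The caller picks the backend toolkit and the
    component passes it through, so provenance is known without querying the OFFTK."""
    conformers = toolkit.generate_conformers(smiles, n_conformers=n_conformers)
    provenance = {
        "component": "conformer_component",
        "toolkit": toolkit.name,
        "version": toolkit.version,
    }
    return conformers, provenance
```

Calling `conformer_component("CC", 3, FakeToolkitWrapper())` returns the conformers together with a provenance dict naming the toolkit the caller chose. The trade-off is that every component's signature grows a toolkit argument.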
If this does go into the toolkit, it may make more sense not to change the method's return signature, but instead to have each toolkit wrapper (or the global toolkit registry) keep a history of the toolkits and methods it calls and whether each call succeeded. Then, at the end of each workflow component, we can grab the most recent item in the history for details on the function calls, something like:
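```python
from openforcefield.topology import Molecule
from openforcefield.utils.toolkits import GLOBAL_TOOLKIT_REGISTRY

mol = Molecule.from_smiles('CC')
# clear the toolkit history (proposed API, does not exist yet)
GLOBAL_TOOLKIT_REGISTRY.clear_history()
mol.to_smiles()
# check which function was called (proposed API)
GLOBAL_TOOLKIT_REGISTRY.history()
# [{'toolkit': 'OpenEyeToolkitWrapper', 'version': '2019.Oct.2', 'method': 'to_smiles', 'error': None}]
```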
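The history idea can be sketched with a small stand-alone registry, assuming wrappers expose the requested method by name and a `version` attribute. `RecordingRegistry`, `clear_history`, and `history` are hypothetical names mirroring the proposal above, not existing OFFTK API.

```python
from typing import Any, Dict, List, Optional


class RecordingRegistry:
    """Hypothetical registry that records every toolkit call it attempts."""

    def __init__(self, wrappers: List[Any]):
        self.wrappers = wrappers
        self._history: List[Dict[str, Optional[str]]] = []

    def clear_history(self) -> None:
        self._history.clear()

    def history(self) -> List[Dict[str, Optional[str]]]:
        # Return a copy so callers cannot mutate the internal log.
        return list(self._history)

    def call(self, method_name: str, *args: Any, **kwargs: Any) -> Any:
        for wrapper in self.wrappers:
            method = getattr(wrapper, method_name, None)
            if method is None:
                continue  # this wrapper does not provide the method
            entry: Dict[str, Optional[str]] = {
                "toolkit": type(wrapper).__name__,
                "version": getattr(wrapper, "version", "unknown"),
                "method": method_name,
                "error": None,
            }
            try:
                result = method(*args, **kwargs)
            except Exception as exc:
                # Record the failure instead of silently swallowing it,
                # then fall through to the next wrapper.
                entry["error"] = repr(exc)
                self._history.append(entry)
                continue
            self._history.append(entry)
            return result
        raise ValueError(f"No registered toolkit could perform {method_name!r}")
```

With a failing wrapper registered ahead of a working one, `history()` holds one entry per attempt: the first with its error, and the last naming the wrapper that actually produced the result.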
My +1 is for the first approach: tracking the history of called methods in a way that's attached to the molecule. Perhaps there is a way to log the history as its own data structure and provide two views of it? One for "what happened to this OFFMol?" and another for "what did this wrapper do?"
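One way to get both views from a single data structure is to store flat event records and filter them. This is a sketch under assumptions: `ToolkitEvent`, `ProvenanceLog`, and the string `molecule_id` (for example a UUID stored on the molecule) are hypothetical, not part of the toolkit.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class ToolkitEvent:
    molecule_id: str  # hypothetical identifier attached to the OFFMol
    toolkit: str
    version: str
    method: str
    error: Optional[str] = None


@dataclass
class ProvenanceLog:
    events: List[ToolkitEvent] = field(default_factory=list)

    def record(self, event: ToolkitEvent) -> None:
        self.events.append(event)

    def for_molecule(self, molecule_id: str) -> List[ToolkitEvent]:
        """View 1: what happened to this molecule?"""
        return [e for e in self.events if e.molecule_id == molecule_id]

    def for_toolkit(self, toolkit: str) -> List[ToolkitEvent]:
        """View 2: what did this wrapper do?"""
        return [e for e in self.events if e.toolkit == toolkit]
```

Keeping one append-only list and deriving views from it avoids the two views drifting out of sync.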
Should we first consider refactoring `resolve` to be more careful about how it calls methods? My guess (and opinion) is that we should, but it may not be tractable to parse all the possible exceptions raised by combinations of installed wrappers and available methods. We should avoid the bare `except` in the middle of that function if at all possible.
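A sketch of what "more careful" could look like, assuming wrappers signal "I can't handle this, try the next toolkit" with a known set of exception types. `ToolkitUnavailableError` and `call_first_successful` are hypothetical names; only the listed exceptions are caught, so genuine bugs still surface.

```python
from typing import Any, List, Tuple, Type


class ToolkitUnavailableError(Exception):
    """Hypothetical: raised when a wrapper cannot handle this input."""


def call_first_successful(
    wrappers: List[Any],
    method_name: str,
    *args: Any,
    expected: Tuple[Type[BaseException], ...] = (ToolkitUnavailableError, NotImplementedError),
    **kwargs: Any,
) -> Tuple[Any, Any]:
    """Return (result, wrapper_that_succeeded)."""
    errors = []
    for wrapper in wrappers:
        method = getattr(wrapper, method_name, None)
        if method is None:
            continue
        try:
            return method(*args, **kwargs), wrapper
        except expected as exc:
            # Only known "try the next toolkit" failures are swallowed; anything
            # else (e.g. a TypeError from a real bug) propagates to the caller.
            errors.append((type(wrapper).__name__, exc))
    raise ValueError(f"No wrapper could run {method_name!r}; errors: {errors}")
```

Returning the wrapper alongside the result also gives the caller the provenance for free.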
I would really like this. For example, a student parameterized a lipid molecule, and afterwards we were unable to tell where the charges came from. At minimum, a logfile printing out what decision it made, even if it's not stored internally...
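The minimal logfile version needs only the standard `logging` module. A sketch, assuming the registry would call a helper like the hypothetical `log_toolkit_decision` each time it tries a wrapper:

```python
import logging
from typing import Optional

# Hypothetical logger name; any module-level logger works.
logger = logging.getLogger("toolkit_provenance")


def log_toolkit_decision(
    operation: str, toolkit: str, version: str, error: Optional[str] = None
) -> None:
    """Record which toolkit was tried for an operation and what happened."""
    if error is None:
        logger.info("%s performed by %s (version %s)", operation, toolkit, version)
    else:
        logger.warning("%s failed in %s (version %s): %s", operation, toolkit, version, error)
```

In a user script, `logging.basicConfig(filename="toolkit_provenance.log", level=logging.INFO)` would send these messages to a file, so charge assignment decisions can be reconstructed afterwards.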
Is your feature request related to a problem? Please describe.
In discussing desired behaviors for the QCArchive submission framework, @jthorton and I determined that we'll want to record the provenance of which toolkit(s) performed each operation. This will need to be added to the Open Force Field toolkit, since we'll be using it for operations like conformer generation and CMILES generation.
Describe the solution you'd like
We should add an option for `ToolkitRegistry` to report which toolkit and version performed an operation. We'll also need to add a method to each `ToolkitWrapper` to report its backend toolkit version in a standard way.

One difficult thing here is that `ToolkitRegistry.call` loops over multiple `ToolkitWrapper`s, trying to perform the requested operation using each, and will generally ignore any `ToolkitWrapper`s that error (as long as one of them returns successfully). This means that the logic in `resolve` (report the first `ToolkitWrapper` that provides a correctly-named function) may not always report the toolkit that actually succeeds in performing the requested operation.

The API for this new functionality is up for debate. Some options I can think of would be:
1. `offmol.generate_conformers(n_conformers=10, provenance=True)`
   - Changes the return signature of `generate_conformers` depending on the `provenance` kwarg (returns either one or two objects).
   - Either every `ToolkitWrapper` method needs to start supporting a `provenance` kwarg, or `ToolkitRegistry` needs to skim `provenance` out of the list of kwargs.
2. `ToolkitRegistry.call_with_provenance(method_name, offmol, n_conformers=10)`
Between these two options, I'm partial to the first (though it's not clear that these are exclusive). I'm open to other ideas as well!
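Both options can be sketched on one hypothetical registry, which also shows they need not be exclusive: option 1 can be a thin layer over option 2. Assumptions: `ToolkitWrapperBase.toolkit_version` stands in for the standard version-reporting method proposed above, `NotImplementedError` is treated as the "try the next wrapper" signal, and all class names and version strings are placeholders rather than OFFTK API.

```python
from typing import Any, Dict, List, Tuple


class ToolkitWrapperBase:
    """Hypothetical base class: every wrapper reports its backend version the same way."""

    @property
    def toolkit_version(self) -> str:
        raise NotImplementedError


class SketchRegistry:
    """Hypothetical registry illustrating both API options."""

    def __init__(self, wrappers: List[ToolkitWrapperBase]):
        self.wrappers = wrappers

    def call_with_provenance(
        self, method_name: str, *args: Any, **kwargs: Any
    ) -> Tuple[Any, Dict[str, str]]:
        """Option 2: a separate entry point that always returns (result, provenance)."""
        errors = []
        for wrapper in self.wrappers:
            method = getattr(wrapper, method_name, None)
            if method is None:
                continue
            try:
                result = method(*args, **kwargs)
            except NotImplementedError as exc:  # assumed "try the next wrapper" signal
                errors.append(f"{type(wrapper).__name__}: {exc}")
                continue
            # Report the wrapper that actually succeeded, not just the first
            # one that defines a method with this name.
            provenance = {
                "toolkit": type(wrapper).__name__,
                "version": wrapper.toolkit_version,
                "method": method_name,
            }
            return result, provenance
        raise ValueError(f"No wrapper could perform {method_name!r}: {errors}")

    def call(self, method_name: str, *args: Any, provenance: bool = False, **kwargs: Any) -> Any:
        """Option 1: skim `provenance` out of the kwargs so wrappers never see it."""
        result, prov = self.call_with_provenance(method_name, *args, **kwargs)
        return (result, prov) if provenance else result


class PickyWrapper(ToolkitWrapperBase):
    @property
    def toolkit_version(self) -> str:
        return "1.0"  # placeholder version string

    def generate_conformers(self, smiles: str, n_conformers: int = 10) -> List[str]:
        raise NotImplementedError("cannot handle this molecule")


class FallbackWrapper(ToolkitWrapperBase):
    @property
    def toolkit_version(self) -> str:
        return "2.0"  # placeholder version string

    def generate_conformers(self, smiles: str, n_conformers: int = 10) -> List[str]:
        return [f"{smiles}-conf{i}" for i in range(n_conformers)]
```

With `SketchRegistry([PickyWrapper(), FallbackWrapper()])`, a `resolve`-style lookup would name `PickyWrapper` because it defines `generate_conformers` first, while the provenance returned here names `FallbackWrapper`, the wrapper that actually did the work. Skimming `provenance` in the registry keeps the wrappers' signatures unchanged, at the cost of the one-or-two-return-values behavior noted in option 1.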