openforcefield / alchemiscale

a high-throughput alchemical free energy execution system for use with HPC, cloud, bare metal, and Folding@Home
https://docs.alchemiscale.org/
MIT License

[user story] Retrospective analysis of alchemical decisions #2

Open IAlibay opened 2 years ago

IAlibay commented 2 years ago

Not sure if this fully falls under the required "user stories", but we (i.e. the OpenFE team) were discussing it internally after yesterday's meeting and thought it would be worth writing up. From recent discussions, I think it's something that's already being considered, so it seemed good to record it here formally.

In broad terms, what are you trying to do?

Given the large number of alchemical calculations that will go through F@H using this framework, there is a good opportunity for us to retrospectively analyze how well certain transformations perform.

From this we should be able to:

  1. Refine scoring methods for atom mappings in relative transformations, eventually leading to more efficient transformation networks
  2. Generate estimates of transformation accuracy (both relative to experiment and in terms of convergence), allowing us to highlight potentially problematic cases before simulations take place

How do you believe using this project would help you to do this?

By its very existence, this project should generate large amounts of data from which we can learn.

Should we be able to expose both inputs (i.e. force field information, relevant binding site information, atom mappings) and outputs (i.e. simulation outcomes, e.g. dH time series, convergence metrics, work distributions, etc.) in a digestible manner, it should be reasonably simple for someone to gather this data and analyze it as required.
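
To make "digestible" concrete, one could imagine a lightweight per-transformation record that pairs those inputs and outputs while keeping bulky raw data behind a pointer. This is purely an illustrative sketch (the class and field names are hypothetical, not an actual alchemiscale data model):

```python
from dataclasses import dataclass, field

# Hypothetical per-transformation record for retrospective analysis;
# all field names are illustrative, not an existing alchemiscale schema.
@dataclass
class TransformationRecord:
    # inputs
    ligand_a: str                   # identifier of the "from" ligand
    ligand_b: str                   # identifier of the "to" ligand
    forcefield: str                 # e.g. "openff-2.1.0"
    atom_mapping: dict              # ligand_a -> ligand_b atom index mapping
    mapping_score: float            # score assigned by the mapping strategy
    # outputs
    estimate_kcal_mol: float        # free energy estimate
    uncertainty_kcal_mol: float     # statistical uncertainty
    convergence_metrics: dict = field(default_factory=dict)
    dh_timeseries_uri: str | None = None   # pointer to the bulky raw time series
```

Keeping the heavy time-series and trajectory data behind a URI rather than inline is one way to keep the digestible layer small while still allowing drill-down when needed.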

What problems do you anticipate with using this project to achieve the above?

  1. Certain aspects of these metrics (e.g. time-series data) can quickly grow into multi-TB datasets (ABFEs for ~70 ligands x 5 replicas is > 100 GB if you're not too careful about your print frequency; see the back-of-envelope sketch after this list).
  2. I believe there was mention in yesterday's meeting that there should be a mechanism for removing unwanted simulations from the results store. If that's the case, there should also be a mechanism to track entries that have been removed, so that any downstream analysis can be aware of these changes and account for them.
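
To give a feel for how point 1 scales, here is a rough back-of-envelope estimate; every number below is an illustrative assumption, not a measured value:

```python
# Rough, illustrative estimate of how energy time-series output grows with
# print frequency; all numbers are assumptions for the sake of the example.
n_ligands = 70         # ligands in the campaign
n_replicas = 5         # independent repeats per ligand
n_windows = 40         # alchemical (lambda) windows per ABFE leg
n_samples = 50_000     # energy samples written per window (set by print frequency)
bytes_per_sample = n_windows * 8   # MBAR-style: reduced potential at every window, float64

total = n_ligands * n_replicas * n_windows * n_samples * bytes_per_sample
print(f"energy time series alone: ~{total / 1e9:.0f} GB")   # ~224 GB with these settings
```

Coordinate trajectories written at a similar frequency are what push a campaign into the multi-TB regime.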

tagging in @richardjgowers as an interested party

jchodera commented 2 years ago

@IAlibay: This is great! Thanks so much for putting this together!

In my discussions with @dotsdl, one of the key ideas is to make sure we capture all the relevant metadata (in your case: force field, atom mapping strategy, etc) in an extensible data object, and to provide as output highly valuable datasets that capture all the critical information you need, with the ability to retrieve the raw data or trajectories as needed. For example, you might want to create a project where you execute a variety of transformation networks for different targets with different mapping or network planning strategies, or create a very large dense network that contains all of this in a single calculation. You should be able to retrieve the lightweight results data into a Python object and do exploratory data analysis without the need to collect and analyze primary data, in order to determine which strategies work best.
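
As a rough illustration of the kind of exploratory analysis the lightweight results layer would enable (the records and field names below are hypothetical placeholders, not the alchemiscale API):

```python
import pandas as pd

# Hypothetical: suppose each transformation's lightweight results arrive as a
# small dict of metadata plus summary statistics (toy placeholder values).
records = [
    {"edge": "lig1->lig2", "forcefield": "openff-2.1.0", "mapping_score": 0.82,
     "estimate_kcal_mol": -0.7, "uncertainty_kcal_mol": 0.3},
    {"edge": "lig1->lig3", "forcefield": "openff-2.1.0", "mapping_score": 0.41,
     "estimate_kcal_mol": 1.9, "uncertainty_kcal_mol": 1.1},
]

df = pd.DataFrame(records)

# e.g. ask whether the mapping score is predictive of the eventual statistical
# uncertainty, per force field, without ever touching the raw trajectories.
print(df.groupby("forcefield")[["mapping_score", "uncertainty_kcal_mol"]].corr())
```

The point is that questions like "which mapping strategy produced the tightest estimates?" can be answered from this summary layer alone, reaching back to the primary data only for the edges that look suspicious.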

Many of the downstream users are considering building dashboards that provide different views of the data produced in a common object model---you could imagine doing the same to compare different releases (regression analysis) and different run options (to identify best practices or refine implementations) for an ever-expanding set of benchmark systems. You should still be able to retrieve targeted raw data (snapshots, trajectories, other output data) should you need to, or re-run targeted simulations locally if you need to explore failures.

IAlibay commented 2 years ago

> extensible data object, and to provide as output highly valuable datasets that capture all the critical information you need, with the ability to retrieve the raw data or trajectories as needed.

This sounds super useful.

Assuming some campaigns may be removed over time (whether due to obsolescence or simply "this was completely wrong"), it would be great, if possible, to make entries in these dataset objects immutable but to allow them to be retrospectively annotated (e.g. "trajectory data is no longer available, and here is why", or simply "these data points are attached to this publication").
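
One simple pattern for this (purely an illustration, not a proposed alchemiscale design) is to keep result entries frozen and record retrospective notes as an append-only annotation log keyed by the entry's identifier:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Illustrative pattern only: immutable result entries plus append-only annotations.
@dataclass(frozen=True)
class ResultEntry:
    key: str            # stable identifier for the result
    estimate: float
    uncertainty: float

@dataclass(frozen=True)
class Annotation:
    target_key: str     # which ResultEntry this note refers to
    timestamp: str
    note: str           # e.g. "trajectory data purged from cold storage", "used in publication X"

annotations: list[Annotation] = []

def annotate(key: str, note: str) -> None:
    """Record a retrospective note without mutating the original entry."""
    annotations.append(
        Annotation(target_key=key,
                   timestamp=datetime.now(timezone.utc).isoformat(),
                   note=note)
    )
```

The original entries never change, so earlier analyses remain reproducible, while later consumers can consult the annotation log to see which entries to exclude or flag.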

> You should still be able to retrieve targeted raw data (snapshots, trajectories, other output data)

Something that is implicit here, but might be worth bringing up explicitly, is the ability to do customized post-simulation processing. For example, having a means for folks to do something like MM -> QM / QML bookending would be quite useful.

I personally don't think this really should exist within the F@H ecosystem (running QM or arbitrary ML code seems somewhat out of scope), but others might disagree here.
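
For context, the analysis core of such a bookending workflow is just a reweighting step over stored endpoint snapshots. A minimal sketch of the Zwanzig-style endpoint correction, assuming you can re-evaluate the same MM-sampled snapshots at both levels of theory (the energy arrays are placeholders to be supplied by whatever QM/QML engine is used):

```python
import numpy as np

def endpoint_correction(u_mm: np.ndarray, u_qm: np.ndarray, kT: float) -> float:
    """Zwanzig (exponential-averaging) MM -> QM endpoint correction.

    u_mm, u_qm: energies of the *same* MM-sampled snapshots evaluated at the
    MM and QM/QML levels, in the same units as kT.
    """
    du = (u_qm - u_mm) / kT
    # log-sum-exp form for numerical stability
    return -kT * (np.logaddexp.reduce(-du) - np.log(len(du)))

# e.g. with kT in kcal/mol at 300 K:
# kT = 0.0019872041 * 300
# dA_corr = endpoint_correction(u_mm, u_qm, kT)  # applied at each thermodynamic endpoint
```

Whether the QM/QML re-evaluation itself runs inside or outside the F@H ecosystem, the stored snapshots and energies are all that this step needs.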

dotsdl commented 2 years ago

Raw notes from story review, shared here for visibility: