Logging simulation metadata on F@H

In broad terms, what are you trying to do? Log a collection of metadata on (finished) free energy calculations run on F@H, related to https://github.com/openforcefield/fah-alchemy/issues/2. The idea is that, given enough gathered data, knowledge could be distilled into a well-informed atom mapping scorer that extends the lomap-based and perses-based scorers in OFE (similar to this work). To this end, the following metadata have been outlined in previous discussions:

Transformation SMIRKS
Simulation length / sampling time (should be coupled to the FE protocol because, e.g., sampling times in REPEX and NEQ are fundamentally different)
Rate of convergence
Error estimates (ideally across replicates)
FE protocol (with version tag?)
Force fields
Protein environment information (if not whole, then ligand-protein interactions?)
Details on solvent
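
To make the list above concrete, here is a rough sketch of what one logged record could look like. It is only an illustration: the class name `TransformationRecord`, every field name, the units, and the example values are assumptions for discussion, not an existing alchemiscale or OFE schema. In particular, how "rate of convergence" should be quantified is left open, so it is just an optional float here.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class TransformationRecord:
    """Hypothetical metadata record for one finished FE calculation (one edge)."""

    transformation_smirks: str            # transformation SMIRKS
    protocol: str                         # FE protocol, e.g. "REPEX" or "NEQ"
    protocol_version: Optional[str]       # version tag, if one is available
    sampling_time_ns: float               # only comparable within the same protocol
    force_fields: Tuple[str, ...]         # force field names
    solvent: str                          # details on solvent
    error_estimate_kcal_mol: Optional[float] = None  # ideally across replicates
    n_replicates: int = 1
    convergence_rate: Optional[float] = None         # definition still to be agreed
    protein_environment: Optional[str] = None        # e.g. ligand-protein interactions


def to_json(record: TransformationRecord) -> str:
    """Serialize a record to a JSON string with stable key order."""
    return json.dumps(asdict(record), sort_keys=True)


# Placeholder values, for illustration only.
example = TransformationRecord(
    transformation_smirks="[c:1][H:2]>>[c:1][F:2]",
    protocol="REPEX",
    protocol_version=None,
    sampling_time_ns=5.0,
    force_fields=("openff-2.0.0", "amber14"),
    solvent="TIP3P water",
    error_estimate_kcal_mol=0.3,
    n_replicates=3,
)
```

Keeping the protocol and its version in the same record as the sampling time means sampling times are never compared across protocols by accident.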

--> Action point: please add to these.

How do you believe using this project would help you to do this? Training such a model would require a large volume of data, which makes FE calcs on F@H very suitable. Because FE calculation methods change over time, continuous feedback of simulation metadata would also be extremely valuable for evolving such a model.

What problems do you anticipate with using this project to achieve the above? Presumably not all edges run on F@H are intended to be shared publicly. We would only want to collect non-descriptive data (hence SMIRKS rather than SMARTS), but an opt-out option would probably be appropriate as well.
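
A minimal sketch of how the opt-out and the "non-descriptive data only" constraint could work at collection time. Everything here is hypothetical: the `opt_out` flag, the key names, and the allow-list are assumptions made up for this example, not part of any existing API.

```python
from typing import Any, Dict, Iterable, List

# Hypothetical allow-list: only these non-descriptive keys are ever collected.
SHAREABLE_KEYS = {
    "transformation_smirks",
    "protocol",
    "protocol_version",
    "sampling_time_ns",
    "force_fields",
    "solvent",
    "error_estimate_kcal_mol",
}


def collect_shareable(records: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Drop opted-out records, then strip any key not on the allow-list."""
    shared = []
    for record in records:
        if record.get("opt_out", False):
            continue  # the submitter asked not to share this edge at all
        shared.append({k: v for k, v in record.items() if k in SHAREABLE_KEYS})
    return shared


# Placeholder records, for illustration only.
raw_records = [
    {"transformation_smirks": "[c:1][H:2]>>[c:1][F:2]", "protocol": "NEQ",
     "ligand_name": "compound-17", "opt_out": False},
    {"transformation_smirks": "[c:1][H:2]>>[c:1][Cl:2]", "protocol": "REPEX",
     "opt_out": True},
]
shared_records = collect_shareable(raw_records)
```

Using an allow-list rather than a block-list means that any new, possibly identifying field is excluded by default until someone explicitly decides it is safe to share.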
