openforcefield / alchemiscale

a high-throughput alchemical free energy execution system for use with HPC, cloud, bare metal, and Folding@Home
https://docs.alchemiscale.org/
MIT License

Logging simulation metadata on F@H #22

Open JenkeScheen opened 1 year ago

JenkeScheen commented 1 year ago

In broad terms, what are you trying to do?

On F@H, log a collection of information on (finished) free energy calculations, related to https://github.com/openforcefield/fah-alchemy/issues/2. The idea is that, given enough gathered information, knowledge could be distilled to create a well-informed atom mapping scorer that can extend the lomap-based and perses-based scorers in OFE (similar to this work). To this end, the following metadata have been outlined in previous discussions:

--> Action point: please add to these.

How do you believe using this project would help you to do this?

We would need a large volume of data to be able to train such a model - FE calcs on F@H would be very suitable for this reason. Because FE calculation methods change over time, continuous feedback of simulation metadata would be extremely valuable when trying to evolve such a model as well.

What problems do you anticipate with using this project to achieve the above?

Presumably not all edges being run on F@H are intended to be shared (publicly). We would only want to collect non-descriptive data (hence SMIRKS rather than SMARTS), but perhaps an opt-out option would be appropriate as well.

richardjgowers commented 1 year ago

I like the above list. "Time" ought to mean both simulated time (ns) and number of decorrelations/fluctuations, but I think maybe that's what you mean by sampling time?

re: SMIRKS, it might also be nice to have SMILES of each end (for ligands), possibly needing to be optional
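
To make the points raised so far concrete, here is a minimal sketch of what a per-edge metadata record could look like. It is not an existing alchemiscale or OFE API. The class `EdgeMetadata`, the function `to_log_record`, and every field name are hypothetical, chosen only to reflect the items mentioned in this thread: a SMIRKS for the transformation, simulated time in ns, a count of decorrelated samples, optional SMILES for each end state, and an opt-out flag.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class EdgeMetadata:
    """Hypothetical record for one finished free energy edge."""

    mapping_smirks: str                # SMIRKS describing the transformation
    simulated_time_ns: float           # total simulated time, in ns
    n_decorrelated_samples: int        # number of decorrelated samples
    smiles_a: Optional[str] = None     # optional SMILES of end state A
    smiles_b: Optional[str] = None     # optional SMILES of end state B
    share: bool = True                 # False means the user opted out


def to_log_record(meta: EdgeMetadata, include_smiles: bool = False) -> Optional[dict]:
    """Return a dict suitable for logging, or None if the user opted out.

    SMILES are dropped unless explicitly requested, so that the default
    record does not identify the ligands.
    """
    if not meta.share:
        return None
    record = asdict(meta)
    record.pop("share")
    if not include_smiles:
        record.pop("smiles_a")
        record.pop("smiles_b")
    return record


# Illustrative values only; the SMIRKS and SMILES here are placeholders.
example = EdgeMetadata(
    mapping_smirks="[C:1][H:2]>>[C:1][F:2]",
    simulated_time_ns=5.0,
    n_decorrelated_samples=100,
    smiles_a="CC",
    smiles_b="CF",
)
print(to_log_record(example))
```

Dropping SMILES by default and returning `None` for opted-out edges is just one way to honour the "optional" and "opt-out" suggestions above; the actual policy would need to be agreed on.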