openforcefield / openff-bespokefit

Automated tools for the generation of bespoke SMIRNOFF format parameters for individual molecules.
https://docs.openforcefield.org/bespokefit
MIT License

Provide a simple path to retrieving cached datasets, or package them with BespokeFit #322

Open j-wags opened 9 months ago

j-wags commented 9 months ago

Description

Capturing an idea from https://github.com/openforcefield/openff-bespokefit/pull/321#issue-2138654940:

Should we provide a function to grab all relevant datasets automatically from QCArchive? Do we want to have to keep this up to date?

And my reply:

QCF is experimenting with some local caching stuff that I'm going to check out right after this. So if we're lucky we may have the option to save a local copy of the common datasets that we distribute in the bespokefit conda package(!!!)

jthorton commented 9 months ago

I guess we want a default list of the names of the torsiondrive datasets that should be pulled from QCArchive and the local cache. We would then use the same interface as normal to pull down the datasets, but point the client interface at the local copy of the cache, which would speed up populating the bespokefit cache. If the local cache is too big when pulling all torsiondrives, we could split it out into a separate, optional package, since it only benefits users who want to run with our standard DFT level of theory.
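
A rough sketch of that flow, just to make the idea concrete. This is not BespokeFit or QCPortal code: the dataset names are placeholders, `load_datasets` is a hypothetical helper, and `fetch_remote` stands in for whatever client call actually pulls a dataset from QCArchive. The JSON-file-per-dataset cache layout is also only an assumption for illustration.

```python
import json
from pathlib import Path
from typing import Callable, Iterable

# Placeholder names only; the real default list would be chosen by the maintainers.
DEFAULT_TORSIONDRIVE_DATASETS = (
    "example-torsiondrive-set-a",
    "example-torsiondrive-set-b",
)


def load_datasets(
    fetch_remote: Callable[[str], dict],
    cache_dir: Path,
    names: Iterable[str] = DEFAULT_TORSIONDRIVE_DATASETS,
) -> dict:
    """Return {name: records}, reading the local cache first.

    `fetch_remote` is a stand-in for the real QCArchive lookup and is only
    called for datasets that are not already present in `cache_dir`.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    loaded = {}
    for name in names:
        cached_file = cache_dir / f"{name}.json"
        if cached_file.exists():
            # Cache hit: no network round trip needed.
            loaded[name] = json.loads(cached_file.read_text())
        else:
            # Cache miss: fetch once, then store for later runs.
            records = fetch_remote(name)
            cached_file.write_text(json.dumps(records))
            loaded[name] = records
    return loaded
```

If the cache were shipped as a separate optional package, `cache_dir` would simply point at that package's data directory, and users without it would fall through to the remote fetch.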