For a dataset with many thousands of optimizations run through a QCFractal server, export is currently done serially, with much of the time spent serializing large JSON structures and writing them to the filesystem. This is currently very slow, and the work should be entirely parallelizable.
To accomplish this, the innermost loop of the `openff.benchmark.geometry_optimization.compute.OptimizationExecutor.export_molecule_data` method should be broken out into a standalone staticmethod, which is then submitted to a process pool executor (e.g. `concurrent.futures.ProcessPoolExecutor`; the standard library has no `multiprocessing.ProcessPool`) together with the optimization object and its id. The size of the process pool should be configurable with a parameter and a command-line flag.
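The refactoring described above could look roughly like the following minimal sketch. It is not the actual `OptimizationExecutor` code: the class `ExportSketch`, the helper `_export_one`, the `processes` parameter, the `--nproc` flag, and the one-JSON-file-per-id layout are all hypothetical names and assumptions chosen for illustration, and plain dicts stand in for the real optimization objects. `argparse` is used only as a stand-in for whatever CLI framework the project uses.

```python
import argparse
import json
import os
from concurrent.futures import ProcessPoolExecutor


class ExportSketch:
    """Hypothetical stand-in for OptimizationExecutor (illustration only)."""

    @staticmethod
    def _export_one(opt_id, opt_data, output_directory):
        """Former inner-loop body: serialize one optimization to its own file.

        Assumption: opt_data is JSON-serializable; the real optimization
        objects would need their own serialization step here.
        """
        path = os.path.join(output_directory, f"{opt_id}.json")
        with open(path, "w") as f:
            json.dump(opt_data, f)
        return path

    def export_molecule_data(self, optimizations, output_directory, processes=1):
        """Export every optimization, optionally across several processes.

        processes=1 keeps the serial behavior; processes=None lets the
        executor pick its default worker count.
        """
        os.makedirs(output_directory, exist_ok=True)
        items = list(optimizations.items())

        if processes == 1:
            # Serial path, equivalent to the original loop.
            return sorted(
                self._export_one(opt_id, opt, output_directory)
                for opt_id, opt in items
            )

        # Parallel path: one task per optimization. The staticmethod is
        # pickled by qualified name, so it must live on a module-level class.
        with ProcessPoolExecutor(max_workers=processes) as pool:
            futures = [
                pool.submit(self._export_one, opt_id, opt, output_directory)
                for opt_id, opt in items
            ]
            # result() re-raises any exception from the worker process.
            return sorted(f.result() for f in futures)


def build_parser():
    """Hypothetical command-line flag controlling the pool size."""
    parser = argparse.ArgumentParser(description="Export sketch")
    parser.add_argument("--nproc", type=int, default=1,
                        help="number of worker processes used for export")
    return parser
```

With this shape, a call such as `ExportSketch().export_molecule_data(records, "out", processes=4)` would write one file per optimization id using four workers. On platforms where worker processes are started with the spawn method, the calling code needs to sit under an `if __name__ == "__main__":` guard. Submitting one task per optimization keeps the sketch simple; if per-task overhead turns out to matter, batching several optimizations per task would be a possible refinement.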