materialsproject / fireworks

The Fireworks Workflow Management Repo.
https://materialsproject.github.io/fireworks

[WIP] Added a MultiStore to DataServer #502

Open sivonxay opened 1 year ago

sivonxay commented 1 year ago

When many calculations run simultaneously, rlaunch multi takes care to create only one MongoClient instance and relies on connection pooling to avoid opening large numbers of concurrent connections to MongoDB.
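The reason a single shared, pooled client keeps connection counts bounded can be illustrated with a toy pool. This is a minimal, stdlib-only sketch: `ConnectionPool` is hypothetical and only stands in for the pool that one pymongo `MongoClient` maintains internally (whose size is set by its `maxPoolSize` option). Threads stand in for concurrent fireworks, and strings stand in for real sockets.

```python
import threading
import time
from contextlib import contextmanager


class ConnectionPool:
    """Toy pool (hypothetical): at most `max_size` connections are ever
    opened, idle ones are reused, and callers block until one is free."""

    def __init__(self, max_size):
        self._slots = threading.BoundedSemaphore(max_size)
        self._lock = threading.Lock()
        self._idle = []        # connections ready for reuse
        self.opened = 0        # total connections ever created
        self.in_use = 0
        self.peak_in_use = 0

    @contextmanager
    def connection(self):
        self._slots.acquire()  # wait until fewer than max_size are checked out
        with self._lock:
            if self._idle:
                conn = self._idle.pop()
            else:
                self.opened += 1
                conn = f"conn-{self.opened}"  # stand-in for a real socket
            self.in_use += 1
            self.peak_in_use = max(self.peak_in_use, self.in_use)
        try:
            yield conn
        finally:
            with self._lock:
                self.in_use -= 1
                self._idle.append(conn)  # return it to the pool for reuse
            self._slots.release()


pool = ConnectionPool(max_size=4)


def task():
    with pool.connection():
        time.sleep(0.01)  # pretend to talk to MongoDB


threads = [threading.Thread(target=task) for _ in range(32)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(pool.opened, pool.peak_in_use)  # both are at most 4
```

Thirty-two tasks share at most four connections; without the shared pool, each task opening its own client could hold its own connections at the same time.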

However, when users want data saved to another database or collection, this is typically done by creating maggma Stores within a Firetask and inserting the documents there (for example, jobflow's JobFiretask). This gives poor control over the number of connections made to a database, and the number of connections is effectively unbounded.

To solve this, I have implemented a class that keeps track of the Stores in use and allows individual fireworks to share them. It extends the DataServer that is already used to share a LaunchPad by registering an additional callable, MultiStore. No other fireworks code uses this functionality, but the main process must keep track of it, so this part cannot be moved into another package.

Because I have put this class in maggma, fireworks now needs maggma as a dependency. I could instead put the class in jobflow, but then jobflow (and, implicitly, maggma) would become a dependency.

These are the pull requests for maggma and jobflow

janosh commented 1 year ago

maggma is not (yet) a default dependency of Fireworks, which causes the tests to fail:

CI error

```py
/opt/conda/envs/test-environment/lib/python3.7/importlib/__init__.py:127: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
fireworks/core/tests/test_tracker.py:20: in <module>
    from fireworks.features.multi_launcher import launch_multiprocess
fireworks/features/multi_launcher.py:23: in <module>
    from maggma.stores.shared_stores import MultiStore
E   ModuleNotFoundError: No module named 'maggma'
```
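One possible way to keep maggma optional, sketched here as an assumption rather than what this PR does, is to import MultiStore lazily instead of at module top level, so that importing `fireworks.features.multi_launcher` no longer fails when maggma is absent. `HAS_MAGGMA` and `get_multistore_class` are hypothetical names; the import path is the one shown in the traceback above.

```python
import importlib.util

# True only when maggma can be imported in this environment.
HAS_MAGGMA = importlib.util.find_spec("maggma") is not None


def get_multistore_class():
    """Import MultiStore only when Store sharing is actually requested,
    so merely importing the launcher module does not require maggma."""
    if not HAS_MAGGMA:
        raise ImportError(
            "Sharing Stores between fireworks requires maggma; "
            "install it with `pip install maggma`."
        )
    # Deferred import: only reached when maggma is installed.
    from maggma.stores.shared_stores import MultiStore

    return MultiStore
```

Tests that exercise the shared-Store path could then be skipped when maggma is missing, for example with `pytest.importorskip("maggma")`, while the rest of the suite runs without it.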