Open Andrew-S-Rosen opened 1 year ago
I like this idea
FYI: Here is what happens when two processes try to write to a `montystore` at the same time. It looks like `montydb` has a locking mechanism, but it doesn't support concurrent processes.
I had started some work to replace `mongomock` with actual MongoDB in `MemoryStore` (see #846). Since `JSONStore` is backed by `MemoryStore`, I wonder whether doing this could also address the locking issue?
We have had success using `JSONStore` to run `atomate2` workflows in low throughput, but I'm sure we would encounter a similar problem in high throughput.
As discussed in #828, most file-based database packages (including MontyDB in the already-implemented `MontyStore`) do not have any built-in protection against multiple Python processes (or threads) reading/writing to the same database at the same time. This makes them useful only for serial calculations and less suitable for high-throughput settings where the odds of a collision are very high.

Rather than relying on the external package to implement a file-locking system, we should introduce a file-locking mechanism within maggma that can be applied to all file-based data stores. py-filelock and portalocker are both good platform-agnostic options, with the former perhaps being slightly more active. There are built-in locking features in the MP `monty` package, but in my opinion we are better off using a battle-tested solution since they are usually light on the dependencies anyway (and the lock mechanism used in fireworks often caused headaches...).

I'm jotting this down so that I don't forget. I don't have plans to work on this right now, but I will likely need to implement it one day in the future.
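
For reference, here is a rough sketch of what a store-agnostic lock could look like with py-filelock. It is only an illustration, not a proposed maggma API: `locked_json_update` and the sidecar `.lock` naming are hypothetical, and the JSON file stands in for whatever file a given store writes to. The idea is that every read-modify-write happens while holding an inter-process lock, and the data file is swapped in atomically so readers never see a partial write.

```python
import json
import os
import tempfile
from pathlib import Path

from filelock import FileLock  # third-party: pip install filelock


def locked_json_update(path, update, timeout=10.0):
    """Read-modify-write a JSON file while holding an inter-process lock.

    Hypothetical helper for illustration only; not part of maggma.
    `update` takes the current list of documents and returns the new list.
    """
    path = Path(path)
    # Lock a sidecar file rather than the data file itself, so the data
    # file can be replaced atomically while the lock is held.
    lock = FileLock(str(path) + ".lock", timeout=timeout)
    with lock:  # raises filelock.Timeout if not acquired within `timeout` seconds
        docs = json.loads(path.read_text()) if path.exists() else []
        docs = update(docs)
        # Write to a temporary file in the same directory, then swap it in.
        fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
        with os.fdopen(fd, "w") as f:
            json.dump(docs, f)
        os.replace(tmp_name, path)
    return docs


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        store_file = Path(tmpdir) / "store.json"
        locked_json_update(store_file, lambda docs: docs + [{"task_id": 1}])
        locked_json_update(store_file, lambda docs: docs + [{"task_id": 2}])
        print(json.loads(store_file.read_text()))
```

A finite `timeout` seems preferable to blocking forever, since a worker that cannot get the lock then fails loudly instead of hanging. Whether the lock should wrap each write or the whole store connection is an open design question.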