materialsproject / maggma

MongoDB aggregation machine
https://materialsproject.github.io/maggma/

Enhancement: more performant MemoryStore #830

Open: rkingsbury opened this issue 1 year ago

rkingsbury commented 1 year ago

MemoryStore, which is currently powered by mongomock, is relatively slow. This can cause particularly noticeable delays when connecting to a FileStore, which uses MemoryStore internally. It would be great to find a more performant alternative to mongomock.
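A small timing harness can put numbers on "relatively slow" when comparing mongomock with candidate replacements. This is a sketch under stated assumptions, not part of maggma. `time_call` and `benchmark_collection` are hypothetical helpers that accept any pymongo-style collection exposing `insert_many` and `find`. `ListCollection` is a hypothetical stand-in, included only so the snippet runs without extra dependencies.

```python
import statistics
import time
from typing import Any, Callable, Dict, Iterable, List


def time_call(fn: Callable[[], Any], repeat: int = 5) -> float:
    """Return the median wall-clock time in seconds over `repeat` calls of fn()."""
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def benchmark_collection(
    coll: Any, docs: List[Dict[str, Any]], query: Dict[str, Any], repeat: int = 5
) -> Dict[str, float]:
    """Time one bulk insert and repeated queries on a pymongo-style collection."""
    # Insert copies: pymongo-style drivers may add an "_id" key to the dicts
    # they are given, and the caller's documents should stay untouched.
    insert_time = time_call(lambda: coll.insert_many([dict(d) for d in docs]), repeat=1)
    # list() drains the cursor so the matching work is included in the timing.
    find_time = time_call(lambda: list(coll.find(query)), repeat=repeat)
    return {"insert_many": insert_time, "find": find_time}


class ListCollection:
    """Hypothetical stand-in: a plain list with naive top-level equality matching."""

    def __init__(self) -> None:
        self.docs: List[Dict[str, Any]] = []

    def insert_many(self, docs: Iterable[Dict[str, Any]]) -> None:
        self.docs.extend(docs)

    def find(self, query: Dict[str, Any]) -> Iterable[Dict[str, Any]]:
        return (d for d in self.docs if all(d.get(k) == v for k, v in query.items()))


if __name__ == "__main__":
    coll = ListCollection()
    docs = [{"task_id": i, "parity": i % 2} for i in range(10_000)]
    print(benchmark_collection(coll, docs, {"parity": 0}))
```

To time mongomock itself, one could pass `mongomock.MongoClient()["bench"]["docs"]` as `coll`. The same documents and query can then be run against each candidate backend. Reporting the median over several repeats keeps one-off noise from dominating the comparison.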

I have begun working on this and will keep this issue updated with my findings.

Possible Alternatives

Notes so far

montydb

See https://github.com/davidlatwe/montydb/issues/14

mongita

mongita does not support many query operators, including $regex and $exists. It also doesn't support bulk_write or estimated_document_count, although those can be worked around.
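One possible workaround for the missing `$regex`/`$exists` support is to send the rest of the filter to the backend and evaluate those two operators in Python on the returned documents. This is a minimal sketch under that assumption, not something mongita or maggma provides. `split_query`, `find_with_fallback`, and the private helpers are hypothetical names. The sketch has several limits:

- Only top-level field conditions are handled, so nothing inside `$or`/`$and` is rewritten.
- Dotted paths are followed through nested dicts but not through arrays.
- The backend may return many more documents than it would with native operator support.

```python
import re
from typing import Any, Dict, Iterator, Tuple

# Operators this sketch evaluates in Python instead of in the backend.
CLIENT_SIDE_OPS = ("$exists", "$regex")
# MongoDB regex option letters mapped to Python's re flags.
_REGEX_FLAGS = {"i": re.IGNORECASE, "m": re.MULTILINE, "s": re.DOTALL, "x": re.VERBOSE}
_MISSING = object()


def split_query(query: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split a filter into (backend_part, client_part) by top-level field."""
    backend: Dict[str, Any] = {}
    client: Dict[str, Any] = {}
    for field, cond in query.items():
        if isinstance(cond, dict) and any(op in cond for op in CLIENT_SIDE_OPS):
            client[field] = cond
        else:
            backend[field] = cond
    return backend, client


def _get_path(doc: Dict[str, Any], path: str) -> Any:
    """Follow a dotted path through nested dicts; arrays are not traversed."""
    cur: Any = doc
    for part in path.split("."):
        if not isinstance(cur, dict) or part not in cur:
            return _MISSING
        cur = cur[part]
    return cur


def _matches(doc: Dict[str, Any], field: str, cond: Dict[str, Any]) -> bool:
    """Evaluate a {"$exists": ..., "$regex": ..., "$options": ...} condition on doc."""
    value = _get_path(doc, field)
    for op, arg in cond.items():
        if op == "$exists":
            if (value is not _MISSING) != bool(arg):
                return False
        elif op == "$regex":
            flags = 0
            for letter in cond.get("$options", ""):
                flags |= _REGEX_FLAGS[letter]
            if not isinstance(value, str) or re.search(arg, value, flags) is None:
                return False
        elif op != "$options":
            raise ValueError(f"operator {op!r} is not handled client-side in this sketch")
    return True


def find_with_fallback(coll: Any, query: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Query `coll` without $exists/$regex, then apply those conditions in Python."""
    backend_query, client_query = split_query(query)
    for doc in coll.find(backend_query):
        if all(_matches(doc, f, c) for f, c in client_query.items()):
            yield doc
```

Trading backend filtering for a Python post-filter keeps the wrapper independent of any particular backend. The cost is that queries relying heavily on these operators fall back to near-full scans.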

pymongo-inmemory

Not tested yet, but it looks quite promising because it actually uses MongoDB (as opposed to mocking it). See #846.

rkingsbury commented 1 year ago

Flagging @arosen93 in case this intersects with thoughts in #828.

Andrew-S-Rosen commented 1 year ago

Thanks! Makes sense!!

I'm curious how one uses the file-based stores in production, though, if multiple processes accessing the DB at the same time will cause issues (as is the case for MontyDB and Mongita). I guess these solutions are really just meant for the single-job, serial use case?
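If several processes do need to share a file-backed store, one common workaround is to serialize access with an exclusive lock file. The sketch below is illustrative only. Neither MontyDB, Mongita, nor maggma is assumed to work this way, and `exclusive_lock` and `append_record` are hypothetical names. The sketch relies on `os.open` with `O_CREAT | O_EXCL`, which creates the file atomically on local filesystems, so only one process can hold the lock at a time.

```python
import contextlib
import json
import os
import time
from typing import Any, Dict, Iterator, Optional


@contextlib.contextmanager
def exclusive_lock(lock_path: str, timeout: float = 10.0, poll: float = 0.05) -> Iterator[None]:
    """Hold an exclusive lock file for the duration of the `with` block."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_CREAT | O_EXCL fails if the file already exists, so only one
            # process can create it; everyone else waits and retries.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() > deadline:
                raise TimeoutError(f"could not acquire {lock_path} within {timeout} s")
            time.sleep(poll)
    try:
        os.write(fd, str(os.getpid()).encode())  # record the holder for debugging
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)  # releasing the lock means deleting the file


def append_record(json_path: str, record: Dict[str, Any], lock_path: Optional[str] = None) -> None:
    """Read-modify-write a JSON list on disk while holding the lock."""
    lock_path = lock_path or json_path + ".lock"
    with exclusive_lock(lock_path):
        records = []
        if os.path.exists(json_path):
            with open(json_path) as f:
                records = json.load(f)
        records.append(record)
        with open(json_path, "w") as f:
            json.dump(records, f)
```

This approach has two known gaps. A process that crashes while holding the lock leaves a stale `.lock` file behind. Network filesystems may not guarantee atomic exclusive creation. A production version would therefore need stale-lock detection or a library designed for cross-process locking.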

rkingsbury commented 1 year ago

> Thanks! Makes sense!!
>
> I'm curious how one uses the file based stores in production though if multiple processes accessing the DB at the same time will cause issues (as is the case for MontyDB and Mongita). I guess these solutions really are just meant for the single-job, serial use case?

TBH I don't think anyone has used the file-based Store enough to have encountered this problem yet!

Andrew-S-Rosen commented 1 year ago

Also my assumption 😄 To uncharted territory we go!