prashnts / pybloomfiltermmap3

Fast Python Bloom Filter using Mmap
https://github.com/prashnts/pybloomfiltermmap3
MIT License
130 stars, 24 forks

Multiple processes on same mmap file #38

Open leopd opened 4 years ago

leopd commented 4 years ago

Apologies, since I think this is really just a Linux system question, but what happens if I have multiple processes using the same mmap bloom filter? If they're only reading, everything should be wonderful and efficient, right? But what if one or more processes write changes? I'm guessing/hoping that the kernel will force each process to use the same memory pages, so that any changes made by one process will instantly appear for the others. One problem is that the writes can't be atomic, so while one process is setting the bits for the different hash values, the other processes will see partial results, which for a bloom filter is probably fine in practice.
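To make the question concrete, here is a minimal sketch of the scenario using only Python's standard `mmap` module rather than this library. It assumes Linux (or another Unix), where `mmap.mmap` maps files with `MAP_SHARED` by default. The `WRITER` script, the `demo` function, and the choice of bit 42 are all hypothetical and only there for illustration. A separate process sets one bit in the file, and the parent checks whether its own existing mapping sees the change.

```python
import mmap
import os
import subprocess
import sys
import tempfile

# Hypothetical writer: a separate process that maps the same file and sets one bit.
WRITER = """
import mmap, sys
path, bit = sys.argv[1], int(sys.argv[2])
with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as m:
    m[bit // 8] |= 1 << (bit % 8)
"""


def demo(bit=42):
    """Return the byte holding `bit` before and after another process sets it."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, bytes(4096))  # one zeroed page standing in for the filter's bit array
        os.close(fd)
        with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as reader:
            before = reader[bit // 8]
            # The writer runs in its own process with its own mapping of the file.
            subprocess.run([sys.executable, "-c", WRITER, path, str(bit)], check=True)
            # Same mapping as before; it was not reopened or remapped.
            after = reader[bit // 8]
        return before, after
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

With a shared mapping, both processes are backed by the same page-cache pages, so the reader should observe the bit without remapping. This sketch says nothing about ordering between several bits written in sequence, which is the partial-results case described above.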

prashnts commented 4 years ago

Sounds like a very interesting question still!

First off, I can't point to anything in your description as odd without digging into it more; it matches how I imagine Linux systems behave, where everything works as intended and everyone is happy. I'd appreciate links to further info or anything relevant on the specifics: indeed, what will happen? Or maybe it's a super well-defined area and I just don't know it (also, it's getting late...).

On another note, though: when I'm dealing with multiple processes which "may want to write simultaneously", I usually go for some sort of write locking with back-off. If Redis is available and relevant, I quickly reach for it, as "one more thing we use it for". In practice this has saved me from having to find out about your scenario!
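A rough sketch of the "write locking with back-off" idea, using an advisory file lock from the standard `fcntl` module instead of Redis. This is an assumption about what such a scheme could look like, not what this library or the commenter actually uses; `write_lock`, its parameters, and the lock file name are hypothetical. It is Unix-only, and it only helps if every writer takes the same lock.

```python
import fcntl
import os
import tempfile
import time
from contextlib import contextmanager


@contextmanager
def write_lock(lock_path, attempts=5, base_delay=0.01):
    """Hold an exclusive advisory lock on lock_path, retrying with exponential back-off."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        for attempt in range(attempts):
            try:
                # Non-blocking: raises BlockingIOError if another holder has the lock.
                fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
                break
            except BlockingIOError:
                time.sleep(base_delay * 2 ** attempt)  # back off before retrying
        else:
            raise TimeoutError(f"could not lock {lock_path} after {attempts} attempts")
        try:
            yield
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)


if __name__ == "__main__":
    lock_path = os.path.join(tempfile.mkdtemp(), "bloomfilter.lock")
    with write_lock(lock_path):
        pass  # writes to the shared filter would go here
```

Readers would not need the lock under this scheme; only the processes that add items would wrap their writes in `write_lock`.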

karolinepauls commented 3 years ago

A couple of years ago we used the predecessor to this library (pybloomfiltermmap with Python 2) in a highly concurrent way, with multiple reader processes, and the filters themselves synchronised by an external tool.

It worked.

You are right about the lack of atomicity not being a problem.