Read https://github.com/jnwatson/py-lmdb/issues/86 and am wondering what the latest on this is, if anything. The question is:
When writing a 'fully' async Python application, what is the best way to handle calls to py-lmdb?
This is an interesting question with lots and lots of caveats and subtle interactions.
In a sense, this devolves into "How do I do file I/O in async?" In most environments, you just take the hit and do the I/O inline. For a long time, the philosophy behind file I/O was that it wasn't worth the overhead of some deferred scheme, since most of the time the overhead would be greater than the increase in throughput. Indeed, there wasn't even a decent async file I/O API in Linux until very recently. AFAIK, there's no non-thread-based async file I/O solution in Python right now.
Consistent with this philosophy, one major (fully async) downstream user of py-lmdb just does lmdb operations in-line. Most of the time, that works fine. Occasionally, we'd run into corner cases where it would take 30 seconds to close a file handle. This is problematic when run from the main async thread. Linux with its default settings can do very strange things when deferred I/O starts to back up. (You can look at the recommended Linux settings in the documentation for ways to avoid this).
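To make the inline approach concrete, here is a minimal sketch (the environment path, key, and `handle_request` coroutine are placeholders for this example, not anything from the original comment). The LMDB call simply runs on the event loop thread, blocking it for the duration of the operation:

```python
import lmdb

env = lmdb.open("example.mdb")

async def handle_request(key: bytes):
    # Inline approach: the read runs directly on the event loop thread.
    # LMDB reads are memory-mapped and usually fast, but any stall
    # (e.g. deferred write-back piling up) blocks the whole loop.
    with env.begin() as txn:
        return txn.get(key)
```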
The other approach is to have an LMDB worker thread. For every LMDB operation, reading or writing, you post a command to a queue that the worker thread burns down. That means every operation has at least two context switches added to its overhead. The async thread will usually have to wait on the LMDB thread (for example, to get the results of a read), but it can do that in an async-friendly manner.
Still, if you do run into the situation where close() takes 30s, your LMDB thread will be effectively blocked. The rest of your app would still run, but most of the coroutines would be waiting on the LMDB thread: essentially a sort of live-lock. Your app might still respond to a status request (as long as it doesn't look anything up in the DB), but it wouldn't be able to do anything useful until the OS gets its act together.
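As an illustration of the worker-thread approach described above, here is a minimal, hypothetical sketch (the `LmdbWorker` class and its method names are inventions for this example, not py-lmdb API). Commands are posted to a queue, executed one at a time on a dedicated thread, and the results are handed back to the event loop via a future:

```python
import asyncio
import queue
import threading

import lmdb

class LmdbWorker:
    """All LMDB operations run on one dedicated thread; async
    callers await the result through an asyncio future."""

    def __init__(self, path):
        self.env = lmdb.open(path)
        self.queue = queue.Queue()
        threading.Thread(target=self._burn_down, daemon=True).start()

    def _burn_down(self):
        # Worker thread: execute queued commands one at a time.
        while True:
            func, loop, future = self.queue.get()
            try:
                result = func(self.env)
            except Exception as exc:
                loop.call_soon_threadsafe(future.set_exception, exc)
            else:
                loop.call_soon_threadsafe(future.set_result, result)

    async def run(self, func):
        # Event loop thread: post a command and wait without blocking.
        loop = asyncio.get_running_loop()
        future = loop.create_future()
        self.queue.put((func, loop, future))
        return await future

async def example(worker):
    # A read posted to the worker thread.
    def read(env):
        with env.begin() as txn:
            return txn.get(b"key1")
    return await worker.run(read)
```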
Thank you for the thoughtful response. Since writing the note, I've been mixing sync (including lmdb) and async in my service, and it works well (so far). From my limited exposure to Python async, it seems fully async code is predominantly the domain of library builders, whereas application/service developers mix sync and async to fit their needs.
Btw, thank you for continuing with py-lmdb - it is a brilliant piece of code.
@dineshbvadhia The project mentioned is https://github.com/vertexproject/synapse and the documentation regarding Linux IO settings can be found here https://synapse.docs.vertex.link/en/latest/synapse/devguides/devops_general.html#tips-for-better-performance
I just came across this issue after investigating how to use lmdb in an asyncio application. Since an asyncio event loop maintains its own ThreadPoolExecutor, you could potentially farm out db operations to the event loop's thread pool.
Would something like this be a reasonable approach? The caveat is that everything within the transaction context needs to run in the separate thread, but it keeps the event loop thread unblocked, and it should gracefully handle async tasks opening transactions concurrently, since each executor thread holds at most one transaction at a time.
```python
import asyncio
import lmdb

async def db_example():
    env = lmdb.open("db.mdb")

    def update_db():
        # Runs on an executor thread, keeping the event loop unblocked.
        with env.begin(write=True) as txn:
            txn.put(b"key1", b"val1")  # py-lmdb keys/values must be bytes
            txn.put(b"key2", b"val2")

    await asyncio.get_running_loop().run_in_executor(None, update_db)
```
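One possible refinement, not from the original comment: passing a dedicated single-worker executor instead of None serializes all LMDB access on one thread, approximating the worker-thread approach described earlier (the `lmdb_executor` and `db_example_serialized` names are hypothetical). A sketch under that assumption:

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

# A single worker thread executes all LMDB commands in order.
lmdb_executor = ThreadPoolExecutor(max_workers=1)

async def db_example_serialized(env):
    def update_db():
        with env.begin(write=True) as txn:
            txn.put(b"key1", b"val1")
    await asyncio.get_running_loop().run_in_executor(lmdb_executor, update_db)
```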
Just wanted to mention that there is real (non-threaded) async file I/O in Python via https://github.com/mosquito/caio; it only falls back to a threaded implementation if kernel AIO is not available (kernel < 4.18).