kriszyp / lmdb-js

Simple, efficient, ultra-fast, scalable data store wrapper for LMDB

What are the exact semantics of asynchronous transactions? #242

Open joepie91 opened 1 year ago

joepie91 commented 1 year ago

The documentation states the following:

Also, the callback function can be an async function (or return a promise), but this is not recommended. If the function returns a promise, this will delay/defer the commit until the callback's promise is resolved. However, while waiting for the callback to finish, other code may execute operations that would end up in the current transaction and may result in a surprising order of operations, and long running transactions are generally discouraged since they extend the single write lock.

However, I'm having difficulty figuring out exactly what the semantics of this are. "other [...] operations that would end up in the current transaction" - does this mean that two simultaneous asynchronous (long-running) transactions might 'contaminate' each other, breaking the isolation property? Or does it just mean that operations outside of a transaction might erroneously end up inside of a transaction? Or something else entirely?

kriszyp commented 1 year ago

The latter. An asynchronous function in a transaction will block the next enqueued async transaction callback (until it is resolved). However, while the asynchronous callback is waiting to resolve, any calls to store.put() (whether they are "inside" the callback or not) are assumed to be part of that transaction (it is difficult to know otherwise; it might be doable with AsyncLocalStorage, but that has too much overhead). Does that make sense?

joepie91 commented 1 year ago

Yep, that makes sense, thanks!

An asynchronous function in a transaction will block the next enqueued async transaction callback (until it is resolved).

Is this only the case for transactions that do writes, or for all asynchronous transactions? Are you expected to do read-only operations outside of transactions entirely?

it is difficult to know otherwise; might be doable with AsyncLocalStorage, but that has too much overhead

One approach I could think of would be to work with a "transaction object", like so:

foo.transaction((tx) => {
    tx.put("bar", "baz");
})

... however, the tradeoff of that would be needing to pass a transaction object through nested function calls that make use of it. This is already commonplace in other database libraries that do transactions (eg. Knex), but I don't know whether it'd be an acceptable tradeoff here.
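To make the tradeoff concrete, here is a minimal in-memory sketch of the "transaction object" idea. It is not lmdb-js code and lmdb-js does not expose this API: ToyTxStore, tx.put, saveUser, and txDemo are hypothetical names invented for illustration, and the toy does not model LMDB's single write lock.

```javascript
// Hypothetical sketch of the "transaction object" idea; lmdb-js does not expose this API.
// ToyTxStore, tx.put, saveUser, and txDemo are made-up names for illustration only.
class ToyTxStore {
  constructor() {
    this.data = new Map();          // committed state
    this.queue = Promise.resolve(); // runs one transaction callback at a time
  }

  // A plain put is never part of a transaction in this model.
  put(key, value) {
    this.data.set(key, value);
  }

  transaction(callback) {
    const run = async () => {
      const pending = new Map();
      const tx = { put: (key, value) => pending.set(key, value) };
      const result = await callback(tx); // a throw here skips the commit below
      for (const [key, value] of pending) this.data.set(key, value);
      return result;
    };
    const done = this.queue.then(run);
    this.queue = done.catch(() => {}); // a failed transaction must not block the queue
    return done;
  }
}

// Nested helpers have to accept the transaction object explicitly.
function saveUser(tx, id, name) {
  tx.put(`user:${id}`, name);
}

async function txDemo() {
  const store = new ToyTxStore();
  const seenMidTransaction = [];
  const done = store.transaction(async (tx) => {
    saveUser(tx, 1, 'alice');
    await new Promise((resolve) => setTimeout(resolve, 30));
    // The unrelated put has already run by now, but it did not go through `tx`.
    seenMidTransaction.push(store.data.has('unrelated'), store.data.has('user:1'));
  });
  setTimeout(() => store.put('unrelated', true), 10);
  await done;
  return { seenMidTransaction, committed: [...store.data.keys()] };
}
```

In this toy, only writes made through tx belong to the transaction, so the timer's put cannot leak into it. The cost is visible in saveUser: every helper that writes has to receive tx as a parameter.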

kriszyp commented 1 year ago

This is the case for any use of store.transaction(), which is a write transaction (although you aren't required to do only writes in it). So for example:

store.transaction(async () => {
    console.log('a');
    await somethingThatTakesAwhile();
    console.log('b');
});
store.transaction(async () => {
    console.log('c');
    await somethingThatTakesAwhile();
    console.log('d');
});
setTimeout(() => {
    store.put(key, value);
}, 100);

This would always print a,b,c,d in that order, but it would be unknown if the store.put() might be executed during one of the awaits and thus included in one of the transactions.
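The behavior described above can be imitated with a small in-memory model. This is only a sketch of the semantics as explained in this thread, not lmdb-js's implementation: ToyStore, activeTxn, sleep, and demo are hypothetical names, and the timer delays are chosen so that the put deterministically lands during the first callback's await.

```javascript
// Toy in-memory model of the semantics described in this thread; NOT lmdb-js's implementation.
// ToyStore, activeTxn, sleep, and demo are hypothetical names used only for illustration.
class ToyStore {
  constructor() {
    this.queue = Promise.resolve(); // serializes transaction callbacks
    this.activeTxn = null;          // the currently open write transaction, if any
    this.log = [];                  // records which transaction each put landed in
  }

  // Run the callback once all previously enqueued transactions have finished.
  transaction(name, callback) {
    const run = async () => {
      this.activeTxn = name;
      try {
        return await callback();
      } finally {
        this.activeTxn = null; // "commit": later puts are no longer captured
      }
    };
    const result = this.queue.then(run);
    this.queue = result.catch(() => {}); // a failed transaction must not block the queue
    return result;
  }

  // A put cannot tell who called it, so it joins whatever transaction is open.
  put(key, value) {
    this.log.push({ key, value, txn: this.activeTxn });
  }
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function demo() {
  const store = new ToyStore();
  const order = [];
  const first = store.transaction('first', async () => {
    order.push('a');
    await sleep(50);
    order.push('b');
  });
  const second = store.transaction('second', async () => {
    order.push('c');
    await sleep(50);
    order.push('d');
  });
  // Unrelated code that fires while the first callback is still awaiting.
  setTimeout(() => store.put('key', 'value'), 20);
  await Promise.all([first, second]);
  return { order, log: store.log };
}
```

Calling demo() yields the order a, b, c, d, and the put is recorded under 'first' even though it was issued outside either callback. With a different timer delay it could just as well land in 'second' or in no transaction at all, which is the uncertainty described above.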

One approach I could think of would be to work with a "transaction object"

I have certainly considered that, but it has some problems. First, the API would be much more complicated: your example assumes a single store, but many stores may participate in a transaction (as long as they are part of the same env database). Second, it wouldn't enable any type of "interleaving" of transactions. Using the transaction function means your callback is executed once the exclusive lock for a write transaction has been acquired, and there can only be a single write transaction at a time, so writing to other transaction objects wouldn't make any sense.

I could probably add an option that lets a store.put() call explicitly opt out of participating in any current transaction (store.put(key, value, {onlyEnqueue: true}) or something).