a primer on lmdb handles (& the relationships in between)

lmdb employs several kinds of handles. the very first is the environment handle. it's just a data structure, allocated by mdb_env_create and bound to a data file by mdb_env_open. this is our root handle on the lmdb data file, and it needs to be kept around for as long as we have crud needs anywhere in the database. one env handle deals with data in a single data file. best practice: 1) keep it alive for the entire life of the process, and 2) open a given environment only once per process (within the process, the one handle can be shared around anyway).

Q) now, what do we use an environment handle for? A) an env handle is used in the creation of a transaction handle, via

mdb_txn_begin(
  txn's-env-handle,
  txn's-parent-txn-ifAny,
  flags-eg-MDB_RDONLY-if-this-txn-is-readonly,
  address-to-put-the-new-MDB_txn-handle-at
)

could we instantiate several transaction handles with the same env handle? yes: one env can back many read transactions at once, plus at most one write transaction at a time. what could it mean to have several txn handles built from the same env handle? could it mean that whoever is using a txn handle to look at the db will see a consistent snapshot of the data until the txn is freed? and will the database spend resources to maintain that many consistent views?

say i get a txn, i'm looking at it, and i don't close it for a long time. meanwhile the database undergoes mutations on some/all of the keys i'm looking at. a copy-on-write is made, and my old snapshot stays live, consuming space in the mmap. this old snapshot tree could be reclaimed if i released the handle, but i haven't, and i don't know if and when i will. then the same thing happens again down the line: something took a txn and hasn't released it for a long time. maybe it releases the txn in the future, maybe not. another stale snapshot is being kept in the mmap for some indefinite time. this way, disk usage can grow rapidly.
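the bookkeeping behind that scenario can be sketched with a toy model. this is not lmdb's code: reader_table, reader_begin, reader_end, oldest_snapshot and pages_reusable are made-up names, and the comparison is a simplification of whatever lmdb really does internally. the idea it illustrates: every reader is pinned to the txn id of its snapshot, and pages freed by a later write can't be reused while an older reader is still around.

```c
#include <stdbool.h>

/* Toy model only: none of these names exist in LMDB. */
#define MAX_READERS 8
#define NO_READER 0UL /* txn ids start at 1, so 0 marks a free slot */

typedef struct {
    unsigned long snapshot[MAX_READERS]; /* txn id each live reader is pinned to */
} reader_table;

/* Register a reader pinned to snapshot `txn_id`; returns its slot, or -1 if full. */
int reader_begin(reader_table *t, unsigned long txn_id) {
    for (int i = 0; i < MAX_READERS; i++) {
        if (t->snapshot[i] == NO_READER) {
            t->snapshot[i] = txn_id;
            return i;
        }
    }
    return -1;
}

/* Release a reader slot (the analogue of ending the read transaction). */
void reader_end(reader_table *t, int slot) {
    t->snapshot[slot] = NO_READER;
}

/* Oldest snapshot still held by any reader; `latest` if there are no readers. */
unsigned long oldest_snapshot(const reader_table *t, unsigned long latest) {
    unsigned long oldest = latest;
    for (int i = 0; i < MAX_READERS; i++) {
        if (t->snapshot[i] != NO_READER && t->snapshot[i] < oldest)
            oldest = t->snapshot[i];
    }
    return oldest;
}

/* Toy rule: pages freed by write txn `freed_by` are reusable only once every
 * live reader's snapshot is at least as new as that write. */
bool pages_reusable(const reader_table *t, unsigned long freed_by,
                    unsigned long latest) {
    return freed_by <= oldest_snapshot(t, latest);
}
```

in this model, a single reader stuck at snapshot 5 keeps the pages freed by writes 6 through 10 pinned; as soon as it calls reader_end, all of them become reusable.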

or could it just mean that these several transaction handles have an address to refer to, ie that the transactions know where to go (via the environment handle) when they have to actually get to the data? plausible.

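within a write transaction, the individual mutations don't seem to have some of the guarantees that the transaction has from the outside. eg a single mdb_put is not visible to other transactions, and not durable, on its own; it takes effect only when the whole transaction commits.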

lmdb's design requires the creation and disposal of many txn handles across an app's lifecycle. transaction handles come and go. so we can tell something about our application just by looking at the trend in transaction handle count, duration, age, etc.

Q) what do we use a transaction handle for? A) a txn handle is used

- in the creation of a database handle, via mdb_dbi_open
- to keep open a consistent view of the data

mdb_dbi_open(
  the-txnHandle-for-the-wouldBe-dbiHandle,
  name-of-db-to-open-if-not-null,
  special-options-for-this-dbi,
  address-to-put-the-new-MDB_dbi-handle-at
)

the lmdb docs attach a constraint here: mdb_dbi_open must not be called from multiple concurrent transactions in the same process. A transaction that uses this function must finish (either commit or abort) before any other transaction in the process may use this function.


- interestingly, the handle we get back is a plain integer, typedef unsigned int MDB_dbi, unlike struct MDB_env or struct MDB_txn, which are opaque structs.

the dbi is the final abstraction, through which we can actually mdb_get and mdb_put.

conceptually, a transaction provides us a consistent view of the data, and we use that view to actually look at the data. now it's intuitive: as long as the view is open, i'll see the same data, no matter whether the keys i'm looking at have been mutated while i held the view on them.

but where is this data?

we haven't yet talked about how/where the data is stored. this is where the database handle comes in.

the address of any datum in an lmdb database is a product datatype: all the information required to address the place needs to be present. eg if we look at the tiniest crud operations, mdb_get, mdb_put and mdb_del, we find they all need a txn handle and a dbi handle, along with the key.

the database handle is one such term in the address. we have to have an MDB_dbi, the database handle, to reference a key in the database, and mdb_dbi_open is our path to a database handle.

now, one transaction gives us a consistent view of the data, and the view we have might be composed of data from multiple dbis; maybe that's what our app requires. it follows that there must be some way to link a txn handle with these dbi handles, precisely because under one txn handle we want a consistent view that spans multiple dbis.

now, another constraint:

A transaction must only be used by one thread at a time.

how?

when we say one thread, we effectively mean one executing instance of a method we wrote in our app's source code. what we don't see is that our runtime's concurrency system might have multiple instances of that method executing concurrently, and if the method ends up mutating some shared memory, we have concurrency problems.

transaction handles are one such piece of memory, available in the app's memory space, that can theoretically be shared between multiple executing methods. but the lmdb docs forbid this.

if a transaction handle nevertheless ends up being used by, eg, two concurrently executing instances of the same method, lmdb will allow both instances to work upon the same stored data, and will treat the mutations invoked by them as belonging to the same transaction. so effectively, two threads are working on the same transaction, and there will be non-determinism. eg, we can't determine which thread will free up the transaction handle, and which thread will get the exception.

sidnt / lmdz #34