lucaong / cubdb

Elixir embedded key/value database
Apache License 2.0

Zero cost immutable snapshots of the database #47

Closed lucaong closed 2 years ago

lucaong commented 2 years ago

A snapshot can be obtained with CubDB.snapshot/2, and represents the state of the database at the instant it was taken. Taking a snapshot is almost zero cost: nothing needs to be copied or written to disk or memory (apart from some small bookkeeping).

After taking a snapshot, read operations can be performed on it with the usual functions: get, get_multi, fetch, select, and has_key?.

The only real cost of a snapshot is that some cleanup operations, normally performed after a compaction, have to wait for as long as a snapshot is still in use. This is why snapshots have a timeout (5 seconds by default), after which they can no longer be used. It is possible to specify a timeout of :infinity, but then one has to call CubDB.release_snapshot(snapshot) manually to release resources. Alternatively, the CubDB.with_snapshot(db, fun) function removes the need to pick an arbitrary timeout, and releases the snapshot automatically after the given function returns.
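To make the manual-release path concrete, here is a minimal sketch of taking a snapshot with a timeout of :infinity and making sure it is released even if the reading code raises. The module SnapshotHelper and its function read_with_manual_release/2 are hypothetical names introduced only for illustration; they are not part of CubDB. The only CubDB calls used are CubDB.snapshot/2 and CubDB.release_snapshot/1 as described above, and db is assumed to be an already started CubDB process.

```elixir
defmodule SnapshotHelper do
  # Hypothetical helper, not part of CubDB: takes a snapshot that never
  # times out, passes it to `fun`, and always releases it afterwards.
  def read_with_manual_release(db, fun) when is_function(fun, 1) do
    # With :infinity the snapshot stays valid until explicitly released
    snap = CubDB.snapshot(db, :infinity)

    try do
      fun.(snap)
    after
      # Runs whether `fun` returns or raises, so post-compaction
      # cleanup is not held back indefinitely
      CubDB.release_snapshot(snap)
    end
  end
end
```

With the read functions named in this description, it could be called as SnapshotHelper.read_with_manual_release(db, fn snap -> CubDB.get(snap, :a) end). In most cases CubDB.with_snapshot(db, fun) is the simpler choice, since it handles the release for you; the sketch only shows what has to be taken care of when managing the snapshot lifetime by hand.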

Example

A snapshot is used to read one or more entries as they were at the moment the snapshot was taken, without considering any later write:

# Set :a to a value
CubDB.put(db, :a, 123)

# Take a snapshot
snap = CubDB.snapshot(db)

# Overwrite :a after the snapshot was taken
CubDB.put(db, :a, 0)

# Getting a value from the snapshot returns the value of the entry at the
# time the snapshot was obtained, even if the entry has changed in the
# meantime
CubDB.get(snap, :a)
# => 123

# Getting the same value from the live database returns the latest value
CubDB.get(db, :a)
# => 0

Assume that we have two entries in the database, and the key of the second entry depends on the value of the first (so the value of the first entry "points" to the other entry). In this case, we might want to get both entries from the same snapshot, to avoid inconsistencies. Here's how that can be done with with_snapshot/2:

{x, y} = CubDB.with_snapshot(db, fn snap ->
  x = CubDB.get(snap, :x)
  y = CubDB.get(snap, x)
  {x, y}
end)

This also provides a solution to https://github.com/lucaong/cubdb/issues/27.