kriszyp / lmdb-js

Simple, efficient, ultra-fast, scalable data store wrapper for LMDB

keyEncoding impact on database size #223

Closed jdesboeufs closed 10 months ago

jdesboeufs commented 1 year ago

Hi,

Thanks for your amazing work on this library!

I noticed something strange about database size and page splitting in my specific use case, so I tried to reproduce it with some generic code.

For each case, I write 100k values into an empty database. The values are random bytes (binary encoding). In some cases they are 16-byte buffers, in others 2KB buffers. Keys are either integers (uint32) or strings (ordered-binary, 8 or 12 chars).

Results (on macOS + Apple M1 Pro):

 * uint32-buffer16: 3.18 MB
 * str8-buffer16: 13 MB

 * uint32-buffer2048: 274 MB
 * str8-buffer2048: 379 MB
 * str12-buffer2048: 379 MB

 * mixed-buffer2048: 285 MB

I don't know how to explain why the database size (and also the page count) rises from 274 MB to 379 MB when the 100k keys change from uint32 to string. It's huge!

The final case mixes the two: buffers are indexed with uint32 keys, and a second database maps string keys to integers. And it works.

Where is my mistake? Why can't we just index buffers with string keys without this workaround?

My code:

import {randomBytes} from 'node:crypto'
import {setTimeout} from 'node:timers/promises'
import {stat, rm} from 'node:fs/promises'
import prettyBytes from 'pretty-bytes'
import LMDB from 'lmdb'
import {customAlphabet} from 'nanoid'

const nanoid = customAlphabet('1234567890ABCDEF', 10)
const loop = 100_000
const dbOptions = {commitDelay: 10}

let _writing = 0

// Simple backpressure: wait while more than 1000 puts are still pending
async function slowDown() {
  if (_writing > 1000) {
    await setTimeout(10)
    await slowDown()
  }
}

// Fire-and-forget put; track pending writes and let slowDown() throttle the loop
async function writeToDb(db, key, value) {
  const putPromise = db.put(key, value)

  _writing++

  putPromise.then(() => {
    _writing--
  })

  await slowDown()
}

const cases = {
  'uint32-buffer16': async dbPath => {
    const db = LMDB.open(
      dbPath,
      {keyEncoding: 'uint32', encoding: 'binary', ...dbOptions}
    )

    for (let i = 1; i <= loop; i++) {
      await writeToDb(db, i, randomBytes(16))
    }

    await db.flushed
    await db.close()
  },

  'str8-buffer16': async dbPath => {
    const db = LMDB.open(
      dbPath,
      {encoding: 'binary', ...dbOptions}
    )

    for (let i = 1; i <= loop; i++) {
      await writeToDb(db, nanoid(8), randomBytes(16))
    }

    await db.flushed
    await db.close()
  },

  'uint32-buffer2048': async dbPath => {
    const db = LMDB.open(
      dbPath,
      {keyEncoding: 'uint32', encoding: 'binary', ...dbOptions}
    )

    for (let i = 1; i <= loop; i++) {
      await writeToDb(db, i, randomBytes(2048))
    }

    await db.flushed
    await db.close()
  },

  'str8-buffer2048': async dbPath => {
    const db = LMDB.open(
      dbPath,
      {encoding: 'binary', ...dbOptions}
    )

    for (let i = 1; i <= loop; i++) {
      await writeToDb(db, nanoid(8), randomBytes(2048))
    }

    await db.flushed
    await db.close()
  },

  'str12-buffer2048': async dbPath => {
    const db = LMDB.open(
      dbPath,
      {encoding: 'binary', ...dbOptions}
    )

    for (let i = 1; i <= loop; i++) {
      await writeToDb(db, nanoid(12), randomBytes(2048))
    }

    await db.flushed
    await db.close()
  },

  'mixed-buffer2048': async dbPath => {
    const db = LMDB.open(dbPath, {...dbOptions})

    const dataEnv = db.openDB('data', {keyEncoding: 'uint32', encoding: 'binary'})
    const idEnv = db.openDB('id')

    for (let i = 1; i <= loop; i++) {
      await writeToDb(dataEnv, i, randomBytes(2048))
      await writeToDb(idEnv, nanoid(12), i)
    }

    await dataEnv.flushed
    await idEnv.flushed
    await dataEnv.close()
    await idEnv.close()
    await db.close()
  }
}

for (const [testCase, handler] of Object.entries(cases)) {
  const dbPath = `./dist/${testCase}.lmdb`
  const dbLockPath = dbPath + '-lock'
  await rm(dbPath, {force: true})
  await rm(dbLockPath, {force: true})
  await handler(dbPath)
  const {size} = await stat(dbPath)
  console.log(` * ${testCase}: ${prettyBytes(size)}`)
}
kriszyp commented 1 year ago

I am guessing the difference is that the key and value combination is just enough to go above or below some threshold that triggers leaf node splits. Note that a b-tree generally works by keeping pages between 50% and 100% full. Once a page goes over 100% full, it is split, and if it was just above 100%, the split yields two pages that are each just above 50% full.
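
As a back-of-the-envelope check using the sizes reported above (my own rough arithmetic, not an exact accounting): 100k values of 2KB each is about 205 MB of raw data, so the two file sizes correspond to very different average page occupancies.

// Back-of-the-envelope occupancy check, using the sizes reported above
const rawData = 100_000 * 2048 // ≈ 204.8 MB of value bytes, ignoring keys and page headers

for (const [name, fileSize] of [
  ['uint32-buffer2048', 274e6],
  ['str8-buffer2048', 379e6]
]) {
  const occupancy = rawData / fileSize // fraction of the file that is value data
  console.log(`${name}: ~${Math.round(occupancy * 100)}% full on average`)
}

That comes out to roughly 75% versus 54% average occupancy, both inside the 50%-100% band, just at opposite ends of it.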

Another thing to be aware of is that a size of 2048 bytes is just slightly too big to fit two values into a 4KB page (with the keys, each entry is slightly over 2KB). However, in your test, the default page size on macOS with M-series processors is 16KB. I don't know if your application is targeting macOS, but if you are planning on targeting other OSes, 16KB is actually a very unusual page size; for the past couple of decades almost all OSes have stuck with 4KB, and these tests yield somewhat different results with 4KB pages. When I run your tests locally without explicitly setting the page size to 16KB (to emulate the macOS M1 default) and use the Linux system page size of 4KB, I actually get larger DBs (again, because every 2KB value requires its own page). On the other hand, if I use 2000-byte values with 4KB pages, I actually get smaller databases with the string keys than with the uint32 keys. Again, it is difficult to predict when b-tree leaves and branches will split and/or merge.
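
If you want runs on different machines to be comparable, you can also pin the page size when opening the database; a minimal sketch (4096 is just an example value, use whatever matches your production target):

// Sketch: force a 4KB page size instead of the 16KB macOS M1 default mentioned above
const db = LMDB.open(dbPath, {
  keyEncoding: 'uint32',
  encoding: 'binary',
  pageSize: 4096, // example value
  ...dbOptions
})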

I don't know what the nature of your app is, but if you are concerned about storage space, you could also consider enabling compression (I can't really test it with this test case, since random bytes are essentially incompressible). Anyway, sorry I don't have a more definitive answer here.
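
For reference, compression is enabled at open time; a minimal sketch (the threshold value here is just an illustration):

// Sketch: enable compression for values above a size threshold
const db = LMDB.open(dbPath, {
  encoding: 'binary',
  compression: {threshold: 1024}, // illustrative threshold; `compression: true` also works
  ...dbOptions
})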

jdesboeufs commented 1 year ago

Thank you for your answer. Now I have a good picture of how it works under the hood! I definitely target Linux for production.

I'm working on storing geometries (parcels), and I use geobuf for serialization. The data size is highly variable, from a few bytes to a few kilobytes, depending on the geometric complexity of each parcel.
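
The write path looks roughly like this (a simplified sketch; `parcelId` and `geojson` are placeholder names):

import geobuf from 'geobuf'
import Pbf from 'pbf'

// Serialize one GeoJSON parcel with geobuf and store the resulting bytes under a string key
const bytes = Buffer.from(geobuf.encode(geojson, new Pbf()))
await db.put(parcelId, bytes)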

Results on a Linux machine (with default 4KB page size):

 * uint32-buffer16: 3.12 MB
 * str8-buffer16: 10.6 MB

 * uint32-buffer2048: 414 MB
 * str8-buffer2048: 422 MB

 * str12-buffer2048: 422 MB
 * mixed-buffer2048: 421 MB
kriszyp commented 1 year ago

I think I got these exact same results when I ran your tests on my machine with 2048-byte values. However, I get dramatically different results when I run with 2000-byte values. If your application will have variable data sizes, it might be worth testing with more randomized value sizes, as that might be more representative (and they might even "fit together" better on pages).
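
For example, a minimal tweak to one of the test cases above (the 16-4096 byte range is an arbitrary assumption, pick whatever matches your real distribution):

import {randomBytes, randomInt} from 'node:crypto'

// Sketch: randomize value sizes instead of using a fixed 2048 bytes
for (let i = 1; i <= loop; i++) {
  await writeToDb(db, nanoid(8), randomBytes(randomInt(16, 4096)))
}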

jdesboeufs commented 10 months ago

Thank you for your answer. In the end we accepted that it takes more space on disk. And when we have to transfer the database, a simple gzip is very effective at compressing away the zeros :)