kriszyp / lmdb-js

Simple, efficient, ultra-fast, scalable data store wrapper for LMDB

Default pageSize - 16384? #202

Closed ppedziwiatr closed 1 year ago

ppedziwiatr commented 1 year ago

Hey,

the docs say that the default page size is 4,096.

But when I started lmdb-js with the default page settings, basically with this configuration:

this.db = open<V, string>({
  path: `${cacheOptions.dbLocation}`,
  noSync: cacheOptions.inMemory
});

getStats() returns pageSize = 16384.


Tested on macOS (Apple Silicon) with version 2.7.0 of lmdb.

So what is the default? 4096 or 16384?

ppedziwiatr commented 1 year ago

I guess I've found the answer - https://github.com/kriszyp/lmdb-js/issues/191#issuecomment-1262939773

I believe the docs should be updated...

Also, do you suggest setting the pageSize to 4096 even on an M1 Mac?

kriszyp commented 1 year ago

I will update the docs, thank you for pointing that out. The default is (as of 2.5, I think) the OS page size. Historically, 99.9% of the time that is 4096 bytes, but macOS on the M-series bucked that tradition and moved to a 16KB page size. Note that this was a backwards-compatible change, as the page size is stored in the DB.

There can be advantages to both larger page sizes (which tend to do better with range queries) and smaller page sizes (more efficient use of memory for caching). But I would generally recommend leaving the page size at the default, since it means that the database pages will correspond to the size of the OS pages (there are some inefficiencies introduced when they differ). However, if you find better results with a different page size (on macOS), I'd certainly be curious to hear about it.

ppedziwiatr commented 1 year ago

One last question: does the page size affect the storage size (https://github.com/kriszyp/lmdb-js/issues/194)? E.g., is a larger page size less efficient in terms of storage size?

kriszyp commented 1 year ago

It does affect storage size, but it can go either way. Ideally, larger page sizes are actually more efficient for storage. Imagine you were storing many 3KB records: with 4KB pages, each record needs its own page, so 25% of storage is wasted. But with a 16KB page size, LMDB can fit five 3KB records on each page, with only about 6% of storage wasted.
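The arithmetic in that example can be checked with a minimal sketch. It assumes records are packed whole into pages and ignores LMDB page headers, per-entry overhead, and overflow pages, so it only illustrates the 25% vs. 6% comparison and is not a model of LMDB's actual layout. The function name `wastedFraction` is made up for illustration.

```typescript
// Fraction of each page left unused when fixed-size records are packed
// whole into pages. Simplifying assumption: no page headers, no per-entry
// overhead, no overflow pages (real LMDB pages have all of these).
function wastedFraction(pageSize: number, recordSize: number): number {
  if (recordSize <= 0 || recordSize > pageSize) {
    throw new RangeError("recordSize must be in (0, pageSize]");
  }
  const recordsPerPage = Math.floor(pageSize / recordSize);
  return (pageSize - recordsPerPage * recordSize) / pageSize;
}

const KB = 1024;
const waste4k = wastedFraction(4 * KB, 3 * KB);   // 1 record per page  -> 0.25
const waste16k = wastedFraction(16 * KB, 3 * KB); // 5 records per page -> 0.0625

console.log(`4KB pages:  ${(waste4k * 100).toFixed(2)}% wasted`);
console.log(`16KB pages: ${(waste16k * 100).toFixed(2)}% wasted`);
```

With other record sizes the comparison can flip or level out, which is one reason measuring on real data is worthwhile.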

However, that is assuming a typical long-duration application usage where the database is gradually grown through many small transactions where free space can readily be reclaimed. There are certainly pathological situations like in #194 where building a database with a few large transactions (or very long lived read transactions that prevent free space reclamation) can result in significant database growth, and larger page sizes would exacerbate this problem.

But generally I am inclined to think that Apple's decision to switch to 16KB pages was probably smart and, for what I would consider "typical" database usage, is probably advantageous for LMDB. But YMMV. If you do comparisons, let me know how it turns out (although I know such comparisons are time-consuming).

ppedziwiatr commented 1 year ago

My "benchmarks" (two separate caches - contract and state):

  1. 16384 (default for my M1)

    785M    cache/warp/lmdb/contract
    320M    cache/warp/lmdb/state

  2. 4096

    528M    cache/warp/lmdb/contract
    128M    cache/warp/lmdb/state
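
To make the comparison easier to read, the ratios implied by these numbers can be computed with a short sketch. The sizes are copied from the listing above (in MB, as reported); the object and function names are only illustrative.

```typescript
// On-disk sizes in MB, copied from the measurements above.
const sizesMB = {
  contract: { page16k: 785, page4k: 528 },
  state: { page16k: 320, page4k: 128 },
};

// How many times larger the 16384-byte-page database is than the 4096-byte one.
function growthFactor(size16k: number, size4k: number): number {
  return size16k / size4k;
}

const contractRatio = growthFactor(sizesMB.contract.page16k, sizesMB.contract.page4k);
const stateRatio = growthFactor(sizesMB.state.page16k, sizesMB.state.page4k);

console.log(`contract: ${contractRatio.toFixed(2)}x`); // contract: 1.49x
console.log(`state:    ${stateRatio.toFixed(2)}x`);    // state:    2.50x
```

So for these two caches, the 16384-byte page size took roughly 1.5x and 2.5x the space of the 4096-byte page size.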