kriszyp / lmdb-js

Simple, efficient, ultra-fast, scalable data store wrapper for LMDB

Lightning Stream support #267

Closed tbaumann closed 9 months ago

tbaumann commented 12 months ago

I want to write a simple application with native Lightning Stream support. It basically needs additional content headers (https://github.com/PowerDNS/lightningstream/blob/main/docs/schema-native.md), and deleted values should not actually be removed but instead have the deleted flag set (and the value set to empty).

The header seems pretty easy to do via custom encoders, but I suppose the devil is in the details, like having access to the transaction ID at that time. Filtering deleted values and overriding remove() would also be required...

I haven't produced any code yet, but I wonder whether it makes sense to support this in this library or in a wrapper library?

I mean, implementing it at the application level would be cool too. This particular application can very well be polluted with back-end specifics.

kriszyp commented 11 months ago

I would suggest that most of this probably does belong at the application level. We also build this same type of functionality on top of lmdb-js, so it definitely works well to do that. At some point it might be nice to have "deleted" entry awareness in lmdb-js for record counts, but we have the exact same concept in the db software we build. I will also mention that the txnId is available in the aftercommit listener event.
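For example, a minimal sketch (using an asynchronous put, since the commit events are tied to the batched write transactions):

const { open } = require('lmdb');
const db = open({ path: 'test_db' });

db.on('aftercommit', ({ next, last, txnId }) => {
  // Fires after a batched write transaction has committed.
  console.log('committed transaction', txnId);
});

// put() is asynchronous and returns a promise that resolves after commit.
db.put('greeting', 'hello').then(() => db.close());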

tbaumann commented 11 months ago

Thanks for the hints. That will help me greatly.

We also build this same type of functionality on top of lmdb-js, so it definitely works well to do that.

Is that open source by any chance? I would love to pillage the sources a bit. :D

tbaumann commented 10 months ago

I finally got time to play with this. The message packing and unpacking seems relatively straightforward.

But the event listeners don't seem to work. (Not sure if I should open a new ticket)

const lmdb = require('lmdb');
const db = lmdb.open({ path: 'test_db' });
let token_db = db.openDB('tokens');

db.on('aftercommit', ({ next, last, txnId }) => {
  console.log("aftercommit", txnId);
});
db.on("beforecommit", (...args) => {
  const parameters = args.join(', ');
  console.log(`beforecommit event with parameters ${parameters}`);
});
token_db.on('aftercommit', ({ next, last, txnId }) => {
  console.log("aftercommit", txnId);
});
token_db.on("beforecommit", (...args) => {
  const parameters = args.join(', ');
  console.log(`beforecommit event with parameters ${parameters}`);
});
token_db.putSync("test", "test");
token_db.get("test");
console.log("events: ", db.eventNames());
console.log("events: ", token_db.eventNames());

outputs

events:  []
events:  []

Also, I need the txnId before I write out the data. With aftercommit, the only way I see to learn the txnId is after the first commit.

Is there a way to access the highest txnId before a commit is made?

kriszyp commented 10 months ago

The commit events are only fired for asynchronous transactions, not synchronous ones as in the example above. I suppose they could be triggered for synchronous transactions too, but that doesn't seem very helpful, since those are already explicitly started/stopped in the main thread. Instead I added a getWriteTxnId method (in the referenced commit) that can be used to get the transaction id in explicit transaction callbacks.
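Something like this (a rough sketch):

const { open } = require('lmdb');
const db = open({ path: 'test_db' });

db.transactionSync(() => {
  // Inside an explicit write transaction, getWriteTxnId() returns the id
  // of the currently open write transaction.
  const txnId = db.getWriteTxnId();
  db.put('key', `written in txn ${txnId}`);
});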

tbaumann commented 10 months ago

Cool, that's very useful. Thanks a lot.

I was thinking of wrapping my code in transactions even though it's not really useful for me, but then I realised that I had no way of accessing the txnID of the ongoing transaction, so there was no benefit to that anyway. Really cool to be able to do it like this now.

tbaumann commented 10 months ago

I'm struggling to use a custom encoder to implement this, mostly because I have to re-use the existing headers when updating entries. (I'm not allowed to drop unused flags and header extensions, but that context would be lost.)

Is there something like putBinary(), complementary to getBinary()? Passing a Buffer into put() puts a prefix before the data.

I can of course use a null encoder, but putBinary() would be a bit clearer in the code.

kriszyp commented 10 months ago

Yes, I think you want to use db.put(key, asBinary(alreadyEncodedBuffer)) (asBinary is an export of lmdb).
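Roughly (a quick sketch):

const { open, asBinary } = require('lmdb');
const db = open({ path: 'test_db' });

const raw = Buffer.from([0x01, 0x02, 0x03]);

// asBinary marks the buffer as already encoded, so it is stored verbatim
// and no encoder prefix is prepended.
db.putSync('blob', asBinary(raw));
console.log(db.getBinary('blob')); // <Buffer 01 02 03>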

tbaumann commented 10 months ago

Yes, I think you want to use db.put(key, asBinary(alreadyEncodedBuffer)) (asBinary is an export of lmdb).

Oh, how embarrassing. I was convinced asBinary() put a two-byte header in front of my data. I did a lot of testing with zero buffers and got spurious data at the front. But today, with coffee and more sleep, I see it's working as intended. :facepalm:

Sorry to waste your time.

tbaumann commented 10 months ago

For anyone coming here via Google:


const { asBinary } = require('lmdb'); // used by the store/delete snippets below
const msgpackr = require('msgpackr'); // values are msgpack-encoded

// Header offsets
// https://github.com/PowerDNS/lightningstream/blob/main/docs/schema-native.md
const LS_HEADER_SIZE = 24;
const LS_HEADER_POS_TIMESTAMP = 0;
const LS_HEADER_POS_TXN = 8;
const LS_HEADER_POS_SCHEMA = 16;
const LS_HEADER_POS_FLAGS = 17;
const LS_HEADER_POS_EXTENSION_COUNT = 22;
const LS_EXTENSION_HEADER_SIZE = 8;
const LS_FLAGS_DELETED = 1 << 0;

class LSData {
  #header;
  #value;
  constructor(data) {
    if (data) {
      // Parse an existing buffer
      this.unpack(data);
    } else {
      // Initialise an empty header
      this.#header = Buffer.alloc(LS_HEADER_SIZE);
    }
  }
  unpack(data) {
    let extension_headers = data.readUInt8(LS_HEADER_POS_EXTENSION_COUNT);
    let header_len =
      LS_HEADER_SIZE + extension_headers * LS_EXTENSION_HEADER_SIZE;
    let header = data.subarray(0, header_len);
    let buf = data.subarray(header.length);

    if (header.readUInt8(LS_HEADER_POS_SCHEMA) != 0) {
      throw Error("Schema version of Lightning Stream header is not 0. Not allowed!");
    }

    this.#header = header;
    // Assign the private field directly: going through the `value` setter
    // would clear the deleted flag we just parsed from the header.
    this.#value = buf.length > 0 ? msgpackr.unpack(buf) : undefined;
    return this;
  }
  asBuffer(txnID) {
    this.#header.writeBigInt64BE(BigInt(txnID), LS_HEADER_POS_TXN);
    // The schema wants nanoseconds since the UNIX epoch; process.hrtime.bigint()
    // is a monotonic clock with an arbitrary origin, so derive it from Date.now().
    this.#header.writeBigInt64BE(BigInt(Date.now()) * 1000000n, LS_HEADER_POS_TIMESTAMP);
    let buf = [this.#header];
    if (!this.deleted) {
      buf.push(msgpackr.pack(this.#value)); // Only write data if a value was set
    }
    return Buffer.concat(buf);
  }
  get deleted() {
    let flags = this.#header.readUInt8(LS_HEADER_POS_FLAGS);
    return (flags & LS_FLAGS_DELETED) != 0;
  }
  set deleted(val) {
    let delflag = val ? LS_FLAGS_DELETED : 0;
    // Preserve all other flag bits; only the deleted bit may change.
    let flags = this.#header.readUInt8(LS_HEADER_POS_FLAGS);
    flags = (flags & ~LS_FLAGS_DELETED) | delflag;
    this.#header.writeUInt8(flags, LS_HEADER_POS_FLAGS);
  }
  set value(val) {
    this.deleted = false;
    this.#value = val;
  }
  get value() {
    return this.#value;
  }
}
// Store
    return await db.transactionSync(() => {
      let data;
      if (db.doesExist(key)) {
        data = db.getBinary(key);
      }
      let ls_entry = new LSData(data);
      ls_entry.deleted = false;
      ls_entry.value = YOUR_VALUE; // placeholder for your payload
      return db.put(key, asBinary(ls_entry.asBuffer(db.getWriteTxnId())));
    });

// Retrieve
    let data = db.getBinary(key);
    if (!data) {
      return null; // No entry
    }
    let ls_entry = new LSData(data);
    if (ls_entry.deleted) {
      return null; // Entry is marked as deleted
    }
    let value = ls_entry.value;

// Delete
    return await db.transactionSync(() => {
      let data;
      if (db.doesExist(key)) {
        data = db.getBinary(key);
      }
      let ls_entry = new LSData(data);
      ls_entry.deleted = true;
      return db.put(key, asBinary(ls_entry.asBuffer(db.getWriteTxnId())));
    });

tbaumann commented 9 months ago

If you don't mind, I'll pop this one open again. :smile:

So I have a DB that I can read and write, and I think the data is Lightning Stream-compatible.

But Lightning Stream can't even open my DB.

time="2024-02-14T11:22:35Z" level=info msg="PID satisfies minimum" minimum_pid=50 pid=59
time="2024-02-14T11:22:38Z" level=info msg="Storage backend initialised" storage_type=s3
time="2024-02-14T11:22:38Z" level=info msg="[main          ] Opening LMDB" db=main lmdbpath=/lmdb/instance-1/db
time="2024-02-14T11:22:38Z" level=info msg="[main          ] Env info" LastTxnID=0 MapSize="1024.0 MB" db=main
time="2024-02-14T11:22:38Z" level=info msg="registered tracker for failure duration" healthtracker=main_storage_store
time="2024-02-14T11:22:38Z" level=info msg="registered tracker for startup phase" starttracker=main
time="2024-02-14T11:22:38Z" level=info msg="[main          ] schema_tracks_changes enabled" db=main instance=instance-1
time="2024-02-14T11:22:38Z" level=info msg="[main          ] Initialised syncer" db=main instance=instance-1
time="2024-02-14T11:22:38Z" level=info msg="[shard         ] Opening LMDB" db=shard lmdbpath=/lmdb/instance-1/db-0
time="2024-02-14T11:22:38Z" level=info msg="[main          ] Enabled LMDB stats logging" db=main instance=instance-1 interval=30m0s
time="2024-02-14T11:22:38Z" level=info msg="registered tracker for failure duration" healthtracker=main_storage_list
time="2024-02-14T11:22:38Z" level=info msg="registered tracker for failure duration" healthtracker=main_storage_load
time="2024-02-14T11:22:38Z" level=info msg="[shard         ] Env info" LastTxnID=0 MapSize="1024.0 MB" db=shard
time="2024-02-14T11:22:38Z" level=info msg="registered tracker for failure duration" healthtracker=shard_storage_store
time="2024-02-14T11:22:38Z" level=info msg="registered tracker for startup phase" starttracker=shard
time="2024-02-14T11:22:38Z" level=info msg="[shard         ] schema_tracks_changes enabled" db=shard instance=instance-1
time="2024-02-14T11:22:38Z" level=info msg="[shard         ] Initialised syncer" db=shard instance=instance-1
time="2024-02-14T11:22:38Z" level=info msg="[authdb        ] Opening LMDB" db=authdb lmdbpath=/lmdb/instance-1/authdb
time="2024-02-14T11:22:38Z" level=info msg="[shard         ] Enabled LMDB stats logging" db=shard instance=instance-1 interval=30m0s
time="2024-02-14T11:22:38Z" level=info msg="registered tracker for failure duration" healthtracker=shard_storage_list
time="2024-02-14T11:22:38Z" level=info msg="registered tracker for failure duration" healthtracker=shard_storage_load
time="2024-02-14T11:22:38Z" level=fatal msg=Error error="lmdb env: open: mdb_env_open: MDB_INVALID: File is not an LMDB file"
time="2024-02-14T11:22:38Z" level=warning msg="Exiting with exit code" exitcode=1 pid=1

The first DBs are the ones from pdns; authdb is mine.

It's a big mess.

The LMDB file created by lmdb-js on the node:current-slim image is supposedly not even an LMDB file.

[nix-shell:~/git/lightningstream]$ sudo mdb_stat -a -n -e -f -r /home/tilli/.local/share/containers/storage/volumes/lightningstream_lmdb/_data/instance-1/authdb
mdb_env_open failed, error -30793 MDB_INVALID: File is not an LMDB file

[nix-shell:~/git/lightningstream]$ sudo mdb_stat -a -n -e -f -r /home/tilli/.local/share/containers/storage/volumes/lightningstream_lmdb/_data/instance-1/db
Environment Info
  Map address: (nil)
  Map size: 1073741824
  Page size: 4096
  Max pages: 262144
  Number of pages used: 2
  Last transaction ID: 0
  Max readers: 126
  Number of readers used: 0
Reader Table Status
(no active readers)
Freelist Status
  Tree depth: 0
  Branch pages: 0
  Leaf pages: 0
  Overflow pages: 0
  Entries: 0
  Free pages: 0
Status of Main DB
  Tree depth: 0
  Branch pages: 0
  Leaf pages: 0
  Overflow pages: 0
  Entries: 0

The DB created by the Node instance on my dev system fails subtly differently.

$ mdb_stat -n lmdb_ls_native_db
mdb_env_open failed, error -30794 MDB_VERSION_MISMATCH: Database environment version mismatch
$ mdb_stat -V
LMDB 0.9.31: (July 10, 2023)

I suddenly feel transported back to the '90s. Does LMDB really have incompatible on-disk format versions?

PS: Apparently Lightning Stream is unhappy if you don't use noSubdir: true.
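i.e. opening the environment as a single data file, something like this (sketch; the path is from my setup):

const lmdb = require('lmdb');

// Lightning Stream expects a plain LMDB data file rather than a directory,
// so the env has to be opened with noSubdir.
const db = lmdb.open({
  path: '/lmdb/instance-1/authdb',
  noSubdir: true,
});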

tbaumann commented 9 months ago

I tried with --use_data_v1=true (based on the node:buster base image, so all the build tools are there):

 npm install lmdb --build-from-source --use_data_v1=true 
npm notice 
npm notice New minor version of npm available! 10.1.0 -> 10.4.0
npm notice Changelog: <https://github.com/npm/cli/releases/tag/v10.4.0>
npm notice Run `npm install -g npm@10.4.0` to update!
npm notice 
npm ERR! code 1
npm ERR! path /user/src/myapp/node_modules/lmdb
npm ERR! command failed
npm ERR! command sh -c node-gyp-build-optional-packages
npm ERR! make: Entering directory '/user/src/myapp/node_modules/lmdb/build'
npm ERR!   CXX(target) Release/obj.target/lmdb/src/lmdb-js.o
npm ERR!   CC(target) Release/obj.target/lmdb/dependencies/lmdb/libraries/liblmdb/midl.o
npm ERR!   CC(target) Release/obj.target/lmdb/dependencies/lmdb/libraries/liblmdb/chacha8.o
npm ERR!   CC(target) Release/obj.target/lmdb/dependencies/lz4/lib/lz4.o
npm ERR!   CXX(target) Release/obj.target/lmdb/src/writer.o
npm ERR! make: Leaving directory '/user/src/myapp/node_modules/lmdb/build'
npm ERR! gyp info it worked if it ends with ok
npm ERR! gyp info using node-gyp@9.4.0
npm ERR! gyp info using node@20.8.1 | linux | x64
npm ERR! gyp info find Python using Python version 3.7.3 found at "/usr/bin/python3"
npm ERR! gyp http GET https://nodejs.org/download/release/v20.8.1/node-v20.8.1-headers.tar.gz
npm ERR! gyp http 200 https://nodejs.org/download/release/v20.8.1/node-v20.8.1-headers.tar.gz
npm ERR! gyp http GET https://nodejs.org/download/release/v20.8.1/SHASUMS256.txt
npm ERR! gyp http 200 https://nodejs.org/download/release/v20.8.1/SHASUMS256.txt
npm ERR! gyp info spawn /usr/bin/python3
npm ERR! gyp info spawn args [
npm ERR! gyp info spawn args   '/usr/local/lib/node_modules/npm/node_modules/node-gyp/gyp/gyp_main.py',
npm ERR! gyp info spawn args   'binding.gyp',
npm ERR! gyp info spawn args   '-f',
npm ERR! gyp info spawn args   'make',
npm ERR! gyp info spawn args   '-I',
npm ERR! gyp info spawn args   '/user/src/myapp/node_modules/lmdb/build/config.gypi',
npm ERR! gyp info spawn args   '-I',
npm ERR! gyp info spawn args   '/usr/local/lib/node_modules/npm/node_modules/node-gyp/addon.gypi',
npm ERR! gyp info spawn args   '-I',
npm ERR! gyp info spawn args   '/root/.cache/node-gyp/20.8.1/include/node/common.gypi',
npm ERR! gyp info spawn args   '-Dlibrary=shared_library',
npm ERR! gyp info spawn args   '-Dvisibility=default',
npm ERR! gyp info spawn args   '-Dnode_root_dir=/root/.cache/node-gyp/20.8.1',
npm ERR! gyp info spawn args   '-Dnode_gyp_dir=/usr/local/lib/node_modules/npm/node_modules/node-gyp',
npm ERR! gyp info spawn args   '-Dnode_lib_file=/root/.cache/node-gyp/20.8.1/<(target_arch)/node.lib',
npm ERR! gyp info spawn args   '-Dmodule_root_dir=/user/src/myapp/node_modules/lmdb',
npm ERR! gyp info spawn args   '-Dnode_engine=v8',
npm ERR! gyp info spawn args   '--depth=.',
npm ERR! gyp info spawn args   '--no-parallel',
npm ERR! gyp info spawn args   '--generator-output',
npm ERR! gyp info spawn args   'build',
npm ERR! gyp info spawn args   '-Goutput_dir=.'
npm ERR! gyp info spawn args ]
npm ERR! gyp info spawn make
npm ERR! gyp info spawn args [ 'BUILDTYPE=Release', '-C', 'build' ]
npm ERR! ../src/writer.cpp: In member function 'int WriteWorker::WaitForCallbacks(MDB_txn**, bool, uint32_t*)':
npm ERR! ../src/writer.cpp:126:17: error: 'MDB_TRACK_METRICS' was not declared in this scope
npm ERR!   if (envFlags & MDB_TRACK_METRICS)
npm ERR!                  ^~~~~~~~~~~~~~~~~
npm ERR! ../src/writer.cpp:135:20: error: 'MDB_TRACK_METRICS' was not declared in this scope
npm ERR!      if (envFlags & MDB_TRACK_METRICS)
npm ERR!                     ^~~~~~~~~~~~~~~~~
npm ERR! ../src/writer.cpp:144:17: error: 'MDB_TRACK_METRICS' was not declared in this scope
npm ERR!   if (envFlags & MDB_TRACK_METRICS)
npm ERR!                  ^~~~~~~~~~~~~~~~~
npm ERR! ../src/writer.cpp: In static member function 'static int WriteWorker::DoWrites(MDB_txn*, EnvWrap*, uint32_t*, WriteWorker*)':
npm ERR! ../src/writer.cpp:359:19: warning: deleting 'void*' is undefined [-Wdelete-incomplete]
npm ERR!       delete value.mv_data;
npm ERR!                    ^~~~~~~
npm ERR! ../src/writer.cpp:367:19: warning: deleting 'void*' is undefined [-Wdelete-incomplete]
npm ERR!       delete value.mv_data;
npm ERR!                    ^~~~~~~
npm ERR! ../src/writer.cpp: In member function 'void WriteWorker::Write()':
npm ERR! ../src/writer.cpp:449:6: warning: unused variable 'retries' [-Wunused-variable]
npm ERR!   int retries = 0;
npm ERR!       ^~~~~~~
npm ERR! ../src/writer.cpp:450:2: warning: label 'retry' defined but not used [-Wunused-label]
npm ERR!   retry:
npm ERR!   ^~~~~
npm ERR! make: *** [lmdb.target.mk:159: Release/obj.target/lmdb/src/writer.o] Error 1
npm ERR! gyp ERR! build error 
npm ERR! gyp ERR! stack Error: `make` failed with exit code: 2
npm ERR! gyp ERR! stack     at ChildProcess.onExit (/usr/local/lib/node_modules/npm/node_modules/node-gyp/lib/build.js:203:23)
npm ERR! gyp ERR! stack     at ChildProcess.emit (node:events:514:28)
npm ERR! gyp ERR! stack     at ChildProcess._handle.onexit (node:internal/child_process:294:12)
npm ERR! gyp ERR! System Linux 6.7.3
npm ERR! gyp ERR! command "/usr/local/bin/node" "/usr/local/lib/node_modules/npm/node_modules/node-gyp/bin/node-gyp.js" "rebuild"
npm ERR! gyp ERR! cwd /user/src/myapp/node_modules/lmdb
npm ERR! gyp ERR! node -v v20.8.1
npm ERR! gyp ERR! node-gyp -v v9.4.0
npm ERR! gyp ERR! not ok

npm ERR! A complete log of this run can be found in: /root/.npm/_logs/2024-02-14T16_55_23_605Z-debug-0.log

I guess there are some non-obvious build dependencies...

tbaumann commented 9 months ago

Closing, because it's not really related to this issue, I think.