kriszyp / lmdb-js

Simple, efficient, ultra-fast, scalable data store wrapper for LMDB

getRange/getKeys return internal key representation instead of original key #266

Closed alexghr closed 6 months ago

alexghr commented 7 months ago

I'm using a database with key encoding set to ordered-binary where I'm storing data against keys which are Buffers (this should be supported right?).

I need to perform getRange queries on this data and I need both the keys and values. The keys I get back from the iterator are using their internal representation though, instead of the original key.

The actual keys I'm using in my code are arrays that look like ["some prefix", <buffer>] but I've simplified the code to just individual Buffers here:

import * as lmdb from 'lmdb';

// simplified example - sample function to map a number to 4 bytes
function numToBuff(x) {
  const buf = new ArrayBuffer(4);
  const view = new DataView(buf);
  view.setUint32(0, x, false); // big endian
  return new Uint8Array(buf);
}

const rootDB = lmdb.open({}); // a temp database
const test = rootDB.openDB("test", { keyEncoding: 'ordered-binary' });

// each put() returns a promise; wait for all four writes
await Promise.all([
  test.put(numToBuff(4), 'value for key=4'),
  test.put(numToBuff(42), 'value for key=42'),
  test.put(numToBuff(1000), 'value for key=1000'),
  test.put(numToBuff(0), 'value for key=0'),
]);

console.log(test.getRange().asArray);

The output:

[
  { key: [ null, null ], value: 'value for key=0' },
  { key: [ null, null, <Buffer 04> ], value: 'value for key=4' },
  { key: [ null, null, '*' ], value: 'value for key=42' },
  { key: [ null, <Buffer 03 e8>, '耀' ], value: 'value for key=1000' }
]

The values are returned in order, but the keys are completely off (and the element counts aren't consistent either). I've tried the mapping functions exported from the lmdb package but couldn't get the original key back:

> numToBuff(1000) // my original key
Uint8Array(4) [ 0, 0, 3, 232 ]
> lmdb.keyValueToBuffer(numToBuff(1000)) // I guess this is what the db stores?
<Buffer 00 00 03 e8>
> lmdb.bufferToKeyValue(lmdb.keyValueToBuffer(numToBuff(1000))) // this is what I get back in getRange
[ null, <Buffer 03 e8>, '耀' ]

Is there any way I can get the original key as part of the iteration or should I store the key with the object?

I have lmdb v2.9.1 and ordered-binary v1.4.1 installed.

kriszyp commented 7 months ago

"keys which are Buffers (this should be supported right?)"

No, Buffers are not supported for round-trip, type-preserving storage with ordered-binary. lmdb-js allows you to provide buffers as keys, but it assumes these buffers are primitive values that are already encoded in the ordered-binary format (this is primarily useful if you want to encode a key once and use it many times). For example, these result in the same entry:

test.put(42, 'value for 42');
// and
const buf = lmdb.keyValueToBuffer(42);
test.put(buf, 'value for 42');

The first example is much faster for one-time use (it avoids buffer creation), whereas the second would be faster if you encoded 42 to a buffer once and reused that same encoding many times (more than 10 times).

However, if you want to always provide a buffer and get a buffer back (from getRange), you should use keyEncoding: 'binary'. It is also worth noting that keyEncoding: 'uint32' is available for compact, efficient encoding of 32-bit integers (it would be faster than calling numToBuff for each put, since it avoids buffer creation).
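
For anyone landing here later: if you go the keyEncoding: 'binary' route and get raw buffers back, you will probably want a decoder to pair with numToBuff. Below is a minimal sketch using only Node's built-in Buffer, with no lmdb calls. buffToNum is a hypothetical helper, not part of lmdb-js, and the sort step only imitates byte-wise key comparison (LMDB's default ordering) to show why big-endian keys come back in numeric order.

```javascript
// Hypothetical helpers for 4-byte big-endian keys; not part of lmdb-js.
function numToBuff(x) {
  const buf = Buffer.alloc(4);
  buf.writeUInt32BE(x, 0); // big endian, so byte order matches numeric order
  return buf;
}

function buffToNum(buf) {
  return buf.readUInt32BE(0); // inverse of numToBuff
}

// Encode the keys from the example above, in insertion order.
const keys = [4, 42, 1000, 0].map(numToBuff);

// Buffer.compare orders buffers byte by byte, which imitates how a
// store that compares raw key bytes would order these entries.
keys.sort(Buffer.compare);

// Decode each key back to the original number.
const decoded = keys.map(buffToNum);
console.log(decoded);
```

The same buffToNum call could be applied to each key yielded by getRange, assuming the database was opened with keyEncoding: 'binary' so that keys are returned as the bytes that were stored.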

alexghr commented 7 months ago

"it assumes these buffers are primitive values that are already encoded in the ordered-binary format"

Oh, that's not how I read the documentation at all. The example in the readme makes it sound like buffers are just supported alongside any primitive JS value. Thanks for the clarification; I'll update my code.

alexghr commented 6 months ago

I'll close this issue as answered.