Large value interpreted as key being too large

kriszyp / lmdb-js

Simple, efficient, ultra-fast, scalable data store wrapper for LMDB

Large value interpreted as key being too large #281

Closed: pkaminski closed this issue 2 months ago

pkaminski commented 3 months ago

When I set up the database with keyEncoding: 'binary' and encoding: 'ordered-binary', trying to write a large value under a small key fails with a "Key of size ... was too large" error, even though the key itself is only 4 bytes. Repro:

const lmdb = require('lmdb');
const db = lmdb.open({path: 'test_db', encoding: 'ordered-binary', keyEncoding: 'binary'});
// 4-byte key, 2000-character string value
db.putSync(Buffer.from('00000000', 'hex'), 'x'.repeat(2000));

Output:

Uncaught Error: Key of size 2000 was too large, max key size is 1978
    at saveKey (C:\Code\workspace\cinderbase\node_modules\lmdb\dist\index.cjs:1406:9)
    at store.encoder.encode (C:\Code\workspace\cinderbase\node_modules\lmdb\dist\index.cjs:1344:13)
    at writeInstructions (C:\Code\workspace\cinderbase\node_modules\lmdb\dist\index.cjs:232:28)
    at LMDBStore.put (C:\Code\workspace\cinderbase\node_modules\lmdb\dist\index.cjs:743:11)

pkaminski commented 2 months ago

Ping: just wondering if this can be fixed, or whether I should change my database configuration? I'm trying to create a database with a compact representation of keys that are conceptually variable-length arrays of 32-bit uints, and with only primitive values (JS string, number, boolean, null). (The values don't need to be ordered, but ordered-binary looked like it would probably be the most efficient encoding for primitives.)
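A compact key of the kind described here can be sketched with plain Node.js Buffers. The helpers `encodeUint32Key` and `decodeUint32Key` below are hypothetical (not lmdb-js APIs). They assume every element fits in an unsigned 32-bit integer and write each one big-endian, so that LMDB's default byte-wise key comparison orders keys the same way as comparing the arrays element by element, with a shorter prefix sorting first.

```javascript
// Hypothetical helpers (not part of lmdb-js): pack an array of 32-bit
// unsigned integers into a Buffer usable with keyEncoding: 'binary'.
function encodeUint32Key(values) {
  const buf = Buffer.alloc(values.length * 4);
  values.forEach((v, i) => {
    // Big-endian so byte-wise comparison matches numeric order.
    // writeUInt32BE throws if v is outside 0..4294967295.
    buf.writeUInt32BE(v, i * 4);
  });
  return buf;
}

// Inverse of encodeUint32Key: read consecutive 4-byte big-endian integers.
function decodeUint32Key(buf) {
  if (buf.length % 4 !== 0) {
    throw new RangeError('key length must be a multiple of 4');
  }
  const values = [];
  for (let offset = 0; offset < buf.length; offset += 4) {
    values.push(buf.readUInt32BE(offset));
  }
  return values;
}

const a = encodeUint32Key([1, 2]);
const b = encodeUint32Key([1, 2, 0]);
const c = encodeUint32Key([2]);
console.log(a.toString('hex'));                           // 0000000100000002
console.log(Buffer.compare(a, b), Buffer.compare(b, c));  // -1 -1
console.log(decodeUint32Key(b));                          // [ 1, 2, 0 ]
```

Each element costs a fixed 4 bytes; a variable-length integer scheme would be smaller for small numbers but would need extra care to keep byte order consistent with numeric order.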

kriszyp commented 2 months ago

What is the reason for using ordered-binary instead of the default (msgpack) encoding if they aren't intended to be ordered? I think msgpack should generally provide efficient/fast encoding.

pkaminski commented 2 months ago

I didn't profile, but my assumption was that ordered-binary would be a more efficient encoder if the values were all primitives. Sounds like I'm wrong, though?

kriszyp commented 2 months ago

By "efficient" do you mean size or speed? I would say msgpack would usually be faster (ordering imposes extra constraints), but that really can vary. By themselves, strings may be one byte more compact with ordered-binary, but finding the delimiters adds decoding cost. Numbers can be more compact in either representation depending on the number, but decoding is a little more complicated with ordered-binary. I would think multi-valued arrays (or primitives) are generally going to be faster and more compact with msgpack, because it uses length-based encoding rather than delimiter-based encoding.
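The length-versus-delimiter trade-off can be illustrated with two toy string encodings. These are assumptions for illustration, not the real msgpack or ordered-binary formats: the 4-byte length header is an arbitrary choice (msgpack uses a variable-size header, a single byte for short strings), and the delimiter scheme ignores the escaping a real ordered encoding needs. The point is the decode step: the length-prefixed reader jumps straight to the end of the value, while the delimited reader has to scan for the terminator.

```javascript
// Toy encodings for illustration only: NOT the real msgpack or
// ordered-binary wire formats, just the two strategies being contrasted.

// Length-prefixed: a fixed 4-byte big-endian length header, then UTF-8 bytes.
function encodeLengthPrefixed(str) {
  const body = Buffer.from(str, 'utf8');
  const header = Buffer.alloc(4);
  header.writeUInt32BE(body.length, 0);
  return Buffer.concat([header, body]);
}

function decodeLengthPrefixed(buf, offset = 0) {
  // Constant-time lookup: the header says exactly where the value ends.
  const length = buf.readUInt32BE(offset);
  const start = offset + 4;
  return { value: buf.toString('utf8', start, start + length), next: start + length };
}

// Delimiter-terminated: UTF-8 bytes followed by a 0x00 terminator.
// Assumes the string contains no U+0000; a real ordered encoding needs escaping.
function encodeDelimited(str) {
  return Buffer.concat([Buffer.from(str, 'utf8'), Buffer.from([0x00])]);
}

function decodeDelimited(buf, offset = 0) {
  // Linear scan: every byte is inspected until the terminator is found.
  const end = buf.indexOf(0x00, offset);
  if (end === -1) throw new Error('missing terminator');
  return { value: buf.toString('utf8', offset, end), next: end + 1 };
}

// Two strings packed back to back in each format.
const packedLP = Buffer.concat([encodeLengthPrefixed('ab'), encodeLengthPrefixed('cde')]);
const packedDL = Buffer.concat([encodeDelimited('ab'), encodeDelimited('cde')]);
const lp1 = decodeLengthPrefixed(packedLP);
const dl1 = decodeDelimited(packedDL);
console.log(lp1.value, decodeLengthPrefixed(packedLP, lp1.next).value); // ab cde
console.log(dl1.value, decodeDelimited(packedDL, dl1.next).value);      // ab cde
console.log(packedLP.length, packedDL.length);                          // 13 7
```

The toy size numbers exaggerate the gap because of the fixed 4-byte header; they should not be read as a measurement of either real format.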

kriszyp commented 2 months ago

I think it would be possible to extend the size limits of ordered-binary in lmdb, although some optimizations intentionally rely on the assumption of limited size. The intended purpose of ordered-binary as a value encoding is to support references to keys, and those are subject to the same size limits.
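The stack trace in the original report shows the value passing through `saveKey`, which suggests ordered-binary values go through the same size check as keys. A hypothetical pre-write guard could look like the sketch below. `exceedsOrderedBinaryLimit` is not an lmdb-js function, the 1978-byte limit is copied from the error message in this thread (it may differ across versions or builds), and the UTF-8 byte length is only a lower bound on the encoded size, so strings just under the limit may still be rejected.

```javascript
// Hypothetical pre-write check (not an lmdb-js API).
// MAX_KEY_SIZE is copied from the error message in this issue; the real
// limit depends on how lmdb-js/LMDB was built and may differ elsewhere.
const MAX_KEY_SIZE = 1978;

// Returns true when a string value is clearly too large to store with
// encoding: 'ordered-binary'. Only the UTF-8 byte length is checked, which
// is a lower bound on the encoded size (type/terminator bytes are ignored).
function exceedsOrderedBinaryLimit(value) {
  return Buffer.byteLength(value, 'utf8') > MAX_KEY_SIZE;
}

console.log(exceedsOrderedBinaryLimit('x'.repeat(2000))); // true
console.log(exceedsOrderedBinaryLimit('x'.repeat(100)));  // false
console.log(exceedsOrderedBinaryLimit('€'.repeat(700)));  // true ('€' is 3 bytes in UTF-8)
```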

pkaminski commented 2 months ago

I expected both size and speed benefits, but it sounds like I'll get neither, so I'll switch back to msgpack. Thanks!

The size limitation on ordered-binary makes sense, but it might be good to mention it in the docs for value encoding schemes.

pkaminski commented 2 months ago

I switched to the msgpack encoding. The only slight hitch was that I also needed to be able to serialize a special placeholder symbol, which msgpack doesn't support, but a trivial addExtension did the trick there.
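For context, msgpack represents types it has no native form for through extension values. The sketch below hand-builds one using the msgpack spec's fixext 1 layout (`0xd4`, a signed type byte, one data byte) to show how a placeholder symbol could round-trip. It is not the `addExtension` call mentioned above and does not use msgpackr's API; `PLACEHOLDER`, `PLACEHOLDER_EXT_TYPE` (an arbitrary application type code in 0..127), and both helper functions are hypothetical.

```javascript
// Hypothetical illustration: encode a placeholder symbol as a msgpack
// "fixext 1" value by hand. Not msgpackr's addExtension API.
const PLACEHOLDER = Symbol('placeholder');
const PLACEHOLDER_EXT_TYPE = 42; // assumed code; application types are 0..127

function encodePlaceholder() {
  // 0xd4 = fixext 1 marker, then the ext type, then one (unused) data byte.
  return Buffer.from([0xd4, PLACEHOLDER_EXT_TYPE, 0x00]);
}

function decodeExt(buf) {
  // The ext type is a signed 8-bit integer in the msgpack spec.
  if (buf.length === 3 && buf[0] === 0xd4 && buf.readInt8(1) === PLACEHOLDER_EXT_TYPE) {
    return PLACEHOLDER;
  }
  throw new Error('not a placeholder extension value');
}

const bytes = encodePlaceholder();
console.log(bytes.toString('hex'));            // d42a00
console.log(decodeExt(bytes) === PLACEHOLDER); // true
```

A real msgpack library does this framing for you once an extension is registered; the sketch only shows what ends up in the stored bytes.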