kriszyp / lmdb-js

Simple, efficient, ultra-fast, scalable data store wrapper for LMDB
Other
484 stars 39 forks source link

Long JSON array not stored with dupSort set to true #93

Closed aolo2 closed 2 years ago

aolo2 commented 2 years ago

I was implementing an online whiteboard, and as a fast hacky solution stored curve points as a json array in lmdb. But for some reason some of the curves were not getting saved. It turned out lmdb itself did not save them, although store.put resolved to true.

Here's a complete repro:

import { open } from 'lmdb-store';

const store = open({ path: 'repro.lmdb', dupSort: true });

async function main() {
    const ok1 = await store.put('key', [{'x': 0, 'y': 0}]);
    console.log('Short put ok =', ok1);

    const long = [];
    for (let i = 0; i < 1000; ++i) { // if you change this to 100, the put will actually work
        long.push({'x': i, 'y': i});
    }

    const ok2 = await store.put('key', long);
    console.log('Long put ok =', ok2);

    for (const value of store.getValues('key')) {
        console.log(value.length); // this will only print out 1
    }
}

main();

More details: I replaced the call to put with putSync, and used a binary buffer instead of an array of objects to measure the maximum size more precisely:

const long = new ArrayBuffer(1975); // this works
//  const long = new ArrayBuffer(1976); // this DOESN'T (?!)
store.putSync('key', long);

But with putSync I at least get an error:

Error: In database repro: MDB_BAD_VALSIZE: Unsupported size of key/DB name/data, or wrong DUPFIXED size
    at LMDBStore.putSync (/my/project/directory/node_modules/lmdb-store/index.js:619:15)

The 1976 seems suspiciously close to the 1978 limit for the key. But I'm hitting the limit for the value, not the key.

Also, this does not happen with dupSort: false.

kriszyp commented 2 years ago

Yes, this is actually specifically a maximum size of LMDB data values for dupSort, from the docs http://www.lmdb.tech/doc/group__mdb.html#structMDB__val:

Key sizes must be between 1 and mdb_env_get_maxkeysize() inclusive. The same applies to data sizes in databases with the MDB_DUPSORT flag. Other data items can in theory be from 0 to 0xffffffff bytes long.

However, not properly enforcing that and throwing a legible error message is definitely lmdb-store's fault (I can debug your code and see the internal error code taking place). I will add code to properly check the value size for dupSort databases. (Also, the reason that there is a slight difference in limit size is that with default msgpack encoding, your 1976 buffers gets wrapped a msgpack header with buffer contents, with three extra bytes).

aolo2 commented 2 years ago

Thank you for the swift response! I 100% agree that store.put should not return true in this case. Regarding my actual use case, I will just do something more sane that storing JSON arrays with dupSort.

kriszyp commented 2 years ago

FWIW, usually dupSort databases are used to store indices with foreign keys as values, and so typically have the same type of content as keys (and encoding: 'ordered-binary' works well with that), but I'm sure there are other use cases, that you are exploring, perhaps.