kriszyp / lmdb-js

Simple, efficient, ultra-fast, scalable data store wrapper for LMDB
Other
482 stars 39 forks source link

DupSort table inserting error #133

Closed gogoout closed 2 years ago

gogoout commented 2 years ago

Minimal test to reproduce

const lmdb = require("lmdb");
const fs = require('fs');

async function test(id) {
    const dbUrl = "/tmp/db-test" + id;
    fs.rmSync(dbUrl, {recursive: true});

    const db = lmdb.open({
        path           : dbUrl,
        useVersions    : false,
        encoding       : "ordered-binary",
        dupSort        : true,
    });

    await Promise.all([
        db.put("k1", "v1"),
        db.put("k1", "v2"),
        db.put("k1", "v3"),
        db.put("k1", "v5")
    ]);

    await Promise.all([
        db.put("k2", "v4"),
        db.put("k2", "v6")
    ])
    //  await Promise.all(commits.map(each => each.flushed));

    console.log("k1", db.getValues("k1").asArray.join(","));
    console.log("k2", db.getValues("k2").asArray.join(","));

}

function main() {
    test(0);
}

main();

Using latest version, the result is

k1 v5,v6
k2 v6

Where it should be

k1 v1,v2,v3,v5
k2 v4,v6

I've also tested, seems this bug is introduced since v2.1.3. Originally I think this is caused by concurrent put, but seems it affects everything. Even you wrap the put in the transaction the behaviour is the same. Also change encoding from ordered-binary to msgpack still produce the same wrong result.

kriszyp commented 2 years ago

I believe the problem is that I don't think you can use dupSort on the root database (the unnamed database, which has entries to point to the named databases). Adding a name to the database options seems to make this work. However, I am not sure why I would have worked in earlier versions. I will do some more testing tomorrow to see if I can reproduce it on earlier versions and maybe bisect the commit. If that is indeed the problem, I will add some checks to enforce this.

gogoout commented 2 years ago

emm, interesting, that'll be an easy workaround. Just adding name to the lmdb.open should just turn it into a named database?

const db = lmdb.open({
    name           : "secondary",
    path           : "/tmp/db-test" + id,
    useVersions    : false,
    encoding       : "ordered-binary",
    dupSort        : true,
    cache          : false
  });
kriszyp commented 2 years ago

That's correct.

kriszyp commented 2 years ago

tl;dr: I don't think LMDB fully supports the dupSort mode for the main/unnamed database, but hard to tell if it is intended to be supported in certain situations.

Trying to do this yields a variety of interesting results:

So I think there a couple options:

gogoout commented 2 years ago

I think just make that clear should be good enough. The problem is it's not clear dupSort can't be used in main (null name) database. Basically, you have to go to lmdb's website to discover that behaviour. Overall, I think just add a several line on the doc and maybe do a check when open the database should be good enough.

kriszyp commented 2 years ago

I believe this has been addressed with improved documentation and better error handling in v2.2.