kriszyp / lmdb-js

Simple, efficient, ultra-fast, scalable data store wrapper for LMDB
Other
482 stars 39 forks source link

Issue with overlappingsync and sharedStructureKey #166

Closed kylebernhardy closed 2 years ago

kylebernhardy commented 2 years ago

LMDB.js: 2.3.9 Ubuntu 20.04

Sorry for another odd issue. This issue is in regards to having overlappingSync = true and a dbi with sharedStructureKey = Symbol.for('structures'). Writing data and reading it while the environment is open works fine. The issue occurs when you attempt to view the contents of the dbi after close. This transaction pattern is pretty old from the lmdb-store days, is it preferable to use the newer transaction function?

Sample load code:


const fs = require('fs');
const path = require('path');
const lmdb = require('lmdb');

function createEnv(name){
    let environment_path = path.join(__dirname, 'data', name);
    try {
        fs.rmSync(environment_path, {recursive:true})
        fs.mkdirSync(environment_path);
    }catch(e){}

    let env_init = {
        "path": environment_path,
        "mapSize": 1024*1024,
        "maxReaders": 100,
        sharedStructureKey: Symbol.for('structures'),
        overlappingSync: true
    };
    let env = lmdb.open(env_init);
    return env;
}

function createSchema(){
    let env = createEnv('test');

    let id_dbi = env.openDB('id', {
        create: true,
        dupSort: false,
        useVersions: true,
        sharedStructuresKey: Symbol.for('structures')});

    let name_dbi = env.openDB('name', {
        create: true,
        dupSort: true,
        useVersions: false,
        sharedStructuresKey: Symbol.for('structures')});

    let timestamp_dbi = env.openDB('timestamp', {
        create: true,
        dupSort: true,
        useVersions: false,
        sharedStructuresKey: Symbol.for('structures')});

    env.dbis = {
        id: id_dbi,
        name: name_dbi,
        timestamp: timestamp_dbi
    }

    return env;
}

async function loadData(){
    let env = createSchema();

    let promise;
    let objs = [];
    for(let x = 0, length = 1000; x<length; x++){
        objs.push({id:x, name: `test${x}`, timestamp: Date.now()});
    }

    console.time('write');
    let promises = [];
    let attrs = Object.keys(env.dbis);
    for(let x = 0, length = objs.length; x<length; x++){
        let obj = objs[x];
        promise = env.dbis.id.ifNoExists(x, ()=>{
            for(let y = 0, length = attrs.length; y < length; y++){
                let attr = attrs[y];
                if(attr !== 'id') {
                    env.dbis[attr].put(obj[attr], obj.id);
                }
            }
            env.dbis.id.put(obj.id, obj);
        });
        promises.push(promise);
    }

    let results = await Promise.all(promises);
    console.log(results);
}

loadData().then(r=>{
    console.log(r);
});

attempting to read data from dbi id:

const lmdb = require('lmdb');

const INDEX_NAME = 'id';

console.time('t');
let env_init = {
    "path": './data/test/',
    "mapSize": 1024 * 1024 * 1024,
    "maxReaders": 1000,
    "maxDbs":10000,
    sharedStructuresKey: Symbol.for('structures')
};

let env = lmdb.open(env_init);
let dbi = env.openDB(INDEX_NAME, {
    create: false,
    dupSort: false,
    useVersions: true,
    sharedStructuresKey: Symbol.for('structures')
});

for (let { key, value} of dbi.getRange({})) {
    console.log(key, value);
}

The results of this are only the structure: Symbol(structures) [ [ 'id', 'name', 'timestamp' ] ]

If you either set overlappingsync to false or remove the sharedStructuresKey from the dbi the range query returns the expected entries.

pieh commented 2 years ago

I think I see similar behaviour on my side as well when trying to update from 2.2.6 to 2.3.x (checked 2.3.0, 2.3.8 and newest 2.3.9).

We don't use sharedStructuresKey however (we commented it out at some point for some reason).

Possibly interesting bits:

I didn't manage to create small reproduction for this however - this might be matter of volume or maybe some specific timing on our side? (or just something that I missed ...)

If you either set overlappingsync to false or remove the sharedStructuresKey from the dbi the range query returns the expected entries.

~Does setting overlappingSync to false in options actually do anything? I tried that when I found this ticket and for me it didn't seem to have effect and when checking code ( https://www.runpkg.com/?lmdb@2.3.9/dist/index.cjs#2019 ) this option seems like is now calculated from other options and not set directly?~

~I did "monkey-patch" above linked line on my side and that was "fixing" the problem, but it appears I can't achieve this result with regular options now?~

EDIT: overlappingSync: false does work, I missed the Object.assign in the lmdb code and put overlappingSync: false, in wrong place in my tests :)

kriszyp commented 2 years ago

Thank you for the detailed report/test-case. This should be fixed in v2.3.10. Let me know if you see any lingering issue.

kylebernhardy commented 2 years ago

@kriszyp all of our tests pass now with this new version. This also fixed issues we were seeing with overlappingsync running on AWS EBS volumes that I was struggling to find a way to report!

I will wait until @pieh confirms this fixes his issues before closing.

pieh commented 2 years ago

From very brief testing, it appears like it's fixed on my side as well - I do think we can close this issue.

As always, thanks Kris for super quick turnaround on this :)

kylebernhardy commented 2 years ago

Kris as always you are amazing, thanks so much!

kriszyp commented 2 years ago

Thank you, appreciate the kind words!

I forgot I did want to respond this:

This transaction pattern is pretty old from the lmdb-store days, is it preferable to use the newer transaction function?

No, your transaction pattern is definitely right and idiomatic. Using ifNoExists + puts is a great, recommended, best-practice approach.

(BTW, you also don't need to await all the promises, you can just wait on the last one (since they resolve sequentially), or db.committed, unless you are collecting the results, but I know this is just a test case.)

kylebernhardy commented 2 years ago

Thanks for the response @kriszyp & good to know the pattern is still preferred. Less code refactor for us as the sample above is a simple example of our implementation. And thanks for the reminder on the promises!