kriszyp / lmdb-js

Simple, efficient, ultra-fast, scalable data store wrapper for LMDB
Other
505 stars 41 forks source link

Ask for help: mdb_page_spill error #283

Closed falcon027 closed 5 months ago

falcon027 commented 6 months ago

I have a problem that causes my application to crash. And I have no idea what it means or how to fix it. Does anyone have any idea how I can prevent this?

mdb_page_spill error
Error: Invalid argument: Attempting to setup locks
    at Function.<anonymous> (file:///Users/falcon/Projects/express-function/node_modules/.pnpm/lmdb@3.0.0/node_modules/lmdb/write.js:413:7) {
  code: 22
}
mdb_page_spill error
../dependencies/lmdb/libraries/liblmdb/mdb.c:4326: Assertion 'mop_len == 0' failed in mdb_freelist_save()
zsh: abort      NODE_ENV="production" node masterProcess.js

My config

let newObject = open({
        path: objectsPath,
        // any options go here, we can turn on compression like this:
        compression: false,
        dupSort: false, // do not allow duplicate kes <= this kills performance
        cache: false, // helps with performance with larger objects <= lags behind
        maxDbs: 1,
        commitDelay: 20, // in ms This is the amount of time to wait for batching write operations before committing the writes (in a transaction).
        noMemInit: true, //  If you do not need to worry about unauthorized access to the database files themselves, this is recommended.
        keyEncoding: "ordered-binary",
        remapChunks: false,
        pageSize: 4096,
        useWritemap: false,
        safeRestore: true,
        //encoding: "json",
        sharedStructuresKey: Symbol.for("structures"),

        // keyEncoding: "uint32",
        // keyEncoder: {
        //     writeKey: function (key, targetBuffer, startPosition) {
        //         console.log(key, targetBuffer, startPosition);
        //         return hash(key.toString())
        //     },
        //      readKey: function (sourceBuffer, start, end) {
        //          console.log(sourceBuffer, start, end);
        //      }
        //
        // }
    });

Thanks in advance Falcon.

kriszyp commented 6 months ago

Do the errors above occur in that order (the JS error and then the abort())? Does this occur on startup or after some usage? Does this occur with v2.9?

falcon027 commented 6 months ago

Thanks for getting back to me, it does happen randomly after using it for a while. I have not tested it with v.2.9 yet, I will try and get back to you.

kriszyp commented 6 months ago

I have also published a v3.0.1 with increased debugging information for this error, if you can get a chance to try that.

falcon027 commented 6 months ago

I just tested it with version 3.0.1. And I get the following error message:

Freelist record not found 1 5 1 1 1 68 0 0../dependencies/lmdb/libraries/liblmdb/mdb.c:4330: Assertion 'mop_len == 0' failed in mdb_freelist_save()
zsh: abort      NODE_ENV="production" node masterProcess.js
kriszyp commented 6 months ago

I put in an (attempted) fix for this in v3.0.2, let me know if it addresses the issue or if there is still an error what the message is (added more debugging).

falcon027 commented 6 months ago

Thanks for your quick debugging work. On version 3.0.2 I still get the following error:

Freelist record not found 1 5 1 1 1 8 2147483647 2147483647 0 0 0 0 0
../dependencies/lmdb/libraries/liblmdb/mdb.c:4341: Assertion 'mop_len == 0' failed in mdb_freelist_save()
zsh: abort      NODE_ENV="production" node masterProcess.js
kriszyp commented 6 months ago

Ok, I have changed that assertion, as it appears this is a valid state to me. Let me know if you run into this or a related issue with v3.0.3.

falcon027 commented 6 months ago

This exact error no longer occurs, but there are new ones, see below:

Write error: Invalid argument position 163840, size 4096mdb_page_spill flushing error 22
mdb_page_spill error 22

and

Error: Invalid argument: Attempting to write page at position 163840, size 4096
    at Function.<anonymous> (file:///Users/falcon/Projects/express-function/node_modules/.pnpm/lmdb@3.0.3/node_modules/lmdb/write.js:413:7) {
  code: 22
}
kriszyp commented 6 months ago

I have published a v3.0.3-debug.1 with more debugging information.

However, do you have any example/test case that can reproduce this? I have never been able to reproduce any of these errors, with any of our tests, including with your db config above.

falcon027 commented 6 months ago

I have run your newly created debugging version and I get the following error:

Error: Invalid argument: Attempting to write page at position 163840, size 4096, blocks 1, buffer sizes 4096 4096 4096
    at Function.<anonymous> (file:///Users/falcon/express-function/node_modules/.pnpm/lmdb@3.0.3-debug.1/node_modules/lmdb/write.js:413:7) {
  code: 22
}

I will try to narrow down the problem and send you a setup to reproduce the error.

falcon027 commented 6 months ago

I was able to pinpoint the problem and create a repo to reproduce the bug. Please let me know if you can reproduce the bug on your system or if it might be platform specific.

The database file seems to be corrupted in some way.

kriszyp commented 6 months ago

Yes, I can reproduce this with your database. And indeed there is a incorrect/corrupted free space entry that is causing the problem. Do you have any suggestions/reproduction for how to get the database into this state? I haven't been able to reproduce that. I could add some code to tolerate this condition, but certainly would prefer to find the cause.

falcon027 commented 6 months ago

I recorded and replayed the database inserts prior to the curpution, but was unable to reproduce the condition outside of my application. I will keep trying and get back to you with the results, if you have any ideas let me know.

falcon027 commented 6 months ago

I was able to replicate the bug in an isolated environment. The repository has been updated. If you run the runMultiple.js file and wait for approximately 2 minutes, you will encounter the error. You may need to restart the program a few times to reproduce the bug. Please let me know if the bug is reproducible on your system.

kriszyp commented 6 months ago

No, I haven't been able to reproduce this. I have run this for multiple hours on my linux and mac computers with various restarts and haven't had any luck in reproducing. Will keep trying though.

falcon027 commented 5 months ago

That's not good, maybe it's something specific to my setup, I have an M1 Mac. See hardware info below:

Software:

    System Software Overview:

      System Version: macOS 14.4.1 (23E224)
      Kernel Version: Darwin 23.4.0
      Boot Volume: Macintosh HD
      Boot Mode: Normal
      Secure Virtual Memory: Enabled
      System Integrity Protection: Enabled
      Time since boot: 18 minutes, 9 seconds

Hardware:

    Hardware Overview:

      Model Name: MacBook Pro
      Model Identifier: MacBookPro18,3
      Chip: Apple M1 Pro
      Total Number of Cores: 10 (8 performance and 2 efficiency)
      Memory: 32 GB
      System Firmware Version: 10151.101.3
      OS Loader Version: 10151.101.3
      Activation Lock Status: Enabled

I also put the logs of a runMultible.js execution into the repo maybe you can see something from the logs.

kriszyp commented 5 months ago

Thank you for information. Perhaps this is an ARM-specific issue, I will try an AWS instance and see if I can reproduce next.

kriszyp commented 5 months ago

So far, no luck with a graviton, after a few hours of testing. Will see if I can get a mac instance started.

falcon027 commented 5 months ago

OK, this is strange, when I run the runMultible.js file the error appears within a minute. I also tried it with version 2.9.4 and got the following error:

Node.js v20.10.0
[error] file:///Users/falcon/Projects/express-function/node_modules/msgpackr/unpack.js:497
                                        let error = new Error('Unexpected end of MessagePack data')

Error: Unexpected end of MessagePack data
    at read (file:///Users/falcon/Projects/express-function/node_modules/msgpackr/unpack.js:497:18)
    at Array.readObject [as read] (file:///Users/falcon/Projects/express-function/node_modules/msgpackr/unpack.js:522:18)
    at read (file:///Users/falcon/Projects/express-function/node_modules/msgpackr/unpack.js:260:23)
    at checkedRead (file:///Users/falcon/Projects/express-function/node_modules/msgpackr/unpack.js:197:13)
    at Packr.unpack (file:///Users/falcon/Projects/express-function/node_modules/msgpackr/unpack.js:104:12)
    at Packr.decode (file:///Users/falcon/Projects/express-function/node_modules/msgpackr/unpack.js:176:15)
    at Object.next (file:///Users/falcon/Projects/express-function/node_modules/lmdb/read.js:552:31)
    at Object.next (file:///Users/falcon/Projects/express-function/node_modules/lmdb/util/RangeIterable.js:32:35)
    at next (file:///Users/falcon/Projects/express-function/node_modules/lmdb/util/RangeIterable.js:261:24) {
  incomplete: true
}
kriszyp commented 5 months ago

Sorry this took a while, I realized I was being stupid and running the wrong script. Should be fixed in v3.0.6.

falcon027 commented 5 months ago

I tested it on my machine and it works perfectly. Thank you very much. ( I sent a small token of appreciation to your email.)