falcon027 opened 2 weeks ago
Do you know whether this error also occurs in 3.0.11? (It sounds like it may be difficult to reproduce.)
I have checked the logs, and 3.0.11 has it too. Could this be related to low free space or memory?
I wouldn't expect this to be caused by low free space or memory, but I can't be certain. If you have any suggestions for how to reproduce it, I will gladly give them a try. I can also try to add some more debugging information for this case.
This may be unrelated, but I got another error message tonight.
```
SyntaxError: Unexpected token '', "�"... is not valid JSON
File "
```
I have identified the source of the problem as a corrupt database entry. The Bug.js file contains code to reproduce both errors. A range read over the database returns some valid objects until it reaches the invalid one, and writing no longer works. Unfortunately, I don't know how the invalid record got into the database. However, there should be a way to recover from such a corrupted state. The link below contains the corrupted database file, "relationGraph", along with the JavaScript code to reproduce the errors.
https://drive.google.com/drive/folders/1hYe_GtScpX5SFI95qNbP8NRlfpMAXJ3u?usp=sharing
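If the damage is confined to the stored value bytes rather than LMDB's page structure, one possible way to salvage the readable records is to read the values as raw bytes (for example by opening the file with `encoding: "binary"`) and parse each one with `JSON.parse` inside a `try`/`catch`, skipping the entries that fail. The helper `salvageJsonEntries` below is hypothetical and accepts any iterable of `{key, value}` pairs, the shape `getRange()` yields, so it can be exercised here with in-memory sample data; it has not been tried against "relationGraph". If the B-tree itself is corrupted, the range read can still fail before reaching the bad entry, and this approach will not help.

```javascript
// Hypothetical helper: split entries into decodable JSON records and corrupt ones.
// Assumes each value is raw bytes (Uint8Array/Buffer), e.g. from a database
// opened with encoding: "binary" so lmdb does not try to decode the values itself.
function salvageJsonEntries(entries) {
  const decoder = new TextDecoder("utf-8");
  const good = [];
  const bad = [];
  for (const { key, value } of entries) {
    try {
      good.push({ key, value: JSON.parse(decoder.decode(value)) });
    } catch (error) {
      // Remember which key failed and why, then continue with the next entry.
      bad.push({ key, error: error.message });
    }
  }
  return { good, bad };
}

// In-memory entries standing in for the output of getRange().
const encoder = new TextEncoder();
const sample = [
  { key: "S:a", value: encoder.encode('["bb5ed3951fc9"]') },
  { key: "S:b", value: new Uint8Array([0x00, 0xff, 0xfe]) }, // not valid JSON
  { key: "S:c", value: encoder.encode('{"ok":true}') },
];
const { good, bad } = salvageJsonEntries(sample);
console.log(good.length, bad.length); // → 2 1
```

Against the real file, the entries would presumably come from something like `open({ path: "relationGraph", encoding: "binary" }).getRange()`, with the good records rewritten into a fresh database.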
I am still looking for the cause of the broken record, and along the way I found a segfault that seems to occur under heavy load. Here is a script that reproduces it:
import {open} from "lmdb";
import SegfaultHandler from "segfault-handler";
import fs from "fs";
// Write native crash stack traces to crash.log and also print them.
SegfaultHandler.registerHandler("crash.log", function (signal, address, stack) {
  console.log(signal, address, stack);
});
// Delete any database left over from a previous run (same path as passed to open() below).
fs.rmSync("./corruptionTest2", {recursive: true, force: true});
async function main() {
  let newObject = open({
    path: "./corruptionTest2",
    compression: false,
    dupSort: false,
    cache: false,
    maxDbs: 1,
    commitDelay: 50,
    noMemInit: true,
    keyEncoding: "ordered-binary",
    remapChunks: false,
    pageSize: 16384,
    useWritemap: false,
    safeRestore: true,
    encoding: "json",
  });
  // Tight synchronous loop: it never awaits, so it never yields to the event turn.
  while (true) {
    newObject.put("S:c1edfab0-ca28-4124-822b-bdc5a78cf527", ["bb5ed3951fc9"]);
  }
  // output =>
  // PID 63939 received SIGSEGV for address: 0x0
  // 0 segfault-handler.node 0x0000000107ea2538 _ZL16segfault_handleriP9__siginfoPv + 252
  // 1 libsystem_platform.dylib 0x000000019c317584 _sigtramp + 56
  // 2 node.abi115.node 0x000000010865858c mdb_cursor_put + 4608
  // 3 node.abi115.node 0x00000001086647e4 mdb_put + 364
  // 4 node.abi115.node 0x0000000108612318 _ZN11WriteWorker8DoWritesEP7MDB_txnP7EnvWrapPjPS_ + 2352
  // 5 node.abi115.node 0x0000000108612968 _ZN11WriteWorker5WriteEv + 344
  // 6 node.abi115.node 0x00000001086127f4 _Z8do_writeP10napi_env__Pv + 36
  // 7 node 0x00000001051958d8 _ZZN4node14ThreadPoolWork12ScheduleWorkEvENKUlP9uv_work_sE_clES2_ + 236
  // 8 node 0x0000000105195638 _ZZN4node14ThreadPoolWork12ScheduleWorkEvENUlP9uv_work_sE_8__invokeES2_ + 24
  // 9 libuv.1.dylib 0x0000000108277178 worker + 224
  // 10 libsystem_pthread.dylib 0x000000019c2e6f94 _pthread_start + 136
  // 11 libsystem_pthread.dylib 0x000000019c2e1d34 thread_start + 8
}
main();
When I run this test, it just runs out of memory: the loop never yields to the event turn, so the promises for all the puts it has queued can never resolve. You need to yield to the event turn so that commit events can complete.
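A minimal sketch of a write loop that does yield: it issues puts in batches and awaits each batch with `Promise.all` before queuing the next, so control returns to the event loop and commits can complete. The function name `writeInBatches` and the batch size are hypothetical; `store` only needs a `put(key, value)` method that returns a promise, as `put` does in lmdb-js. The example uses an in-memory stand-in rather than a real database.

```javascript
// Hypothetical helper: write `total` values in batches of `batchSize`,
// awaiting each batch so control returns to the event loop in between.
async function writeInBatches(store, key, value, total, batchSize) {
  let written = 0;
  while (written < total) {
    const count = Math.min(batchSize, total - written);
    const batch = [];
    for (let i = 0; i < count; i++) {
      batch.push(store.put(key, value));
    }
    // Wait for this batch to resolve before queuing more writes.
    await Promise.all(batch);
    written += count;
  }
  return written;
}

// In-memory stand-in for a database: each put resolves on a later event turn.
const fakeStore = {
  puts: 0,
  put(key, value) {
    return new Promise((resolve) => setImmediate(() => {
      this.puts++;
      resolve(true);
    }));
  },
};

writeInBatches(fakeStore, "S:c1edfab0-ca28-4124-822b-bdc5a78cf527", ["bb5ed3951fc9"], 14000, 1000)
  .then((n) => console.log(n, fakeStore.puts)); // → 14000 14000
```

With lmdb-js, the same call would take the database returned by `open()` in place of `fakeStore`, inside `main()`. Capping the number of outstanding puts per batch keeps memory bounded, at the cost of some throughput.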
Hopefully in the next week or so I can take a look at possible mechanisms to tolerate a corrupted entry, but generally those are too deep in the structure to recover from.
Thanks for your input. I did some more debugging in my test environment and found that 14,000 operations were performed in the second before the crashes occurred. That is what I was trying to simulate with the script in my earlier comment, but you are right that a while loop without an await is not the right way to simulate high load.
So I tried another approach to simulating the load, shown in the code below. I am not sure it is correct this time either, but I found that the program does not crash if I turn off eventTurnBatching.
I am not sure what to make of this; if you have any thoughts, let me know.
import {open} from "lmdb";
import SegfaultHandler from "segfault-handler";
import fs from "fs";
// Write native crash stack traces to crash.log and also print them.
SegfaultHandler.registerHandler("crash.log", function (signal, address, stack) {
  console.log(signal, address, stack);
});
// Delete any database left over from a previous run.
fs.rmSync("./corruptionTest", {recursive: true, force: true});
async function main() {
  let newObject = open({
    path: "./corruptionTest",
    compression: false,
    dupSort: false,
    cache: false,
    maxDbs: 1,
    commitDelay: 0,
    eventTurnBatching: true, // crashes when true; does not crash when false
    noMemInit: true,
    keyEncoding: "ordered-binary",
    remapChunks: false,
    pageSize: 16384,
    useWritemap: false,
    safeRestore: true,
    encoding: "json",
  });
  // Start 15,000 timers, each of which issues a put every 10 ms.
  for (let i = 0; i < 15000; i++) {
    setInterval(async () => {
      await newObject.put("S:c1edfab0-ca28-4124-822b-bdc5a78cf527", ["bb5ed3951fc9"]);
    }, 10);
  }
  // output =>
  // PID 63939 received SIGSEGV for address: 0x0
  // 0 segfault-handler.node 0x0000000107ea2538 _ZL16segfault_handleriP9__siginfoPv + 252
  // 1 libsystem_platform.dylib 0x000000019c317584 _sigtramp + 56
  // 2 node.abi115.node 0x000000010865858c mdb_cursor_put + 4608
  // 3 node.abi115.node 0x00000001086647e4 mdb_put + 364
  // 4 node.abi115.node 0x0000000108612318 _ZN11WriteWorker8DoWritesEP7MDB_txnP7EnvWrapPjPS_ + 2352
  // 5 node.abi115.node 0x0000000108612968 _ZN11WriteWorker5WriteEv + 344
  // 6 node.abi115.node 0x00000001086127f4 _Z8do_writeP10napi_env__Pv + 36
  // 7 node 0x00000001051958d8 _ZZN4node14ThreadPoolWork12ScheduleWorkEvENKUlP9uv_work_sE_clES2_ + 236
  // 8 node 0x0000000105195638 _ZZN4node14ThreadPoolWork12ScheduleWorkEvENUlP9uv_work_sE_8__invokeES2_ + 24
  // 9 libuv.1.dylib 0x0000000108277178 worker + 224
  // 10 libsystem_pthread.dylib 0x000000019c2e6f94 _pthread_start + 136
  // 11 libsystem_pthread.dylib 0x000000019c2e1d34 thread_start + 8
}
main();
How long does this script take to trigger a segfault? I have tried running it for about 24 without any crash.
I just ran it again and on my system it only takes a few seconds before it crashes.
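For comparison, the `setInterval` script starts 15,000 timers that each fire every 10 ms, so it asks for up to 15,000 × 100 = 1,500,000 puts per second, roughly a hundred times the 14,000 operations per second observed in the test environment. A closer simulation might issue a fixed number of puts per tick: at 10 ms per tick there are 100 ticks per second, so 14,000 / 100 = 140 puts per tick. The sketch below uses the hypothetical names `putsPerTick` and `runPacedLoad` and works with any store whose `put` returns a promise; it is an assumption about how the load could be modeled, not a known reproduction of the crash.

```javascript
// Number of puts per tick needed to reach a target rate (operations per second).
function putsPerTick(targetOpsPerSecond, tickMs) {
  const ticksPerSecond = 1000 / tickMs;
  return Math.ceil(targetOpsPerSecond / ticksPerSecond);
}

// Hypothetical load generator: on each tick, issue `perTick` puts and await
// them, then sleep for the rest of the tick; repeat for `ticks` ticks.
async function runPacedLoad(store, key, value, { targetOpsPerSecond, tickMs, ticks }) {
  const perTick = putsPerTick(targetOpsPerSecond, tickMs);
  let written = 0;
  for (let t = 0; t < ticks; t++) {
    const started = Date.now();
    const batch = [];
    for (let i = 0; i < perTick; i++) batch.push(store.put(key, value));
    await Promise.all(batch);
    written += perTick;
    // Sleep for whatever remains of this tick (zero if the batch ran long).
    const remaining = tickMs - (Date.now() - started);
    await new Promise((resolve) => setTimeout(resolve, Math.max(0, remaining)));
  }
  return written;
}

console.log(putsPerTick(14000, 10)); // → 140

// Usage with an in-memory counter standing in for the database.
const counter = { puts: 0, put() { this.puts++; return Promise.resolve(true); } };
runPacedLoad(counter, "S:c1edfab0-ca28-4124-822b-bdc5a78cf527", ["bb5ed3951fc9"],
  { targetOpsPerSecond: 14000, tickMs: 10, ticks: 5 })
  .then((n) => console.log(n)); // → 700
```

Awaiting each tick's batch keeps the number of outstanding puts bounded; if commits cannot keep up, the achieved rate simply drops below the target instead of piling up unresolved promises.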
During a lengthy test run, my application hit a segmentation fault. I suspect it is related to the errors that preceded it, specifically "Commit failed" and "Operation not permitted". The attached logs have further details.
If anyone has any insights into the cause of this issue, I would appreciate hearing from you. Thank you in advance for your assistance.
lmdb-js v3.0.12