kriszyp / lmdb-js

Simple, efficient, ultra-fast, scalable data store wrapper for LMDB

Value of size 333MB fails to be fetched from the DB #205

Open janekolszak opened 1 year ago

janekolszak commented 1 year ago

Hi! Is it expected that it's not possible to get() values bigger than 333MB? It is possible to fetch with getBinary().

Thank you!

kriszyp commented 1 year ago

In my tests, I was able to get values up to 2gb. What error are you seeing?

janekolszak commented 1 year ago

I'm seeing:

<--- Last few GCs --->                                                                                                                                                                                                                        

[1104752:0x5672330]     6147 ms: Scavenge 299.4 (333.2) -> 299.4 (333.2) MB, 34.5 / 0.0 ms  (average mu = 1.000, current mu = 1.000) allocation failure                                                                                       
[1104752:0x5672330]     7458 ms: Scavenge 491.4 (525.2) -> 491.4 (525.2) MB, 65.3 / 0.0 ms  (average mu = 1.000, current mu = 1.000) allocation failure                                                                                       
[1104752:0x5672330]    10035 ms: Scavenge 875.4 (909.2) -> 875.4 (909.2) MB, 128.5 / 0.0 ms  (average mu = 1.000, current mu = 1.000) allocation failure                                                                                      

<--- JS stacktrace --->                                                                                                                                                                                                                       

FATAL ERROR: invalid array length Allocation failed - JavaScript heap out of memory                                                                                                                                                           
 1: 0xa04200 node::Abort() [node]                                                                                                                                                                                                             
 2: 0x94e4e9 node::FatalError(char const*, char const*) [node]                                                                                                                                                                                
 3: 0xb797be v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [node]                                                                                                                                                    
 4: 0xb79b37 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [node]                                                                                                                                      
 5: 0xd343c5  [node]                                                                                                                                                                                                                          
 6: 0xd0cf05  [node]                                                                                                                                                                                                                          
 7: 0xe962ae  [node]
 8: 0xe9b9f4  [node]
 9: 0xe9bcb8  [node]
10: 0xeef18b v8::internal::JSObject::AddDataElement(v8::internal::Handle<v8::internal::JSObject>, unsigned int, v8::internal::Handle<v8::internal::Object>, v8::internal::PropertyAttributes) [node]
11: 0xf43c92 v8::internal::Object::AddDataProperty(v8::internal::LookupIterator*, v8::internal::Handle<v8::internal::Object>, v8::internal::PropertyAttributes, v8::Maybe<v8::internal::ShouldThrow>, v8::internal::StoreOrigin) [node]
12: 0xf46f8f v8::internal::Object::SetProperty(v8::internal::LookupIterator*, v8::internal::Handle<v8::internal::Object>, v8::internal::StoreOrigin, v8::Maybe<v8::internal::ShouldThrow>) [node]
13: 0x10709c5 v8::internal::Runtime::SetObjectProperty(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, v8::internal::StoreOrigin, v8::Maybe<v8::internal::ShouldThrow>) [node]
14: 0xdcfb6a v8::internal::Runtime_KeyedStoreIC_Slow(int, unsigned long*, v8::internal::Isolate*) [node]
15: 0x14011f9  [node]
janekolszak commented 1 year ago

This is a small demo. It fails with a segfault on Ubuntu 20 with lmdb-js v2.7.3.

import * as fs from "fs"
import { open } from "lmdb"

async function main() {
  const db = open<any, string>({
    path: "./test-cache",
  })

  const big = fs.readFileSync("./bigfile.bin")

  // The file contents are stored as a string, not as a Buffer
  await db.put("id", big.toString())
  console.log("put() success")

  let loaded = await db.getBinary("id")
  console.log("getBinary() success")

  loaded = await db.getBinaryFast("id")
  console.log("getBinaryFast() success")

  // This is the call that fails for the large value
  loaded = await db.get("id")
  console.log("get() success")
  await db.close()

  console.log("OK")
}

main().catch((e) => console.error(e))



If you create a smaller file and run it, it prints OK:
`head -c 1MB /dev/urandom > bigfile.bin`
kriszyp commented 1 year ago

I am kind of wondering if this is a V8 bug. I can actually trigger an error without lmdb at all by doing this with your code:

const big = fs.readFileSync("./bigfile.bin")
let str = big.toString();
let d = new TextEncoder().encode(str);
let s = (new TextDecoder()).decode(d);
console.log(s.length)

I think this error might be occurring in msgpackr's native decoder and might not be handled properly there, but even if it were, V8 doesn't seem to be capable of decoding this string.

janekolszak commented 1 year ago

Maybe there should be an option for getRange() to use something like getBinary()?

kriszyp commented 1 year ago

Yes, @janekolszak, that seems like a reasonable option to add. I will try to get that in the next release.
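Until such an option exists, a rough sketch of a user-side workaround could look like the following. It assumes the store exposes a getKeys() method that iterates keys in a range, alongside the getBinary() used earlier in this thread. The BinaryReadableStore interface and the getRangeBinary name are hypothetical stand-ins for illustration, not lmdb-js's own type definitions.

```typescript
// Hypothetical stand-in for the two store methods this sketch relies on.
// It is an assumption for illustration, not lmdb-js's type definitions.
interface BinaryReadableStore<K> {
  getKeys(options?: { start?: K; end?: K }): Iterable<K>;
  getBinary(key: K): Uint8Array | undefined;
}

// Lazily yields { key, value } pairs with the raw bytes of each value,
// so no value is ever decoded into a JavaScript string.
function* getRangeBinary<K>(
  db: BinaryReadableStore<K>,
  options?: { start?: K; end?: K }
): Generator<{ key: K; value: Uint8Array }> {
  for (const key of db.getKeys(options)) {
    const value = db.getBinary(key);
    // A key may have been removed between the key scan and the read.
    if (value !== undefined) {
      yield { key, value };
    }
  }
}
```

Because this is a generator, only one value is held at a time, which matters when individual values are hundreds of megabytes.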

janekolszak commented 1 year ago

Maybe lmdb-js decodes values into strings?

There seems to be a limit of 512MB for string size on 32b systems, but I see it on my 64b system with node 19.3.0 (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/length)

Your demo on node 18.12.1 fails with:

TypeError [ERR_ENCODING_INVALID_ENCODED_DATA]: The encoded data was not valid for encoding utf-8
    at new NodeError (node:internal/errors:393:5)
    at TextDecoder.decode (node:internal/encoding:433:15)
    at Object.<anonymous> (/home/jan/work/lmdb/tools/big-file.ts:7:29)
    at Module._compile (node:internal/modules/cjs/loader:1159:14)
    at Module.m._compile (/home/jan/.nvm/versions/node/v18.12.1/lib/node_modules/ts-node/src/index.ts:1618:23)
    at Module._extensions..js (node:internal/modules/cjs/loader:1213:10)
    at Object.require.extensions.<computed> [as .ts] (/home/jan/.nvm/versions/node/v18.12.1/lib/node_modules/ts-node/src/index.ts:1621:12)
    at Module.load (node:internal/modules/cjs/loader:1037:32)
    at Function.Module._load (node:internal/modules/cjs/loader:878:12)
    at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:81:12) {
  errno: 1,
  code: 'ERR_ENCODING_INVALID_ENCODED_DATA'
}

Your demo on node 19.3.0 fails with:

Error: Cannot create a string longer than 0x1fffffe8 characters
    at TextDecoder.decode (node:internal/encoding:428:16)
    at Object.<anonymous> (/home/jan/work/lmdb/tools/big-file.ts:7:29)
    at Module._compile (node:internal/modules/cjs/loader:1218:14)
    at Module.m._compile (/home/jan/.nvm/versions/node/v19.3.0/lib/node_modules/ts-node/src/index.ts:1618:23)
    at Module._extensions..js (node:internal/modules/cjs/loader:1272:10)
    at Object.require.extensions.<computed> [as .ts] (/home/jan/.nvm/versions/node/v19.3.0/lib/node_modules/ts-node/src/index.ts:1621:12)
    at Module.load (node:internal/modules/cjs/loader:1081:32)
    at Function.Module._load (node:internal/modules/cjs/loader:922:12)
    at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:82:12)
    at phase4 (/home/jan/.nvm/versions/node/v19.3.0/lib/node_modules/ts-node/src/bin.ts:649:14) {
  code: 'ERR_STRING_TOO_LONG'
}
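For reference, the limit in that error message can be worked out and compared against what the running Node.js build reports. This is only a sketch: buffer.constants.MAX_STRING_LENGTH is a real Node.js constant, but its value depends on the Node/V8 version, and the mayExceedStringLimit helper is a hypothetical name introduced here for illustration.

```typescript
import { constants } from "buffer";

// The limit quoted in the Node 19.3.0 error message above.
const LIMIT_FROM_ERROR = 0x1fffffe8;

// 0x1fffffe8 = 2^29 - 24 = 536,870,888 UTF-16 code units.
console.log(LIMIT_FROM_ERROR === 2 ** 29 - 24);

// What the current runtime reports; this varies between Node/V8 versions.
console.log(constants.MAX_STRING_LENGTH);

// Hypothetical helper: a conservative pre-check before decoding bytes to a string.
// UTF-8 decoding produces at most one UTF-16 code unit per input byte, so a
// buffer no larger than the limit cannot exceed it by length alone.
function mayExceedStringLimit(
  bytes: Uint8Array,
  limit: number = constants.MAX_STRING_LENGTH
): boolean {
  return bytes.byteLength > limit;
}
```

A check like this could be run before calling toString() or TextDecoder.decode() on a large buffer, falling back to binary handling when it returns true.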

My demo on node 18.12.1 fails with:

put() success
getBinary() success
getBinaryFast() success
TypeError: Cannot read properties of undefined (reading '0')
    at readString (/home/jan/work/lmdb/node_modules/msgpackr/unpack.js:568:22)
    at read (/home/jan/work/lmdb/node_modules/msgpackr/unpack.js:454:12)
    at checkedRead (/home/jan/work/lmdb/node_modules/msgpackr/unpack.js:195:13)
    at Packr.unpack (/home/jan/work/lmdb/node_modules/msgpackr/unpack.js:102:12)
    at Packr.decode (/home/jan/work/lmdb/node_modules/msgpackr/unpack.js:174:15)
    at LMDBStore.get (/home/jan/work/lmdb/node_modules/lmdb/read.js:230:70)
    at main (/home/jan/work/lmdb/tools/big-file.ts:22:23)

My demo on node 19.3.0 fails with:

put() success
getBinary() success
getBinaryFast() success
[1]    51080 segmentation fault (core dumped)  ts-node ./tools/big-file.ts

Your demo modified for Deno:

error: Uncaught (in promise) TypeError: Cannot allocate String: buffer exceeds maximum length.
    at async Object.readTextFile (deno:runtime/js/40_read_file.js:56:20)
kriszyp commented 1 year ago

> Maybe lmdb-js decodes values into strings?

lmdb-js (msgpackr) preserves the types of values, so if you encode a string, it will be decoded as a string. And you are explicitly converting your data to a string when it is stored/encoded (so lmdb-js decodes to a string to match what you requested/stored): await db.put("id", big.toString())

(so if you don't want it decoded as a string, don't store it as a string, store it as a buffer/binary data)
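A minimal sketch of that suggestion, assuming a store whose put() and get() behave like the ones in the demo above. The KeyValueStore interface and the putBinary and getAsBinary helpers are hypothetical names used only for illustration.

```typescript
// Hypothetical stand-in for the store methods used in the demo above.
interface KeyValueStore {
  put(key: string, value: unknown): Promise<unknown>;
  get(key: string): unknown;
}

// Store the raw bytes directly. There is no .toString() call, so the value
// is encoded as binary data and never has to become a JavaScript string.
async function putBinary(
  db: KeyValueStore,
  key: string,
  data: Uint8Array
): Promise<void> {
  await db.put(key, data);
}

// Read the value back and verify that it really is binary data.
// Node's Buffer is a subclass of Uint8Array, so it passes this check.
function getAsBinary(db: KeyValueStore, key: string): Uint8Array {
  const value = db.get(key);
  if (!(value instanceof Uint8Array)) {
    throw new TypeError(`Value for "${key}" is not binary data`);
  }
  return value;
}
```

With the demo above, this would amount to passing the result of fs.readFileSync("./bigfile.bin") to put() as-is instead of converting it first.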

> There seems to be a limit of 512MB for string size on 32b systems, but I see it on my 64b system with node 19.3.0

It doesn't seem that surprising that V8 would change this without MDN being updated yet (maybe they felt it was better to be consistent, so that no behavioral differences can be observed or detected between architectures).