beaugunderson opened 3 years ago
One clue--this was 1.4gb of a single object with ~66k keys. I just switched to serializing it as an array instead (and re-constructing the object for fast lookups after reading) and the issue went away. It will probably not be very common for someone to use msgpackr this way, but my dataset grew from something much more modest until it exceeded whatever limit I hit here. :)
Try using the variableMapSize: true option when creating the Packr, otherwise msgpackr uses a 16-bit map length (for performance). I'll add some notes to the docs to make this clearer.
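For illustration, a minimal sketch of using that option (the object here is invented):

import { Packr } from "msgpackr";

// Without variableMapSize, msgpackr writes a fixed 16-bit map length header,
// which cannot represent an object with more than 65,535 keys.
const packr = new Packr({ variableMapSize: true });

const big = {};
for (let i = 0; i < 70_000; i++) big["key" + i] = i;

const buffer = packr.pack(big); // map header sized to the actual key count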
That has to be it, just crossed key 65,536 :)
Hmm, there seems to be a gap in documentation. Specifying variableMapSize: true on the decoder results in getting JavaScript Maps instead of JavaScript objects. The docs there say it only applies when useRecords is disabled, and mapsAsObjects says it will default to enabled if useRecords is disabled, which leads me to believe I should be getting objects.
This does work though:
const unpackr = new Unpackr({ mapsAsObjects: true, variableMapSize: true });
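For completeness, a rough round-trip sketch of the combination that works (the key count is only meant to cross the 16-bit boundary):

import { Packr, Unpackr } from "msgpackr";

const packr = new Packr({ variableMapSize: true });
const unpackr = new Unpackr({ mapsAsObjects: true, variableMapSize: true });

const data = {};
for (let i = 0; i < 66_000; i++) data["k" + i] = i;

// With mapsAsObjects the decoder returns a plain object rather than a Map.
const roundTripped = unpackr.unpack(packr.pack(data));
console.log(Object.keys(roundTripped).length); // 66000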
Hmm, variableMapSize: true also seems to cause decoding to take much longer than an additional 10%; it's gone from ~1 second to minutes for a 102mb msgpack file.
Do you have a test case for how to reproduce this? I am not seeing much performance difference using variableMapSize: true with my own data. I also tried testing with an object with 65000 keys.
Tried to come up with a test case just now that roughly mimics my object but it doesn't exhibit the same extreme unpacking time (I did measure how long it takes to unpack my 65k key object though, 64 seconds).
Will clean up my current file and share it tomorrow, as well as compare the smaller-than-65k non-variableMapSize version vs. the larger-than-65k variableMapSize version directly.
OK, so nowhere near as wide a gulf as I thought! I think I probably compared my 1.4gb file, which has the same number of top-level keys but a much higher proportion of arrays vs. nested keys, to this file, which is 102mb. The 1.4gb file takes 9 seconds, the 102mb file takes 64 seconds.
For the 102mb file:
variableMapSize version:
written in 20472ms
read in 63735ms
default options version:
written in 13918ms
read in 51620ms
I don't know if I crossed another performance threshold or what, but the 102mb file was much faster until recently; we can safely say the majority of the perf change wasn't from variableMapSize though.
Let me know if you do have any performance anomalies that I could reproduce.
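For anyone who wants to time this themselves, a rough self-contained sketch (the object shape is invented and far smaller than the files above):

import { Packr } from "msgpackr";

// Pack and unpack the same object with and without variableMapSize,
// mirroring the "written in" / "read in" numbers above.
// 65,000 keys stays under the 16-bit map limit so both configurations work.
const data = {};
for (let i = 0; i < 65_000; i++) data["key" + i] = [i, String(i)];

for (const variableMapSize of [false, true]) {
  const packr = new Packr({ variableMapSize, mapsAsObjects: true });

  let start = process.hrtime.bigint();
  const buffer = packr.pack(data);
  console.log(`variableMapSize=${variableMapSize} written in ${Number(process.hrtime.bigint() - start) / 1e6}ms`);

  start = process.hrtime.bigint();
  packr.unpack(buffer);
  console.log(`variableMapSize=${variableMapSize} read in ${Number(process.hrtime.bigint() - start) / 1e6}ms`);
}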
Hi @kriszyp 👋
We're running into this same error with Gatsby now. It seems that the recent 1.5.3 release caused this.
The output one gets is:
"internal-data-bridge" threw an error while running the sourceNodes lifecycle:
Data read, but end of buffer not reached
File: node_modules/gatsby/src/internal-plugins/internal-data-bridge/gatsby-node.js:55:5
Error: Data read, but end of buffer not reached
- node.cjs:178 checkedRead
[browser-ssr-tsx]/[msgpackr]/dist/node.cjs:178:10
- node.cjs:91 Packr.unpack
[browser-ssr-tsx]/[msgpackr]/dist/node.cjs:91:10
- node.cjs:150 Packr.decode
[browser-ssr-tsx]/[msgpackr]/dist/node.cjs:150:15
- index.js:440 Object.get
[browser-ssr-tsx]/[lmdb-store]/index.js:440:63
- caching.js:37 Object.get
[browser-ssr-tsx]/[lmdb-store]/caching.js:37:17
- lmdb-datastore.ts:148 Object.getNode
[browser-ssr-tsx]/[gatsby]/src/datastore/lmdb/lmdb-datastore.ts:148:16
- index.ts:18 getNode
[browser-ssr-tsx]/[gatsby]/src/datastore/index.ts:18:18
- public.js:767 createNode
[browser-ssr-tsx]/[gatsby]/src/redux/actions/public.js:767:19
- public.js:863
[browser-ssr-tsx]/[gatsby]/src/redux/actions/public.js:863:21
- index.js:23
[browser-ssr-tsx]/[redux-thunk]/lib/index.js:23:18
- redux.js:554 boundActionCreator
[browser-ssr-tsx]/[redux]/lib/redux.js:554:12
- api-runner-node.js:79 createNode
[browser-ssr-tsx]/[gatsby]/src/utils/api-runner-node.js:79:20
- gatsby-node.js:55 forEach
[browser-ssr-tsx]/[gatsby]/src/internal-plugins/internal-data-bridge/gatsby-node.js:55:5
- Array.forEach
- gatsby-node.js:53 Object.exports.sourceNodes
[browser-ssr-tsx]/[gatsby]/src/internal-plugins/internal-data-bridge/gatsby-node.js:53:20
- api-runner-node.js:462 runAPI
[browser-ssr-tsx]/[gatsby]/src/utils/api-runner-node.js:462:22
internal-data-bridge is something from us, so that's why it is printing that.
You can reproduce this by running npm init gatsby -y issue-34, letting the installation run, and then doing two npm run build runs in a row. On the second run you'll see this error. If you do an npm run clean (which removes the LMDB store data) in between, it works.
Let me know if you need any further information. Thanks a lot!
Thank you for the great detailed report, should be fixed in v1.5.4.
Awesome, thanks for the quick reply. Let me try it and I'll report back 👍
Works with 1.5.4 again. Thanks again!
I'm getting a similar error. Node server code:
const stream = new PackrStream({
  useRecords: true,
  bundleStrings: true,
})
stream.pipe(response)
stream.write(...)
stream.end()
Client code:
const packr = new Packr({
  useRecords: true,
  bundleStrings: true, // client-side performance
})
const response = await fetch(...)
const rows = packr.unpack(new Uint8Array(await response.arrayBuffer()))
Error:
Error: Data read, but end of buffer not reached {"id":0,"portfolio":"Z2 Force","campaign":"11_Profitable_Z2_Force","adGroup":"11_7_Profitable_Z2_For
at checkedRead (st-table.iife.js?v=bbc4c8fdf7d4fa35e6fc73429d21fd76710cf374:195:18548)
at Packr.unpack (st-table.iife.js?v=bbc4c8fdf7d4fa35e6fc73429d21fd76710cf374:195:16178)
at Object.fetchRows [as queryFn] (st-table.iife.js?v=bbc4c8fdf7d4fa35e6fc73429d21fd76710cf374:289:50568)
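One possibility worth checking (a guess on my part, not a confirmed diagnosis): PackrStream writes a sequence of MessagePack values, so if the response body ends up containing more than one value, a single unpack call stops after the first one and reports the leftover bytes. msgpackr's unpackMultiple reads every value from a buffer; a rough sketch (the URL is a placeholder):

import { unpackMultiple } from "msgpackr";

// Decode every MessagePack value in the response body instead of only the first.
const response = await fetch("/rows");
const values = unpackMultiple(new Uint8Array(await response.arrayBuffer()));
console.log(values.length, "values in response");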
@alexbezhan do you know if this can be reproduced without streams (just pack and unpack)? Is any particular data necessary to reproduce?
Today I faced the same issue. Now I can tell it can be reproduced without streams. My case today was that I encoded an object with structures, and then tried to decode it back with empty structures (the issue was my db not saving the structures properly).
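A minimal sketch of that kind of structures mismatch (my reading of the scenario above, not an exact reproduction):

import { Packr } from "msgpackr";

// The writer records its record structures into this array as it packs.
const savedStructures = [];
const writer = new Packr({ structures: savedStructures });
const buffer = writer.pack({ a: 1, b: 2, c: 3 });

// A reader whose structures were never saved (an empty array here) has no
// definition for the record IDs referenced in the buffer, so decoding is
// expected to fail or return wrong data in this scenario.
const reader = new Packr({ structures: [] });
reader.unpack(buffer);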
@kriszyp 👋 I ran into the same issue in my implementation. Here is some minimal code:
import { EventEmitter } from "node:events";
import { Packr } from "msgpackr";

const packr = new Packr({
  maxSharedStructures: 8160,
  structures: []
});

type CustomPackFn<T extends StringOrObject = Record<string, any>> =
  (value: Partial<T>) => Buffer;

type CustomUnpackFn<T extends StringOrObject = Record<string, any>> =
  (messagePack: Buffer | Uint8Array) => T;

export class KVPeer<T extends StringOrObject = StringOrObject, K extends Record<string, any> | null = null> extends EventEmitter {
  protected customPack = packr.pack.bind(packr) as CustomPackFn<T>;
  protected customUnpack = packr.unpack.bind(packr) as CustomUnpackFn<T>;

  async setValue(options: SetValueOptions<T>): Promise<KeyType> {
    const { key, value, expiresIn } = options;
    const finalKey = typeof key === "object" ? Buffer.from(this.prefix + key) : this.prefix + key;
    const multiRedis = this.redis.multi();

    if (this.type === "raw") {
      const payload = typeof value === "string" ? value : JSON.stringify(value);
      multiRedis.set(finalKey, payload);
    }
    else {
      const buffer = this.customPack(value);
      multiRedis.set(finalKey, buffer);
    }

    if (expiresIn) {
      multiRedis.pexpire(finalKey, expiresIn);
    }

    await multiRedis.exec();

    return finalKey;
  }

  async getValue(key: KeyType): Promise<MappedValue<T, K> | null> {
    const finalKey = typeof key === "object" ? Buffer.from(this.prefix + key) : this.prefix + key;
    const result = (this.type === "raw" ?
      await this.redis.get(finalKey) :
      await this.handlePackedOrMappedObject(finalKey)) as T;

    if (this.type === "object" && result && Object.keys(result).length === 0) {
      return null;
    }

    return result === null ? null : this.mapValue(result);
  }

  private async handlePackedOrMappedObject(key: KeyType): Promise<T | null> {
    let result: T | null = null;

    try {
      const packedValue = await this.redis.getBuffer(key);

      if (packedValue !== null) {
        result = this.customUnpack(packedValue);
      }
    }
    catch (error) {
      if (error.message !== kWrongRedisCommandError) {
        throw error;
      }

      result = this.deepParse(await this.redis.hgetall(key));
    }

    return result;
  }
}
I didn't manage to reproduce it locally at this point, but the error occurred when deployed in a scaling environment.
I still have to investigate it, as it isn't a huge priority on my side.
❤️
My packed file has reached 1.4gb and I now get this error immediately upon attempting to read it:
My readPacked function is simply: