kriszyp / msgpackr

Ultra-fast MessagePack implementation with extension for record and structural cloning / msgpack.org[JavaScript/NodeJS]

"Data read, but end of buffer not reached" #34

Open beaugunderson opened 3 years ago

beaugunderson commented 3 years ago

My packed file has reached 1.4 GB, and I now get this error immediately upon attempting to read it:

Uncaught Error: Data read, but end of buffer not reached
    at checkedRead (/Users/beau/p/zed-run/node_modules/msgpackr/dist/node.cjs:130:10)
    at unpack (/Users/beau/p/zed-run/node_modules/msgpackr/dist/node.cjs:77:10)
    at exports.readPacked (/Users/beau/p/zed-run/utilities.js:223:32)

My readPacked function is simply:

unpack(fs.readFileSync(path));
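
For reference, a self-contained sketch of that helper (CommonJS, matching the stack trace above; the exact function shape is an assumption):

const fs = require('fs');
const { unpack } = require('msgpackr');

// Read the whole packed file into memory and decode it in one shot.
exports.readPacked = (path) => unpack(fs.readFileSync(path));
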
beaugunderson commented 3 years ago

One clue: this was 1.4 GB of a single object with ~66k keys. I just switched to serializing it as an array instead (and re-constructing the object for fast lookups after reading) and the issue went away. It will probably not be very common for someone to use msgpackr this way, but my dataset grew from something much more modest until it exceeded whatever limit I hit here. :)

kriszyp commented 3 years ago

Try using the variableMapSize: true option when creating the Packr; otherwise msgpackr uses a 16-bit map length (for performance). I'll add some notes to the docs to make this clearer.
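
A minimal sketch of that suggestion (the 65,537-key object is illustrative, not the reporter's data):

const { Packr } = require('msgpackr');

// variableMapSize trades some packing speed for full-width map length
// headers, which matters once an object or map crosses 65,535 keys.
const packr = new Packr({ variableMapSize: true });

const huge = {};
for (let i = 0; i <= 65536; i++) huge['k' + i] = i;

const buffer = packr.pack(huge);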

beaugunderson commented 3 years ago

That has to be it, just crossed key 65,536 :)

beaugunderson commented 3 years ago

Hmm, there seems to be a gap in the documentation.

Specifying variableMapSize: true on the decoder results in getting JavaScript Maps instead of JavaScript objects. The docs there say it only applies when useRecords is disabled, and mapsAsObjects says it defaults to enabled if useRecords is disabled, which leads me to believe I should be getting objects.

beaugunderson commented 3 years ago

This does work though:

  const unpackr = new Unpackr({ mapsAsObjects: true, variableMapSize: true });
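
Put together with the file read, the working path is roughly this (the filename is illustrative):

  const fs = require('fs');
  const { Unpackr } = require('msgpackr');

  const unpackr = new Unpackr({ mapsAsObjects: true, variableMapSize: true });
  const data = unpackr.unpack(fs.readFileSync('data.msgpack'));
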
beaugunderson commented 3 years ago

Hmm, variableMapSize: true also seems to make decoding take much longer than an additional 10%; it's gone from ~1 second to minutes for a 102 MB msgpack file.

kriszyp commented 3 years ago

Do you have a test case for how to reproduce this? I am not seeing much performance difference using variableMapSize: true with my own data. I also tried testing with an object with 65000 keys.
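
A rough timing harness along those lines (object size and value shape are assumptions):

const { Packr } = require('msgpackr');

const packr = new Packr({ variableMapSize: true, mapsAsObjects: true });

// Build a 65,000-key object, pack it, then time the unpack.
const obj = {};
for (let i = 0; i < 65000; i++) obj['key' + i] = { n: i, s: 'value' + i };

const buffer = packr.pack(obj);

console.time('unpack');
packr.unpack(buffer);
console.timeEnd('unpack');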

beaugunderson commented 3 years ago

Tried to come up with a test case just now that roughly mimics my object, but it doesn't exhibit the same extreme unpacking time (I did measure how long it takes to unpack my 65k-key object, though: 64 seconds).

beaugunderson commented 3 years ago

Will clean up my current file and share it tomorrow, and also directly compare the under-65k-key non-variableMapSize version against the over-65k-key variableMapSize version.

beaugunderson commented 3 years ago

OK, so nowhere near as wide a gulf as I thought! I think I probably compared my 1.4 GB file, which has the same number of top-level keys but a much higher proportion of arrays vs. nested keys, to this 102 MB file. The 1.4 GB file takes 9 seconds; the 102 MB file takes 64 seconds.

For the 102mb file:

variableMapSize version
written in 20472ms
read in 63735ms

default options version
written in 13918ms
read in 51620ms

I don't know if I crossed another performance threshold or what, but the 102 MB file was much faster until recently; we can safely say the majority of the perf change wasn't from variableMapSize, though.

kriszyp commented 3 years ago

Let me know if you do have any performance anomalies that I could reproduce.

LekoArts commented 2 years ago

Hi @kriszyp 👋

We're running into this same error with Gatsby now. It seems that the recent 1.5.3 release caused this.

The output one gets is:

"internal-data-bridge" threw an error while running the sourceNodes lifecycle:

Data read, but end of buffer not reached

File: node_modules/gatsby/src/internal-plugins/internal-data-bridge/gatsby-node.js:55:5

  Error: Data read, but end of buffer not reached

  - node.cjs:178 checkedRead
    [browser-ssr-tsx]/[msgpackr]/dist/node.cjs:178:10

  - node.cjs:91 Packr.unpack
    [browser-ssr-tsx]/[msgpackr]/dist/node.cjs:91:10

  - node.cjs:150 Packr.decode
    [browser-ssr-tsx]/[msgpackr]/dist/node.cjs:150:15

  - index.js:440 Object.get
    [browser-ssr-tsx]/[lmdb-store]/index.js:440:63

  - caching.js:37 Object.get
    [browser-ssr-tsx]/[lmdb-store]/caching.js:37:17

  - lmdb-datastore.ts:148 Object.getNode
    [browser-ssr-tsx]/[gatsby]/src/datastore/lmdb/lmdb-datastore.ts:148:16

  - index.ts:18 getNode
    [browser-ssr-tsx]/[gatsby]/src/datastore/index.ts:18:18

  - public.js:767 createNode
    [browser-ssr-tsx]/[gatsby]/src/redux/actions/public.js:767:19

  - public.js:863 
    [browser-ssr-tsx]/[gatsby]/src/redux/actions/public.js:863:21

  - index.js:23 
    [browser-ssr-tsx]/[redux-thunk]/lib/index.js:23:18

  - redux.js:554 boundActionCreator
    [browser-ssr-tsx]/[redux]/lib/redux.js:554:12

  - api-runner-node.js:79 createNode
    [browser-ssr-tsx]/[gatsby]/src/utils/api-runner-node.js:79:20

  - gatsby-node.js:55 forEach
    [browser-ssr-tsx]/[gatsby]/src/internal-plugins/internal-data-bridge/gatsby-node.js:55:5

  - Array.forEach

  - gatsby-node.js:53 Object.exports.sourceNodes
    [browser-ssr-tsx]/[gatsby]/src/internal-plugins/internal-data-bridge/gatsby-node.js:53:20

  - api-runner-node.js:462 runAPI
    [browser-ssr-tsx]/[gatsby]/src/utils/api-runner-node.js:462:22

internal-data-bridge is something from us, so that's why it's printed there.

You can reproduce this by running npm init gatsby -y issue-34, letting the installation run, and then running npm run build twice in a row. On the second run you'll see this error. If you do an npm run clean (it removes the LMDB store data) in between, it works.

Let me know if you need any further information. Thanks a lot!

kriszyp commented 2 years ago

Thank you for the great detailed report; this should be fixed in v1.5.4.

LekoArts commented 2 years ago

Awesome, thanks for the quick reply. Let me try it and I'll report back 👍

LekoArts commented 2 years ago

Works with 1.5.4 again. Thanks again!

alexbezhan commented 1 year ago

I'm getting a similar error. Node server code:

import { PackrStream } from 'msgpackr'

const stream = new PackrStream({
    useRecords: true,
    bundleStrings: true,
})
stream.pipe(response)
stream.write(...)
stream.end()

Client code:

import { Packr } from 'msgpackr'

const packr = new Packr({
    useRecords: true,
    bundleStrings: true, // client-side performance
})
const response = await fetch(...)
const rows = packr.unpack(new Uint8Array(await response.arrayBuffer()))

Error:

Error: Data read, but end of buffer not reached {"id":0,"portfolio":"Z2 Force","campaign":"11_Profitable_Z2_Force","adGroup":"11_7_Profitable_Z2_For
    at checkedRead (st-table.iife.js?v=bbc4c8fdf7d4fa35e6fc73429d21fd76710cf374:195:18548)
    at Packr.unpack (st-table.iife.js?v=bbc4c8fdf7d4fa35e6fc73429d21fd76710cf374:195:16178)
    at Object.fetchRows [as queryFn] (st-table.iife.js?v=bbc4c8fdf7d4fa35e6fc73429d21fd76710cf374:289:50568)
kriszyp commented 1 year ago

@alexbezhan do you know if this can be reproduced without streams (just pack and unpack)? Is any particular data necessary to reproduce?

alexbezhan commented 1 year ago

> @alexbezhan do you know if this can be reproduced without streams (just pack and unpack)? Is any particular data necessary to reproduce?

Today I faced the same issue. Now I can tell it can be reproduced without streams. My case today was that I encoded an object with structures, and then tried to decode it back with empty structures (the issue was my DB not saving the structures properly).
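
A guess at a stream-free repro of that scenario (the record shape is illustrative):

const { Packr } = require('msgpackr');

// Pack with shared structures enabled...
const writer = new Packr({ structures: [] });
const buffer = writer.pack({ id: 0, portfolio: 'Z2 Force' });

// ...then unpack with a reader whose structures were never persisted,
// mimicking the "db not saving structures" failure described above.
const reader = new Packr({ structures: [] });
reader.unpack(buffer); // the record references resolve to nothing and decoding fails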

Rossb0b commented 8 months ago

@kriszyp 👋 I ran into the same issue with my implementation.

Here is some minimal code.

import { EventEmitter } from "node:events";
import { Packr } from "msgpackr";

const packr = new Packr({
  maxSharedStructures: 8160,
  structures: []
});

type CustomPackFn<T extends StringOrObject = Record<string, any>> =
  (value: Partial<T>) => Buffer;

type CustomUnpackFn<T extends StringOrObject = Record<string, any>> =
  (messagePack: Buffer | Uint8Array) => T;
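
// Note: StringOrObject, SetValueOptions, KeyType, MappedValue,
// kWrongRedisCommandError and the redis/prefix/type/mapValue/deepParse
// members come from the reporter's wider codebase and are not shown here.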

export class KVPeer<T extends StringOrObject = StringOrObject, K extends Record<string, any> | null = null> extends EventEmitter {
  protected customPack = packr.pack.bind(packr) as CustomPackFn<T>;
  protected customUnpack = packr.unpack.bind(packr) as CustomUnpackFn<T>;

  async setValue(options: SetValueOptions<T>): Promise<KeyType> {
    const { key, value, expiresIn } = options;

    const finalKey = typeof key === "object" ? Buffer.from(this.prefix + key) : this.prefix + key;
    const multiRedis = this.redis.multi();

    if (this.type === "raw") {
      const payload = typeof value === "string" ? value : JSON.stringify(value);
      multiRedis.set(finalKey, payload);
    }
    else {
      const buffer = this.customPack(value);

      multiRedis.set(finalKey, buffer);
    }

    if (expiresIn) {
      multiRedis.pexpire(finalKey, expiresIn);
    }

    await multiRedis.exec();

    return finalKey;
  }

  async getValue(key: KeyType): Promise<MappedValue<T, K> | null> {
    const finalKey = typeof key === "object" ? Buffer.from(this.prefix + key) : this.prefix + key;

    const result = (this.type === "raw" ?
      await this.redis.get(finalKey) :
      await this.handlePackedOrMappedObject(finalKey)) as T;

    if (this.type === "object" && result && Object.keys(result).length === 0) {
      return null;
    }

    return result === null ? null : this.mapValue(result);
  }

  private async handlePackedOrMappedObject(key: KeyType): Promise<T | null> {
    let result: T | null = null;

    try {
      const packedValue = await this.redis.getBuffer(key);

      if (packedValue !== null) {
        result = this.customUnpack(packedValue);
      }
    }
    catch (error) {
      if (error.message !== kWrongRedisCommandError) {
        throw error;
      }

      result = this.deepParse(await this.redis.hgetall(key));
    }

    return result;
  }
}

I haven't managed to reproduce it locally yet, but the error occurred when deployed in a scaling environment.

I still have to investigate it as it isn't a huge priority on my side.

❤️
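
For anyone hitting this in a multi-instance setup: one way to keep every instance's structures in agreement is to persist them with the getStructures/saveStructures options (assuming the hooks as described in the msgpackr README; the file path and JSON encoding here are illustrative, not the reporter's setup):

import fs from "node:fs";
import { Packr } from "msgpackr";

const FILE = "/tmp/shared-structures.json";

const packr = new Packr({
  maxSharedStructures: 8160,
  // Called by msgpackr to load the latest shared structures.
  getStructures() {
    try {
      return JSON.parse(fs.readFileSync(FILE, "utf8"));
    } catch {
      return [];
    }
  },
  // Called by msgpackr when new structures are added.
  saveStructures(structures) {
    fs.writeFileSync(FILE, JSON.stringify(structures));
  },
});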