keichi / binary-parser

A blazing-fast declarative parser builder for binary data
MIT License

RangeError: Invalid array length #208

Open AmitMY opened 2 years ago

AmitMY commented 2 years ago

I am trying to parse a float array. This code normally works, but I now have one huge file (400 MB) that I want to start reading.

    const dataParser = new Parser()
        .array("data", {
            type: "floatle",
            length: dataLength // 82,272,642
        })
        .saveOffset('dataLength');

    const data = dataParser.parse(buffer);

As you can see, I am trying to parse an array with about 82 million entries, which is less than the 2147483647 limit in JavaScript. However, I am getting the following error:

Uncaught (in promise) RangeError: Invalid array length at Array.push () at Parser.eval [as compiled]

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Errors/Invalid_array_length

Additional information

Manual experimentation finds that the limit is somewhere between 50,100,000 and 50,500,000.

(related to https://github.com/sign/translate/issues/44)

keichi commented 2 years ago

Thanks for reporting. I've never really created such huge arrays in JS.

Could you test whether you can create an array of the same size (82 million elements) without using binary-parser? Your parser compiles to the following code, and I don't see anything suspicious that would bloat the memory footprint beyond what is expected.

    var dataView = new DataView(buffer.buffer, buffer.byteOffset, buffer.length);
    var offset = 0;
    var vars = {};

    vars.data = [];
    for (var $tmp0 = 82272642; $tmp0 > 0; $tmp0--) {
        var $tmp1 = dataView.getFloat32(offset, true);
        offset += 4;
        vars.data.push($tmp1);
    }
    vars.dataLength = offset;

    return vars;

AmitMY commented 2 years ago

This fails too, on the vars.data.push line. If I add a log before that push, the current vars.data.length is 50139473.

Code that works:

    data.data = new Float32Array(82272642);
    for (var $tmp0 = 0; $tmp0 < 82272642; $tmp0++) {
        var $tmp1 = dataView.getFloat32(offset, true);
        offset += 4;
        data.data[$tmp0] = $tmp1
    }
    data.dataLength = offset

If I preallocate the Float32Array at its final size, it works fine. It also never needs to realloc.

By the way, this method is 8 times faster than the regular parsing for an array of size 27,424,214.
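The preallocation approach described above can be sketched as a standalone function. This is only an illustration under stated assumptions: `readFloat32LE` is a hypothetical helper name and is not part of binary-parser, and the tiny demo buffer stands in for the real 400 MB file.

```javascript
// Hypothetical helper (not part of binary-parser): read `count` little-endian
// float32 values starting at `byteOffset` into a preallocated Float32Array.
function readFloat32LE(bytes, byteOffset, count) {
  // DataView tolerates unaligned offsets and takes an explicit endianness flag.
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const out = new Float32Array(count); // allocated once, never grown
  let offset = byteOffset;
  for (let i = 0; i < count; i++) {
    out[i] = view.getFloat32(offset, true); // true = little-endian
    offset += 4;
  }
  return { data: out, nextOffset: offset };
}

// Usage: one padding byte followed by three little-endian floats.
const demoBytes = new Uint8Array(1 + 3 * 4);
const demoView = new DataView(demoBytes.buffer);
[1.5, -2.25, 3].forEach((v, i) => demoView.setFloat32(1 + i * 4, v, true));
const demoResult = readFloat32LE(demoBytes, 1, 3);
```

Because the output array is sized up front, the loop writes by index instead of calling push, so the array never has to grow.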

keichi commented 2 years ago

Ok, that makes sense. Reallocs are definitely an overhead, and I guess typed arrays are more compact than normal arrays. But this approach would only work for fixed-length arrays of primitive types. Is that what you are parsing?

AmitMY commented 2 years ago

Yes, the largest arrays that I parse are indeed of fixed sizes (as in, I specify length to be parsed). The normal behavior is still good for short arrays, I'd imagine.

keichi commented 2 years ago

It turns out you can directly create a Float32Array from an ArrayBuffer (zero copy). https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Float32Array/Float32Array
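As a small illustration of what "zero copy" means here, the sketch below builds two Float32Array views over the same ArrayBuffer and shows that a write through one is visible through the other. The variable names are made up for the example, and the byte offset passed to the constructor must be a multiple of 4.

```javascript
const backing = new ArrayBuffer(16);          // room for four float32 values
const floats = new Float32Array(backing);     // view over all 16 bytes, no copy
const tail = new Float32Array(backing, 8, 2); // view over the last two floats

floats[2] = 42.5;

// tail[0] and floats[2] refer to the same four bytes of `backing`.
const sharesMemory = tail[0] === 42.5 && tail.buffer === backing;
```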

Can you try if the following works?

    const dataParser = new Parser()
        .buffer("data", {
            length: 82272642 * 4, // length in bytes
            formatter: (buf) => new Float32Array(buf.buffer) // buf is a DataView
        })
        .saveOffset('dataLength');

AmitMY commented 2 years ago

Sorry, I missed your previous message.

If I do what you wrote and add a console.log:

            formatter: (buf) => {
                console.log(buf);
                return new Float32Array(buf.buffer)
            } 

In the console I see that buf is a Uint8Array, and an error:

Uncaught (in promise) RangeError: byte length of Float32Array should be a multiple of 4

because buf.buffer has an odd number of bytes

(For completeness' sake: this is not the original large file, just a small-scale test using length: 26578 * 4 and a seek of 1925 to get to the right place in this file.)
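One possible way around this error, sketched under assumptions: `new Float32Array(buf.buffer)` ignores the Uint8Array's own byteOffset and byteLength and tries to cover the whole backing buffer, and a start position such as 1925 is not a multiple of 4. The hypothetical helper below (`toFloat32Array` is not part of binary-parser) creates a zero-copy view when the offset is aligned and falls back to copying the bytes otherwise. It reinterprets the bytes in the platform's native byte order, which is little-endian on most current hardware.

```javascript
// Hypothetical helper (not part of binary-parser): turn a Uint8Array field
// into a Float32Array, respecting the field's own offset and length.
function toFloat32Array(bytes) {
  if (bytes.byteLength % 4 !== 0) {
    throw new RangeError("byte length must be a multiple of 4");
  }
  const count = bytes.byteLength / 4;
  if (bytes.byteOffset % 4 === 0) {
    // Aligned: zero-copy view over the same memory.
    return new Float32Array(bytes.buffer, bytes.byteOffset, count);
  }
  // Misaligned: copy the bytes into a fresh buffer that starts at offset 0.
  const copy = new Uint8Array(bytes.byteLength);
  copy.set(bytes);
  return new Float32Array(copy.buffer);
}

// Usage: a 5-byte header followed by two floats, so the field is misaligned.
const payload = new Float32Array([1.5, -2.25]);
const file = new Uint8Array(5 + payload.byteLength);
file.set(new Uint8Array(payload.buffer), 5);

const field = file.subarray(5);                     // byteOffset 5, byteLength 8
const parsed = toFloat32Array(field);               // takes the copy path

const alignedField = new Uint8Array(payload.buffer); // byteOffset 0
const alignedParsed = toFloat32Array(alignedField);  // takes the zero-copy path
```

In the parser from this thread, such a helper could presumably be used as `formatter: (buf) => toFloat32Array(buf)`, at the cost of one copy whenever the field does not start on a 4-byte boundary.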

AmitMY commented 2 years ago

Hi @keichi, is there any plan to support this in the library? If not, I'll use the custom solution, but it would be nice to at least catch this type of error and point future users to this issue or to some fix.

I tried to write a test for it, but it passes, so I think contributing here is out of my league.

    describe('Large arrays', () => {
      it('should parse large array without error', () => {
        const length = 80_000_000;
        const array = Buffer.from(new Float32Array(length).fill(0).buffer);

        const parser = new Parser()
          .array("data", {
            type: "floatle",
            length
          });

        const buffer = factory(array);
        doesNotThrow(() => parser.parse(buffer));
      });
    });