ironSource / parquetjs

fully asynchronous, pure JavaScript implementation of the Parquet file format
MIT License

Maximum call stack size exceeded when decoding values #110

Open JRGranell opened 4 years ago

JRGranell commented 4 years ago

Hi, I'm trying to read a local file, approximately 1.8 GB with 18,790,733 rows, SNAPPY compression. On executing the following code in Node 12

const parquet = require('parquetjs');

const file = './data.parquet';

(async () => {
    const reader = await parquet.ParquetReader.openFile(file);
    console.log(`Row count: ${reader.getRowCount()}`);

    const cursor = reader.getCursor();
    let record = null;
    while (record = await cursor.next()) {
        console.log(record);
    }
})();

(I've removed the try/catch for brevity)

it prints the row count, but throws this error on cursor.next():

RangeError: Maximum call stack size exceeded
    at Object.exports.decodeValues (/node_modules/parquetjs/lib/codec/rle.js:140:14)
    at decodeValues (node_modules/parquetjs/lib/reader.js:294:34)
    at decodeDataPage (node_modules/parquetjs/lib/reader.js:371:15)
    at decodeDataPages (node_modules/parquetjs/lib/reader.js:322:20)
    at ParquetEnvelopeReader.readColumnChunk (node_modules/parquetjs/lib/reader.js:255:12)
    at async ParquetEnvelopeReader.readRowGroup (node_modules/parquetjs/lib/reader.js:231:35)
    at async ParquetCursor.next (node_modules/parquetjs/lib/reader.js:57:23)

Would the file size or row count be too large for this to be processed? Alternatively, is there a way to stream the file to read/decode one row at a time?

Thanks in advance,

YonatanHanan commented 2 years ago

Did you find any solution?

bradgreens commented 2 years ago

I had this issue too and, after some troubleshooting, gave up. Instead, I was able to convert my large parquet file to JSON using this Rust project: https://github.com/jupiter/parquet2json

jgold21 commented 1 year ago

I also had this issue when using repeated: true with a large amount of data. The problem is inside the RLE codec (rle.js) and the reader. Changing the code to use a safer array copy fixed it:

exports.arrayCopy = function(dest, src) {
    const len = src.length;
    for (let i = 0; i < len; i++) {
        dest.push(src[i]);
    }
};
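For anyone wondering why this helps: the usual trigger for this RangeError is spreading a large array into push(), since every element becomes a separate function argument. A minimal sketch of the failure mode and the loop-based fix, assuming that pattern is what the library used (the variable names below are illustrative, not the actual parquetjs source):

// Illustrative only: simulates one large decoded column chunk.
const src = new Array(10000000).fill(0);
const dest = [];

// dest.push(...src);   // each element becomes a call argument, so a big
                        // enough src throws:
                        // RangeError: Maximum call stack size exceeded

// The loop-based copy pushes one element per call, so the argument list
// (and the call stack) stays small regardless of src.length:
for (let i = 0; i < src.length; i++) {
    dest.push(src[i]);
}

The error depends on the engine's stack limit rather than a fixed array size, which would explain why it shows up at different file sizes for different people in this thread.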

hankgoodness commented 8 months ago

I'm having this issue while trying to read a 1.7 MB file. @jgold21 can you say a little more about how you fixed this issue? I can't see how to use your code in rle.js - but it's probably a problem with my comprehension rather than your JavaScript :-D

Jiia commented 6 months ago

I'm having the same issue as well, with a file of 13,049 rows, reading only one of the columns. The workaround by @jgold21 doesn't seem to apply; there is no such function in the codebase anymore.