ironSource / parquetjs

fully asynchronous, pure JavaScript implementation of the Parquet file format
MIT License

[NodeJS] RangeError [ERR_OUT_OF_RANGE] when reading a parquet file #141

Open ntapsrigowri opened 1 year ago

ntapsrigowri commented 1 year ago

Reading a parquet file that contains multiple records with the parquet reader fails with RangeError [ERR_OUT_OF_RANGE]: The value of "offset" is out of range. It must be >= 0 and <= 79. Received 604307758 (error stack attached below). If the parquet file contains only one record, it works fine.

"parquetjs": "^0.11.2", Node Version: v19.0.1 NPM version : 8.19.2 Attached parquet files with this thread

Archive.zip

Test script:


import parquetjs from 'parquetjs';

const { ParquetReader } = parquetjs;

async function readParquetFile() {
    const reader = await ParquetReader.openFile('doesntwork.parquet');
    const cursor = reader.getCursor();

    // cursor.next() resolves to null once all rows have been read.
    let record = null;
    while ((record = await cursor.next())) {
        console.log('>>RECORD', record);
    }

    await reader.close();
}

readParquetFile();
node src/operations/test.js
/test/node_modules/brotli/build/encode.js:3
1<process.argv.length?process.argv[1].replace(/\\/g,"/"):"unknown-program");b.arguments=process.argv.slice(2);"undefined"!==typeof module&&(module.exports=b);process.on("uncaughtException",function(a){if(!(a instanceof y))throw a;});b.inspect=function(){return"[Emscripten Module object]"}}else if(x)b.print||(b.print=print),"undefined"!=typeof printErr&&(b.printErr=printErr),b.read="undefined"!=typeof read?read:function(){throw"no read() available (jsc?)";},b.readBinary=function(a){if("function"===
                                                                                                                                                                                                                              ^
RangeError [ERR_OUT_OF_RANGE]: The value of "offset" is out of range. It must be >= 0 and <= 79. Received 604307758
    at new NodeError (node:internal/errors:393:5)
    at boundsError (node:internal/buffer:86:9)
    at Buffer.readUInt32LE (node:internal/buffer:220:5)
    at decodeValues_BYTE_ARRAY (/test/node_modules/parquetjs/lib/codec/plain.js:168:29)
    at exports.decodeValues (/test/node_modules/parquetjs/lib/codec/plain.js:266:14)
    at decodeValues (/test/node_modules/parquetjs/lib/reader.js:294:34)
    at decodeDataPage (/test/node_modules/parquetjs/lib/reader.js:389:16)
    at decodeDataPages (/test/node_modules/parquetjs/lib/reader.js:322:20)
    at ParquetEnvelopeReader.readColumnChunk (/test/node_modules/parquetjs/lib/reader.js:255:12)
    at async ParquetEnvelopeReader.readRowGroup (/test/node_modules/parquetjs/lib/reader.js:231:35) {
  code: 'ERR_OUT_OF_RANGE'
}
tanishqsaini1306 commented 10 months ago

Is there any solution for the above issue? I encountered the same problem.

chris-aeviator commented 6 months ago

This happens to me whenever I try to read a file that has been saved with pandas.
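
For reference, a multi-row file written from pandas along these lines should be enough to hit the error (column names and values here are purely illustrative, and this assumes a parquet engine such as pyarrow is installed; per the original report, a single-row file reads fine):

# write_repro.py -- illustrative sketch only; column names and values are made up
import pandas as pd

# Per the original report, a single-row frame reads fine with parquetjs,
# while a frame with multiple rows triggers the RangeError.
df = pd.DataFrame({
    "id": [1, 2],
    "name": ["alice", "bob"],
})

# Requires pyarrow (or fastparquet) to be installed.
df.to_parquet("doesntwork.parquet", index=False)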

tanishqsaini1306 commented 6 months ago

Is there any workaround you used to overcome this?

chris-aeviator commented 6 months ago

Yes and no, my workaround was:

yarn remove parquetjs
df.to_json("./life/is/very-short.json")  # pandas to_json

Might not be what you anticipated.
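
In other words, skip parquet entirely and have Node read the JSON export instead. A minimal sketch of the reading side, assuming the frame was exported with orient="records" so the file is a JSON array of row objects (the path and record shape are placeholders):

// read-json-export.js -- minimal sketch, not part of parquetjs;
// assumes df.to_json(path, orient="records") produced a JSON array of rows.
import { readFile } from 'fs/promises';

async function readJsonExport(path) {
    const raw = await readFile(path, 'utf8');
    const records = JSON.parse(raw);
    for (const record of records) {
        console.log('>>RECORD', record);
    }
}

readJsonExport('./life/is/very-short.json');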
