ironSource / parquetjs

fully asynchronous, pure JavaScript implementation of the Parquet file format
MIT License
345 stars 173 forks

Cannot write a parquet file having a comma in one of its headers #117

Open bartero opened 3 years ago

bartero commented 3 years ago

I came across a very unfortunate problem using this library.
Whenever I try to read a Parquet file that was created with this same library and contains a comma (,) in any of its headers, I get this error at await parquetReader.getCursor().next(): TypeError: Cannot read property 'fields' of undefined
Stack trace:

    at ParquetSchema.findField (../../.yarn/cache/parquetjs-npm-0.11.2-9df3a54481-63137e17bc.zip/node_modules/parquetjs/lib/schema.js:35:22)
    at Object.exports.materializeRecords (../../.yarn/cache/parquetjs-npm-0.11.2-9df3a54481-63137e17bc.zip/node_modules/parquetjs/lib/shred.js:164:26)
    at ParquetCursor.next (../../.yarn/cache/parquetjs-npm-0.11.2-9df3a54481-63137e17bc.zip/node_modules/parquetjs/lib/reader.js:62:40)

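I guess this is caused by a rather "unsafe" operation in parquetjs/lib/schema.js, line 28: path.split(","). A header that itself contains a comma would presumably be split into several path segments that do not exist in the schema.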
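
To illustrate the suspected failure mode, here is a minimal sketch. It is not the parquetjs source: findField below is a simplified, hypothetical stand-in that assumes a string path is split on "," and then walked segment by segment, and the field names are made up for the example.

```javascript
// Simplified stand-in for a schema lookup; NOT the actual parquetjs code.
// Assumption: a path given as a string is split on "," to get its segments.
function findField(fields, path) {
  const parts = typeof path === 'string' ? path.split(',') : path.slice();
  let node = { fields };
  for (const part of parts) {
    // If `part` is not a real field name, `node` becomes undefined and the
    // next iteration throws a TypeError when it reads `.fields`.
    node = node.fields[part];
  }
  return node;
}

// Hypothetical schema with one header that contains a comma.
const fields = {
  'price, usd': { type: 'DOUBLE' },
  plain: { type: 'UTF8' },
};

// A header without a comma resolves as expected.
const ok = findField(fields, 'plain');

// A header with a comma is split into two segments that do not exist.
let error = null;
try {
  findField(fields, 'price, usd');
} catch (e) {
  error = e;
}

// Passing the path as an array skips the split, so the lookup succeeds.
const viaArray = findField(fields, ['price, usd']);
```

In this sketch the comma-containing header fails with a TypeError on the second segment, while the array form finds the field. Whether the same holds at every call site in the library is an assumption that would need to be checked against the real code.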

I would be very grateful for any help with this problem. Thank you!

bartero commented 3 years ago

Hi! I have written a solution for it! Look below :-)