ironSource / parquetjs

fully asynchronous, pure JavaScript implementation of the Parquet file format
MIT License
345 stars, 173 forks

Streaming read #132

Closed by wholebuzz 2 years ago

wholebuzz commented 2 years ago

I need a streaming read, e.g. a Transform stream that takes a Readable Parquet file and emits every row in the file.

It seems the schema and metadata are actually at the end of the Parquet file.

Is the metadata truly needed to read the file, though? Could we accept some degraded behavior and still get a streaming read?

Is this package the right place to put such code?

wholebuzz commented 2 years ago

It seems you at least need to read the metadata first to find the offsets. Most createReadStream implementations support offsets and lengths, though.
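
For illustration, here is a rough sketch of that "metadata first" step using Node's `fs.createReadStream` with `start`/`end`. It relies on the Parquet file layout, where the file ends with a 4-byte little-endian footer length followed by the `PAR1` magic. The helper names (`footerRange`, `readRange`, `readFooterBytes`) are made up for this example and are not part of parquetjs; the sketch only fetches the raw footer bytes and does not decode them.

```javascript
const fs = require('fs');

const MAGIC = 'PAR1';
const TAIL_SIZE = 8; // 4-byte little-endian footer length + 4-byte magic

// Hypothetical helper: given the last 8 bytes of a file and its total size,
// return the inclusive byte range holding the serialized footer metadata.
function footerRange(tail, fileSize) {
  if (tail.length !== TAIL_SIZE || tail.toString('latin1', 4, 8) !== MAGIC) {
    throw new Error('not a Parquet file: missing trailing magic');
  }
  const footerLength = tail.readUInt32LE(0);
  const end = fileSize - TAIL_SIZE - 1;
  const start = end - footerLength + 1;
  if (start < MAGIC.length) {
    throw new Error('corrupt footer length');
  }
  return { start, end };
}

// Hypothetical helper: read an inclusive byte range into a single Buffer.
// fs.createReadStream treats both `start` and `end` as inclusive offsets.
function readRange(path, start, end) {
  return new Promise((resolve, reject) => {
    const chunks = [];
    fs.createReadStream(path, { start, end })
      .on('data', (chunk) => chunks.push(chunk))
      .on('error', reject)
      .on('end', () => resolve(Buffer.concat(chunks)));
  });
}

// Two small ranged reads: the 8-byte tail, then the footer it points to.
async function readFooterBytes(path) {
  const { size } = await fs.promises.stat(path);
  const tail = await readRange(path, size - TAIL_SIZE, size - 1);
  const { start, end } = footerRange(tail, size);
  return readRange(path, start, end);
}
```

The returned bytes would still need to be decoded as Thrift-encoded file metadata to get the row group and column chunk offsets. After that, each column chunk could in principle be fetched with further ranged reads, so the source has to support random access rather than being a purely forward-only stream.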