ironSource / parquetjs

fully asynchronous, pure JavaScript implementation of the Parquet file format
MIT License

Write pages #52

Open ZJONSSON opened 6 years ago

ZJONSSON commented 6 years ago

Splits each column into pages, with the page size determined by pageSize. Values are encoded into a buffer as soon as enough rows have accumulated to fill a page, even when the rowBuffer is far from full. Ideally this reduces memory usage, since the encoded pages should be substantially smaller than the shredded data. Pages also make it faster to scan through a column by skipping pages of no interest, and they help locate individual records without having to read the whole column.
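
To illustrate the idea, here is a minimal sketch of encoding a column page by page as values arrive. It is not the code from this PR: the class PagedColumnWriter, the helper encodeInt32Plain, and the fixed-width integer encoding are all hypothetical names and simplifications chosen for the example; only the pageSize concept comes from the description above.

```javascript
// Hypothetical sketch, not parquetjs API: buffers raw values for one column
// and encodes them into a page as soon as pageSize values are available.
class PagedColumnWriter {
  constructor(pageSize, encodeValues) {
    this.pageSize = pageSize;         // number of values per page
    this.encodeValues = encodeValues; // function: array of values -> Buffer
    this.pending = [];                // raw values not yet encoded
    this.pages = [];                  // encoded pages, in column order
  }

  append(value) {
    this.pending.push(value);
    // Encode eagerly so raw values are not held longer than one page.
    if (this.pending.length >= this.pageSize) {
      this.flushPage();
    }
  }

  flushPage() {
    if (this.pending.length === 0) return;
    this.pages.push({
      count: this.pending.length,
      data: this.encodeValues(this.pending),
    });
    this.pending = [];
  }

  close() {
    // The last page may hold fewer than pageSize values.
    this.flushPage();
    return this.pages;
  }
}

// Stand-in encoder (assumption): fixed-width 32-bit little-endian integers.
function encodeInt32Plain(values) {
  const buf = Buffer.alloc(values.length * 4);
  values.forEach((v, i) => buf.writeInt32LE(v, i * 4));
  return buf;
}

const writer = new PagedColumnWriter(3, encodeInt32Plain);
for (let i = 0; i < 8; i++) writer.append(i);
const pages = writer.close();
console.log(pages.map((p) => p.count)); // [ 3, 3, 2 ]
```

Because each page records how many values it holds, a reader could in principle skip whole pages by summing counts until it reaches the row of interest, which is the kind of fast scanning the description refers to.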