xitongsys / parquet-go

pure golang library for reading/writing parquet file
Apache License 2.0
1.27k stars, 293 forks

A parquet format question: reading compressed pages #588

Closed (chrislusf closed this issue 5 months ago)

chrislusf commented 5 months ago

Hi @xitongsys, I have a general Parquet format question that you may already know the answer to:

Assume a column is compressed and the page size is 4 kB. To decompress one page, do we need to read all the previous pages (and their rows) first? In other words, are all pages of a column compressed together as one stream, or is each page compressed separately?

I tried compressing one column with various page sizes, and the final Parquet file size seems to be the same in every case. This seems to suggest that all pages are compressed together.