Closed: yonesko closed this issue 2 years ago
Hello @yonesko
There is currently no way to control the row group size in bytes. Since Parquet columns are encoded and compressed, which size would you need to control: the compressed size of the row group on disk, or the total decoded size?
The compressed size of the row group.
Do you mind providing a bit more context on the use case that would require controlling the size of a row group on disk?
We have a big (4 GB) parquet file with a single row group, and Amazon Athena fails to read it with "GENERIC_INTERNAL_ERROR: integer overflow". When we limit the row group by number of rows, the error disappears.
Hello, I haven't found a way to control the row group size.
Yes, I can call `Flush`, but how do I know when the row group has reached the size limit (1 GB, for example)?
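For anyone landing here: since there is no built-in byte-size limit, one workaround is to track roughly how many bytes you have buffered and call `Flush` to cut a row group once the estimate crosses a threshold. Below is a minimal sketch assuming the segmentio/parquet-go `GenericWriter` API; the `Record` type and the per-row size estimate are made up for illustration. Note that the compressed on-disk size is only known after the group is encoded, so the threshold here is an uncompressed proxy; if you need to target the on-disk size, scale it by an assumed compression ratio.

```go
package main

import (
	"fmt"
	"log"
	"os"

	"github.com/segmentio/parquet-go"
)

// Record is a hypothetical row type for illustration.
type Record struct {
	ID   int64  `parquet:"id"`
	Name string `parquet:"name"`
}

// targetRowGroupBytes is the desired approximate row group size.
// NOTE: this counts unencoded bytes as a rough proxy; the actual
// compressed size on disk is only known after the group is encoded.
const targetRowGroupBytes = 1 << 30 // ~1 GiB

func main() {
	f, err := os.Create("out.parquet")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	w := parquet.NewGenericWriter[Record](f)

	var pending int64 // rough byte count buffered since the last Flush
	for i := 0; i < 10_000_000; i++ {
		rec := Record{ID: int64(i), Name: fmt.Sprintf("row-%d", i)}
		if _, err := w.Write([]Record{rec}); err != nil {
			log.Fatal(err)
		}
		// Crude per-row estimate: fixed-width field + string payload.
		pending += 8 + int64(len(rec.Name))

		// Flush closes the current row group and starts a new one.
		if pending >= targetRowGroupBytes {
			if err := w.Flush(); err != nil {
				log.Fatal(err)
			}
			pending = 0
		}
	}
	if err := w.Close(); err != nil {
		log.Fatal(err)
	}
}
```

This keeps each row group near the target regardless of row width, unlike a fixed row-count limit, at the cost of the size estimate being approximate.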