I have this mostly working. However, when I use Athena to query data generated with parquetJS, I get the following error:
HIVE_CURSOR_ERROR: can not read class parquet.format.PageHeader: Required field 'compressed_page_size' was not found in serialized data! Struct: PageHeader(type:null, uncompressed_page_size:3, compressed_page_size:0)
I am creating the Athena table like this:
CREATE EXTERNAL TABLE IF NOT EXISTS MyTable (
... columns...
) STORED AS PARQUET
LOCATION 's3://folder/to/data'
tblproperties ("parquet.compress"="SNAPPY")
Debugging locally, I can see that execution does reach the line where these page headers are written, so I am not sure where this is going wrong.
When creating the writer, I am passing opts as {compression: "SNAPPY"}
Thanks for a wonderful library.
Can you please help with any pointers?
Regards, Arnab.