add snappy compression

fzaffarana commented 6 years ago

I'm trying to compress the parquet file after its creation, but AWS Athena can't read it.

`HIVE_CANNOT_OPEN_SPLIT: Error opening Hive split s3://bucket-example/data/parquet_node/year=2018/month=04/day=18/hour=18/minute=47/file.snappy.parquet (offset=0, length=11716266): can not read class parquet.format.FileMetaData: don't know what type: 15`

This query ran against the "tes" database, unless qualified by the query. Please post the error message on our forum or contact customer support with Query Id: 3b3a2df1-b202.

Is it possible to add optional Snappy compression to the writer?

fzaffarana commented 6 years ago

I've read the integration tests and found the answer: I have to add the optional `compression` option (`'SNAPPY'`) to the fields in the schema.

Please close this issue, thanks!

anilsdas commented 4 years ago

Could you give more details about how this can be done? I can't find any documentation for this.

t3h2mas commented 4 years ago

@anilsdas it looks like you can add `compression: 'SNAPPY'` to each schema type definition.
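As a rough sketch of what that could look like: the snippet below builds a plain field-definition map and sets `compression: 'SNAPPY'` on every column. The helper `withCompression` and the column names are hypothetical, made up for illustration, and the helper only handles flat (non-nested) fields. The `ParquetSchema` call at the end is the assumed usage with parquetjs and is left as a comment so the snippet runs without the library installed.

```javascript
// Hypothetical helper (not part of parquetjs): returns a copy of a flat
// field-definition map with a compression codec set on every column that
// does not already declare one.
function withCompression(fields, codec = 'SNAPPY') {
  const out = {};
  for (const [name, def] of Object.entries(fields)) {
    // Spread `def` last so an explicit per-column choice is preserved.
    out[name] = { compression: codec, ...def };
  }
  return out;
}

// Illustrative column definitions; names and types are assumptions.
const fields = withCompression({
  name: { type: 'UTF8' },
  quantity: { type: 'INT64', optional: true },
  price: { type: 'DOUBLE', compression: 'GZIP' }, // explicit codec is kept
});

console.log(fields);

// Assumed usage with parquetjs installed (not executed here):
//   const parquet = require('parquetjs');
//   const schema = new parquet.ParquetSchema(fields);
```

Writing `compression: 'SNAPPY'` by hand on each field works just as well; the helper only saves repetition when a schema has many columns.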

ironSource / parquetjs

add snappy compression #59