manojkarthick / pqrs

Command line tool for inspecting Parquet files
Apache License 2.0

Added config file to merge command #48

Status: Open. CalderWhite opened this PR 11 months ago.

CalderWhite commented 11 months ago

First off: I love pqrs!! Thank you so much for creating it and maintaining it :)

The code in this PR is super bare-bones and does not implement every option. It was just really annoying to have my data blow up in size whenever I merged my chunks together, but I also did not want to write a new compressed-merge script for every use case.

The solution I landed on is a small config file that specifies the most impactful options: compression, compression_level, set_dictionary_enabled, and column-level encodings.
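The actual config format used in PR #48 is not shown in this thread, so the following is only a minimal sketch of the idea, assuming a hypothetical `key = value` file with one setting per line and column-level encodings written as `encoding.<column>`. The names `MergeConfig` and `parse_config` are hypothetical and are not part of pqrs; only the option names come from the description above.

```rust
use std::collections::HashMap;

/// Hypothetical merge settings; field names mirror the options listed in the PR text.
#[derive(Debug, Default, PartialEq)]
pub struct MergeConfig {
    pub compression: Option<String>,               // codec name, lowercased (e.g. "zstd")
    pub compression_level: Option<i32>,            // codec-specific level
    pub set_dictionary_enabled: Option<bool>,      // toggle dictionary encoding
    pub column_encodings: HashMap<String, String>, // column name -> encoding name (uppercased)
}

/// Parses the hypothetical `key = value` format.
/// Blank lines and lines starting with '#' are ignored; unknown keys are an error.
pub fn parse_config(text: &str) -> Result<MergeConfig, String> {
    let mut cfg = MergeConfig::default();
    for (i, raw) in text.lines().enumerate() {
        let line = raw.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        let (key, value) = line
            .split_once('=')
            .ok_or_else(|| format!("line {}: expected `key = value`", i + 1))?;
        let (key, value) = (key.trim(), value.trim());
        match key {
            "compression" => cfg.compression = Some(value.to_lowercase()),
            "compression_level" => {
                let level = value
                    .parse::<i32>()
                    .map_err(|e| format!("line {}: bad compression_level: {}", i + 1, e))?;
                cfg.compression_level = Some(level);
            }
            "set_dictionary_enabled" => {
                let flag = value
                    .parse::<bool>()
                    .map_err(|e| format!("line {}: bad set_dictionary_enabled: {}", i + 1, e))?;
                cfg.set_dictionary_enabled = Some(flag);
            }
            // Column-level settings: `encoding.<column> = <ENCODING>`.
            _ => match key.strip_prefix("encoding.") {
                Some(column) if !column.is_empty() => {
                    cfg.column_encodings
                        .insert(column.to_string(), value.to_uppercase());
                }
                _ => return Err(format!("line {}: unknown key `{}`", i + 1, key)),
            },
        }
    }
    Ok(cfg)
}
```

A merge command could then translate each parsed field into writer settings. In the Rust `parquet` crate, `WriterProperties::builder()` exposes `set_compression`, `set_dictionary_enabled`, and `set_column_encoding` for this purpose, though how the PR wires them up is not shown here. Leaving every field as an `Option` lets unspecified settings fall back to the writer's defaults.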

I'm putting this PR up in case you're interested in merging it into the main branch. It would be cool not to have to tell people to install my fork, haha.

Thanks

CalderWhite commented 11 months ago

Happy open source Friday!

LOLLL

CalderWhite commented 11 months ago

I've responded to all of your comments! Some of the flagged changes can be removed from the PR entirely, others need some modification, and whether the config file belongs in the main repo is your call.