mukunku / ParquetViewer

Simple Windows desktop application for viewing & querying Apache Parquet files
GNU General Public License v3.0

[FEAT] View per-column metadata #84

Closed ngbrown closed 1 year ago

ngbrown commented 1 year ago

Describe the feature you'd like to be added to Parquet Viewer

I would like the metadata viewer to show the custom key_value_metadata added to each column of the schema. PyArrow's API seems to allow this to be added at the schema level, while Parquet.Net's API adds it per row group, which is more in line with the actual file structure.

Share why this feature would be a good addition to the utility

I want to validate that I'm building Parquet files correctly with the data I expect. I would like to use metadata for per-column information like units and description.

mukunku commented 1 year ago

I added some column metadata to the row groups in v2.7.2 (https://github.com/mukunku/ParquetViewer/releases/tag/v2.7.2). Can you check it out and see if that's good enough for your needs?

ngbrown commented 1 year ago

@mukunku I see the column metadata (KeyValueMetadata) in each row group. I like that the byte sizes of each column are now available to get an idea of how well each column is de-duplicating and compressing.

This extra information also greatly increases the line count of the metadata window (2,440 -> 78,498), so I copied the JSON into Visual Studio Code to make use of its collapsing and search. I may have too many row groups... Anyway, doing more with the UI will have diminishing returns because other editors will do a better job. Maybe a copy button would be the extent of any change I would suggest to the metadata viewer window?
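When the copied JSON runs to tens of thousands of lines, a small script can stand in for manual searching. The sketch below walks any nested JSON and reports where a given key occurs. The sample document and its field names (`RowGroups`, `Columns`, `KeyValueMetadata`) are assumptions for illustration only; the actual layout of the metadata shown by the viewer may differ.

```python
import json


def find_key(node, wanted, path="$"):
    """Yield (path, value) for every occurrence of `wanted` in nested JSON."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key == wanted:
                yield child, value
            yield from find_key(value, wanted, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_key(item, wanted, f"{path}[{index}]")


# Hypothetical sample; the real layout of the copied metadata may differ.
sample = json.loads("""
{"RowGroups": [
  {"Columns": [
    {"Name": "temperature",
     "KeyValueMetadata": [{"Key": "units", "Value": "degC"}]}
  ]}
]}
""")

hits = list(find_key(sample, "KeyValueMetadata"))
for location, value in hits:
    print(location, value)
```

To use it on real output, replace `sample` with `json.load(...)` on a file containing the copied text and pass whichever key name appears there.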

Thank you for your tool. I'm now able to use this feature for what I needed.

mukunku commented 1 year ago

Thank you for the feedback. I reverted the extra info I had added to the Thrift metadata being shown and added a "Copy Raw Metadata" button as you suggested. The new feature is available in v2.7.2.1.

Really appreciate the ideas to help make the app better. Closing out this ticket for now but feel free to reopen if required.