Closed sadikovi closed 6 years ago
@sunchao could you review this PR? Let me know if I should make any changes or add more tests. Thanks!
Files with Coverage Reduction | New Missed Lines | % | ||
---|---|---|---|---|
encodings/encoding.rs | 1 | 94.77% | ||
schema/printer.rs | 6 | 70.52% | ||
file/statistics.rs | 10 | 92.71% | ||
<!-- | Total: | 17 | --> |
Totals | |
---|---|
Change from base Build 613: | -0.06% |
Covered Lines: | 12414 |
Relevant Lines: | 12993 |
Looks good. I'm wondering if it's better to make this optional, i.e., only print it when an extra argument such as -statistics
is specified. Thought?
That is a good idea, though we would have to provide true
and -statistics
flags and they should work together. My understanding was that if user provided true
we show everything.
Slightly off-topic: I was thinking if we should improve our CLI tools, for example, have better handling of parameters and help display like cargo --help
and have different modes for parquet-schema - --short
for just schema, --ext
for bunch of other properties and --all
for everything.
I can add just an extra option for statistics or I can extend to support different options, like I mentioned above. Let me know what you think is better - I am happy to change either way!
Agree that the CLI tool needs improvement - I was also thinking that we can enable colored output as well as making the output more formatted, etc.
I think we can just re-use the true
option for everything now, and leave the aforementioned improvements as a followup. :)
This PR adds
statistics
value to display for a column when runningparquet-schema
command with additional information display (parquet-schema /path/to/file true
). Example of some prints are below:Unfortunately, statistics string can be a bit long, for example:
Closes #156