sharkdp / hexyl

A command-line hex viewer
Apache License 2.0
8.92k stars 227 forks source link

Adds --columns for Print using custom width #13 #164

Closed sharifhsn closed 1 year ago

sharifhsn commented 1 year ago

This pull request fixes #13 partially by adding the --columns option to manually set the number of columns. The default is 2 as with previous behavior.

Some notes:

sharifhsn commented 1 year ago

I have a pull request ready for terminal width, but it requires adding the terminal_width dependency. I'll wait to submit it until this request is approved.

sharifhsn commented 1 year ago

Thank you very much. Works great!

Does this have any impact on performance? (see e.g. #66 (comment) on how to perform hexyl benchmarks using hyperfine)

No, I'll do that now. I couldn't find a benchmark harness before, thank you for pointing me to it.

sharkdp commented 1 year ago

We should probably add a script with a few useful benchmarks, true. For this PR (or in general), I would suggest to use something like:

#!/bin/bash

dd if=/dev/urandom bs=1M count=1 of=/tmp/data

hyperfine \
    --warmup 3 \
    --shell=none \
    --output=pipe \
    --export-markdown=result-pipe.md \
    'hexyl /tmp/data'

hyperfine \
    --warmup 3 \
    --shell=none \
    --output=inherit \
    --export-markdown=result-inherit.md \
    'hexyl /tmp/data'

rm /tmp/data

Here, we use hyperfines new --output option to specify where to redirect the output of the benchmarked program. Using "pipe" is like doing hexyl … | cat > /dev/null (which is much different from doing hexyl … > /dev/null because hexyl doesn't directly write to /dev/null and kernel optimizations don't apply, i.e. write calls actually need to take place). And using "inherit" is like calling the program with the TTY attached (this will output a lot to your screen).

sharkdp commented 1 year ago

You can also create two binaries, hexyl-master and hexyl-feature, and then run something like:

hyperfine \
    --warmup 3 \
    --shell=none \
    --output=pipe \
    --parameter-list version master,feature
    --export-markdown=result-pipe.md \
    './hexyl-{version} /tmp/data'

to see the direct comparison.

sharifhsn commented 1 year ago
Command Mean [ms] Min [ms] Max [ms] Relative
hexyl data 421.3 ± 14.3 404.5 444.0 1.24 ± 0.11
hexyl --no-squeezing data 563.5 ± 14.7 542.3 583.4 1.65 ± 0.15
target/release/hexyl data 433.0 ± 16.0 409.1 462.5 1.27 ± 0.12
target/release/hexyl --no-squeezing data 526.2 ± 18.1 506.6 559.6 1.55 ± 0.14
target/release/hexyl --columns 4 data 367.4 ± 12.7 347.8 382.6 1.08 ± 0.10
target/release/hexyl --columns 4 --no-squeezing data 407.6 ± 14.2 375.5 426.5 1.20 ± 0.11
target/release/hexyl --columns 8 data 340.5 ± 28.6 306.6 399.2 1.00
target/release/hexyl --columns 8 --no-squeezing data 351.9 ± 15.3 318.9 366.2 1.03 ± 0.10

These are my results using 1 MB of /dev/urandom. I used my system-installed binary of hexyl to compare since I'm pretty sure there were no feature changes since the last release.

It seems as though for the default width of two columns the new version is slightly slower, but within noise for the benchmark. With more columns it's much faster, but that of course makes sense since it's writing fewer lines.

sharifhsn commented 1 year ago

One more thing; I've implemented columns here as a u8, since I find it unlikely that anyone would ever use over 255 columns in the viewer. This also places a cap on terminal width at 8969. Is this desirable, or should I change the type to u16?

sharifhsn commented 1 year ago

I've added the terminal width setting, so this should completely close #13.

sharkdp commented 1 year ago

One more thing; I've implemented columns here as a u8, since I find it unlikely that anyone would ever use over 255 columns in the viewer. This also places a cap on terminal width at 8969. Is this desirable, or should I change the type to u16?

I wanted to remark this in my first review but didn't want to be overly annoying :smile:. In my personal experience: trying to use size-limited (unsigned) integers (u8, u16, i8, i16) in places which are not size- or performance critical is almost always a mistake. It's usually meant as a (premature) optimization. But the reality is that it (1) leads to more errors due to overflows (2) leads to more complicated code due to more explicit casts between 8-bit, 16-bit and 32-bit numbers and (3) doesn't even necessarily generate faster code (because if they are not closely packed, we can't really make use of them) or smaller data structs (due to alignment mismatches).

So personally, for a single parameter, I would never mind trying to go for a small integer type and simply go with a 32-bit integer, or even isize/usize. Often, it's even undesirable to use unsigned integers for parameters that cannot (reasonably) be negative, because again, it might lead to more complicated code, and it can lead to tricky overflow errors.

sharkdp commented 1 year ago

Feel free to bump the min. supported Rust version if that causes problems.

sharkdp commented 1 year ago

Concerning the newly added options:

Would it be possible to scratch --auto-width and use something like --columns=max or --columns=auto instead?

Also, I think --terminal-width should be usable with other "width-setting" options. I think of it purely as an overwrite for the auto-detected terminal width. What do you think?

I've added the terminal width setting, so this should completely close #13.

Does it? I believe the issue also asks for custom widths that would not be multiples of 16. Like if you have some data structure with elements that are 14 byte long. Then you might want to specify --width=14.

sharifhsn commented 1 year ago

One more thing; I've implemented columns here as a u8, since I find it unlikely that anyone would ever use over 255 columns in the viewer. This also places a cap on terminal width at 8969. Is this desirable, or should I change the type to u16?

I wanted to remark this in my first review but didn't want to be overly annoying smile. In my personal experience: trying to use size-limited (unsigned) integers (u8, u16, i8, i16) in places which are not size- or performance critical is almost always a mistake. It's usually meant as a (premature) optimization. But the reality is that it (1) leads to more errors due to overflows (2) leads to more complicated code due to more explicit casts between 8-bit, 16-bit and 32-bit numbers and (3) doesn't even necessarily generate faster code (because if they are not closely packed, we can't really make use of them) or smaller data structs (due to alignment mismatches).

So personally, for a single parameter, I would never mind trying to go for a small integer type and simply go with a 32-bit integer, or even isize/usize. Often, it's even undesirable to use unsigned integers for parameters that cannot (reasonably) be negative, because again, it might lead to more complicated code, and it can lead to tricky overflow errors.

Thank you for saying this! I am quite new to Rust so I appreciate the advice. Making it a u16 eliminates a cast (since terminal_width uses a u16) so I'll use that instead. Using a bigger value would be more of a hassle than a u16.

sharifhsn commented 1 year ago

Concerning the newly added options:

Would it be possible to scratch --auto-width and use something like --columns=max or --columns=auto instead?

Of course.

Also, I think --terminal-width should be usable with other "width-setting" options. I think of it purely as an overwrite for the auto-detected terminal width. What do you think?

I was only going off of the original issue's suggestions. The use case for --terminal-width is probably in a situation where the width is necessarily limited for some reason. In that case, I would pick the minimum between the two options i.e. prefer --columns unless the number of columns exceeds that which is dictated by --terminal-width.

I've added the terminal width setting, so this should completely close #13.

Does it? I believe the issue also asks for custom widths that would not be multiples of 16. Like if you have some data structure with elements that are 14 byte long. Then you might want to specify --width=14.

Ah, I wasn't aware that was in the scope of the original issue. That would probably require a more significant rewrite.

sharkdp commented 1 year ago

Ah, I wasn't aware that was in the scope of the original issue. That would probably require a more significant rewrite.

That's okay. Let's focus on --columns for now.

sharifhsn commented 1 year ago

Ok, I think I misunderstood the initial issue. The custom number of columns refers to the individual columns, whereas my pull request focuses on customizing the number of panels. With that in mind, I think I should rename columns to panels and reflect that in the documentation.

A future change that customizes columns per panel would therefore be able to slot into this change better.

sharkdp commented 1 year ago

Another option would be to leave the option name as is, but change the meaning. And then for this PR, we only allow --columns=8,16,24,…. What do you think?

sharifhsn commented 1 year ago

Another option would be to leave the option name as is, but change the meaning. And then for this PR, we only allow --columns=8,16,24,…. What do you think?

I think it would be more ergonomic here to have the two options, columns-per-panel and number-of-panels.

In my mind, the idea behind variable number columns is to have the panel size itself adjust. With the other option, the panel size would always be 8 bytes, and a future ability to have e.g. columns=14 would leave blank space at the end.

It would be better to set columns=7 and panels=2 which is more explicit in my opinion.

sharkdp commented 1 year ago

You are right.

Let's change the command line option name then 👍🏼

sharifhsn commented 1 year ago

@sharkdp I think we can declare this PR complete? I don't see any other roadblocks.

sharkdp commented 1 year ago

I have a commit ready for a builder pattern. It uses a new PrinterBuilder struct, so it can technically be implemented in a non-breaking way, although that's probably undesirable. It's quite out of scope for this pull request, so I'll submit a new one once this one is approved.

That would be great! If we do that, I would make that a breaking change, i.e. remove the old new method.

@sharkdp I think we can declare this PR complete? I don't see any other roadblocks.

Yes. Looks great. Two things: the inline comment regarding the short options. And it would be great if you could add a CHANGELOG entry in the "unreleased" section. The format for entries is:

- Description what has been changed, see #123 (@user)

where #123 links to the bug ticket and/or the PR and user is your username.

sharkdp commented 1 year ago

Thank you!