Closed sharifhsn closed 6 months ago
Here's a visual image of the difference.
Wow! That was fast :smile:
Thank you so much.
I've directly copy-pasted the codepage table from the repository in the blog post you linked. I'm not sure how to properly give credit for it.
I think I saw a MIT license in the repository earlier today? If so, I would just copy the license text verbatim and paste it into a comment preceding the table. And add a reference to the original source file + author, if not present in the license text. Maybe let's also describe the manual modifications we did in that comment.
Look how cool a .gif file looks with this feature enabled :smile:
Nice! I’m glad this idea is catching on.
Note that ␀ is already a printable character, since it’s U+2400 SYMBOL FOR NULL rather than U+0000 NULL. In fact, the whole range U+2400 through U+241F are printable representations of the C0 controls U+0000 through U+001F, though I chose to use the cp437 dingbats for the others. ⋄ probably makes more sense for hexyl though :)
FFh is also visually indistinguishable from 20h in this table, so for the Rewind path interpreter, we used U+FB00 LATIN SMALL LIGATURE FF (ff) for FFh.
Nice! I’m glad this idea is catching on.
Thank you very much for commenting here and for the original idea! A while ago, I built binocle, which uses a similar idea in a graphical application (distinctly colored pixels for each byte), but I never thought about translating this idea to the character panel of hexyl
.
I'm now thinking that it might also be cool to have a mode in hexyl
where we have distinct colors for every byte (not just for different categories). We could then print █ (U+2588) for every byte in the character table, but colorize it differently using ANSI escape sequences. As for this PR, we could maybe already add a --character-table block
(or similar) variant where we simply print █ for each byte. With the category-colorization, it would look like this for the GIF example above:
Note that ␀ is already a printable character, since it’s U+2400 SYMBOL FOR NULL rather than U+0000 NULL. In fact, the whole range U+2400 through U+241F are printable representations of the C0 controls U+0000 through U+001F, though I chose to use the cp437 dingbats for the others.
We use (some of) those U+2400 … U+241F symbols in bat --show-all
. I like the idea. But at least with my terminal emulator / font, they are usually really hard to read. Even at large font sizes, I can barely make out the "NUL" characters. This is why we chose ⋄
for hexyl
. @sharifhsn but we should update the comment for consistency: "modified to use the '⋄' character instead of '␀'"
FFh is also visually indistinguishable from 20h in this table, so for the Rewind path interpreter, we used U+FB00 LATIN SMALL LIGATURE FF (ff) for FFh.
@sharifhsn I think we should integrate the same adaptation here, if you agree?
@sharifhsn but we should update the comment for consistency: "modified to use the '⋄' character instead of '␀'"
Makes sense.
@sharifhsn I think we should integrate the same adaptation here, if you agree?
Ditto.
I'm now thinking that it might also be cool to have a mode in hexyl where we have distinct colors for every byte (not just for different categories). We could then print █ (U+2588) for every byte in the character table, but colorize it differently using ANSI escape sequences. As for this PR, we could maybe already add a --character-table block (or similar) variant where we simply print █ for each byte.
Great idea! I've implemented this using the xterm
colors as there are 256 distinct colors, which maps perfectly to the number of distinct bytes. I'm open to a change in color scheme. Here's how it looks now:
This is so cool.
Especially when combined with --border none
, --no-squeezing
, potentially --group-size 8
and --panels 8
on a wide screen:
But I would appreciated if we could focus this PR on --character-table
and finish that first. Add a few basic tests. Make sure that it plays nicely with other features like e.g. squeezing. For example, I noticed that the squeezing character gets replaced by a block when --character-table=block
. And maybe we should quickly check if this doesn't introduce any performance regression for the default case.
We can then explore the colorization part in a separate PR.
I added a test for code page 437, and I've fixed the issue with the asterisk, which was a leftover bug from my original implementation anyway 😅
I've also refactored the color components into a new module, since they're taking up a lot of visual space and have separated concerns anyway.
With regards to performance, preliminary benchmarks show that performance is within noise for the default case and for code page 437, and significantly slower with block coloring (due to the fact that the color changes every single byte).
Command | Mean [s] | Min [s] | Max [s] | Relative |
---|---|---|---|---|
target/release/hexyl --character-table codepage-437 data |
2.246 ± 0.087 | 2.169 | 2.442 | 1.00 |
target/release/hexyl --character-table block data |
4.412 ± 0.120 | 4.229 | 4.567 | 1.96 ± 0.09 |
target/release/hexyl data |
2.490 ± 0.357 | 2.195 | 3.234 | 1.11 ± 0.16 |
hexyl data |
2.274 ± 0.142 | 2.071 | 2.579 | 1.01 ± 0.07 |
With regards to colorization and the block table, it's all the same machinery. I would prefer to push this all in one PR unless there's some outstanding issue with the block table that can't be resolved quickly.
I've also refactored the color components into a new module, since they're taking up a lot of visual space and have separated concerns anyway.
Thank you for the updates. I would still prefer to move the colorization part to a new PR.
@sharkdp The colorization of characters has been factored out. This PR should be ready to merge.
Implements #194
Additional comments: