sharkdp / hexyl

A command-line hex viewer
Apache License 2.0
8.92k stars 227 forks source link

Themes, Styles and Case #123

Closed sivizius closed 3 years ago

sivizius commented 3 years ago

Features

Other

sivizius commented 3 years ago

Extended Binary Coded Decimal Interchange Code added. Test Coverage is now 91.75%

sharkdp commented 3 years ago

Thank you very much for your contribution.

I didn't have a detailed look, but these are a lot of changes. In general, I would be great to split this into several PRs. Better yet, possible changes should be discussed in an issue first, in order to avoid unnecessary work for you.

Now that everything is here already, we can also try to review it in total. But I need some help in making sense of all of this.

For example. Why did you introduce look up tables? It certainly doesn't make the code any easier. Looking at benchmark results, it also doesn't seem to make the code faster. Quite the contrary: this version of hexyl is about 4 times slower:

Command Mean [ms] Min [ms] Max [ms] Relative
./hexyl-master ./hexyl-master 249.5 ± 3.0 243.9 253.7 1.00
./hexyl-sivizius ./hexyl-master 1022.4 ± 6.7 1014.8 1036.4 4.10 ± 0.06

(benchmarked with hyperfine: hyperfine --warmup=5 './hexyl-master ./hexyl-master' './hexyl-sivizius ./hexyl-master' --export-markdown hexyl.md)

sivizius commented 3 years ago

The initial reason for these commits was to introduce formats, e.g. for ELF, PNG, and themes (colour-schemes, but maybe more).

In the master-version, the lookup-tables were generated at runtime at lib.rs:172 with the output-sequence (ansi-sequence + character/hexadecimal value) for each input-byte. This is fine for ASCII, but with EBCDIC, the characters had to be remapped to ASCII and to keep both ascii.rs and ebcdic.rs similar, I used a constant lookup-table there too. There is a unit-test with the old generator-code to check the correctness of the ascii-lookup-table, but I somehow forgot to check the lookup-tables for the hexadecimal values.

These lookup-table could in theory boost the performance for small inputs (but it does not matter there) and is insignificant faster for big inputs (because the initialisation-phase is short in comparison to the runtime), but performance was not the reason I introduced constant lookup-tables instead of generated.

To support context-sensitive formats like ELF (which was actually the main reason I started to work on this project), where bytes have different colours depending on context, I introduced this an abstraction-layer: Instead of byte→sequence, bytes are mapped to categories and the bytes are coloured depending on category and theme. Because one could only one theme each run, the theme, which is a structure, gets converted to an lookup-table category→ansi_term::Style.

Now the performance-issue: At the moment the paint_byte-method gets called for every byte and not just 256-times in the initialisation-phase. This of course make this slower for inputs bigger than 256 bytes, but I am already working on it:

For primitive formats like ASCII and EBCDIC, the old lookup-tables for each byte could be generated using the constant lookup-table for the characters and categories with the lookup-table returned by applying the to_colors-method on the used theme. An optimisation I see for ELF is to paint every byte for every category or at least every used category, beforehand and use a two-dimensional lookup-table or a one-dimensional by with index (category<<8+byte).

Another optimisation I am working on is

Anyway: Yes, I know, this is not optimal yet, but this is WIP. I implemented these features in parallel and they are somewhat depended on each other, and it was actually easier just to implement it that to open an issue first, but if you like to have the discussion on the feature-part (issue) and the implementation-of-the-feature-part (here) separate, we could open an issue.

sharkdp commented 3 years ago

Yes, I think it would be best to discuss each feature first. In general, I would like to keep hexyl simple and focused on it's current task. That being said, if there is a large support for a particular proposed feature, I'm happy to consider adding it.

I'm closing this PR for now, but happy to reopen later if it should become relevant.

cryptoquick commented 3 years ago

Gotta say, theme support would be really cool, it was the first thing I looked up after using it. I want a rainbow theme!