Closed lilyball closed 5 years ago
For this task you may want my termrec — it has tools to record and replay a terminal stream with timing information. There's a library with two parts: for storing/rewinding/etc the raw or frame-by-frame stream, and libtty to keep the state of the terminal. You could instrument the latter to get performance data.
Another option is to use hyperfine's `--show-output` flag. It will simply pass through all the output of the benchmarked commands instead of piping it to `/dev/null`.
Without a TTY:
Command | Mean [ms] | Min…Max [ms] |
---|---|---|
hexyl $(which hexyl) | 173.2 ± 5.4 | 166.9…186.1 |
hexdump -C $(which hexyl) | 194.3 ± 6.4 | 188.0…215.4 |
xxd $(which hexyl) | 74.0 ± 2.1 | 70.9…83.4 |
With `alacritty`:
Command | Mean [ms] | Min…Max [ms] |
---|---|---|
hexyl $(which hexyl) | 510.9 ± 16.3 | 493.5…544.9 |
hexdump -C $(which hexyl) | 374.7 ± 25.0 | 347.7…431.4 |
xxd $(which hexyl) | 227.5 ± 14.1 | 205.8…244.8 |
With `terminator`:
Command | Mean [s] | Min…Max [s] |
---|---|---|
hexyl $(which hexyl) | 1.730 ± 0.047 | 1.659…1.807 |
hexdump -C $(which hexyl) | 0.632 ± 0.019 | 0.598…0.661 |
xxd $(which hexyl) | 0.465 ± 0.024 | 0.427…0.502 |
We can observe:
- The terminals are slower at rendering `hexyl`'s output (relatively speaking), as expected, due to the colors.
- `alacritty` is freaking fast :smile:

> Specifically, I'm thinking about how we turn colors on and off again for every single hex pair and textual character. Ideally we wouldn't turn colors off if the next printed hex/char uses the same color.

> Optimizing our color usages would be more overhead on our side and therefore slow down our benchmark (though perhaps not significantly) but if it produces faster rendering it might be worth it.

I wouldn't really bother doing this. I don't think that the current performance is problematic in any way. A more interesting benchmark could be to measure the execution time for rather small files. There might be a startup latency for `hexyl` due to its larger binary size, as is typical for Rust programs.
> At the very least, we could investigate not printing the style suffix for each character, under the assumption that the style prefix for the next character will suffice (and then just printing the suffix prior to printing a frame character).

Yes, we could probably do this to also save on the bandwidth.
Don't get me wrong. I love fast programs. I just don't think that `hexyl` really has a performance problem, especially after your recent PR which made it several times faster. I'm never going to output binary blobs of 1MB or larger to the terminal. And if I am, I don't really care if it takes `hexyl` 500 ms to print the 60,000 lines of output to the terminal.
I on the other hand very often run `hd|less` on very large files (seeking to an interesting part, of course). With hexyl, this would be `less -R` goodness. So this request isn't completely without point.
Valid point, but the usage of a pager will help you with the rendering speed because only the current page has to be printed.
What might be interesting to measure is something like `hexyl | less -R` and then immediately trying to view the final page.
> What might be interesting to measure is something like `hexyl | less -R` and then immediately trying to view the final page.

My hope would be that this would be pretty much the time that we get without a TTY.
`hexyl $(which hexyl) | less -R` and a subsequent Shift+G is definitely much faster for me than waiting for `hexyl $(which hexyl)` to be finished.
Is there anything more we want to do here or can this be closed?
Personally, I am still interested in the performance when just printing directly to Terminal.app. I'd like to do some investigation of this on my own and see if there are some easy wins, so if you don't mind I want to keep the ticket open for at least a little while.
In a quick test, removing the suffixes and inserting reset sequences before the frame chars results in an approximately 14% slowdown on the benchmark, but a 30% speedup when actually rendering to Terminal.app.
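The change described above (drop the per-cell suffix, emit a prefix only when the color changes, and reset once before the frame character) can be sketched roughly as follows. This is a minimal illustration, not hexyl's actual code: the `Color` enum, the `classify` function, the specific SGR codes, and both render functions are hypothetical names chosen for the example.

```rust
use std::fmt::Write;

/// Hypothetical color categories (not hexyl's real byte classes).
#[derive(Clone, Copy, PartialEq, Eq)]
enum Color {
    Null,
    Printable,
    Other,
}

impl Color {
    /// SGR foreground sequence for this category (illustrative choices).
    fn prefix(self) -> &'static str {
        match self {
            Color::Null => "\x1b[90m",
            Color::Printable => "\x1b[36m",
            Color::Other => "\x1b[33m",
        }
    }
}

/// SGR reset, used here as the style suffix.
const RESET: &str = "\x1b[0m";

fn classify(byte: u8) -> Color {
    match byte {
        0 => Color::Null,
        0x20..=0x7e => Color::Printable,
        _ => Color::Other,
    }
}

/// Naive rendering: prefix and suffix around every single hex pair.
fn render_naive(bytes: &[u8]) -> String {
    let mut out = String::new();
    for &b in bytes {
        write!(out, "{}{:02x}{} ", classify(b).prefix(), b, RESET).unwrap();
    }
    out.push('│'); // frame character
    out
}

/// Coalesced rendering: a prefix only when the color changes, and a single
/// reset right before the frame character.
fn render_coalesced(bytes: &[u8]) -> String {
    let mut out = String::new();
    let mut current: Option<Color> = None;
    for &b in bytes {
        let color = classify(b);
        if current != Some(color) {
            out.push_str(color.prefix());
            current = Some(color);
        }
        write!(out, "{:02x} ", b).unwrap();
    }
    if current.is_some() {
        out.push_str(RESET);
    }
    out.push('│'); // frame character
    out
}

fn main() {
    let row = [0u8, 0, 0x41, 0x42, 0xff];
    let naive = render_naive(&row);
    let coalesced = render_coalesced(&row);
    println!("naive: {} bytes, coalesced: {} bytes", naive.len(), coalesced.len());
}
```

The extra work per cell is one comparison against the previously emitted color. The visible text is the same in both versions, but the coalesced row carries fewer escape sequences for the terminal's state machine to process.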
> What might be interesting to measure is something like `hexyl | less -R` and then immediately trying to view the final page.

Wouldn't that be equivalent to something like `hexyl | tail -n <height of terminal>` if you don't want to take overhead from less into account?
@kitlith `tail -n 30` would give you the last 30 unwrapped lines of output. `less` performs wrapping. That said, `less` seems to be pretty smart about jumping to end given how fast it can do it, so it's clearly not calculating wrap points for any non-displayed lines.
Well, less skips to the end then explicitly says "Calculating line numbers..." while you already see the final screen.
Point is, less calculating line numbers or doing line-wrapping isn't the focus of this issue? It's the rendering performance. less (shift-G) is not as benchmarkable as just showing the last few lines of a dump with something like tail, or copying the output and displaying it to the screen directly.
Rendering performance is mostly about how fast the terminal emulator state machine can process the escape codes and text. `less` isn't a great measure here because its line number calculation works on hard-wrapped lines (and therefore just needs to scan for newlines rather than running the full state machine for all non-displayed lines), but given that piping to `less` is expected to be a common use-case it's possibly more important than the time it takes for the terminal to render the actual full output of Hexyl.
Is the question just about the trade-off between optimizing for one case at the cost of the other?
@lilyball I don't know how involved the changes you made were (in terms of LOC) but perhaps you can just gate them based off of whether or not hexyl is outputting directly to a tty?
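A rough sketch of what such gating could look like, using the standard library's `std::io::IsTerminal` trait (stable since Rust 1.70). The `StyleStrategy` enum and `choose_strategy` function are hypothetical names for illustration and are not part of hexyl. The mapping follows the numbers reported earlier in the thread: coalescing was slower in the pipe benchmark but faster when rendering to a terminal.

```rust
use std::io::{self, IsTerminal};

/// Hypothetical switch between the two ways of emitting style sequences.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum StyleStrategy {
    /// Prefix and suffix around every cell (cheaper to generate).
    PerCell,
    /// Prefix only on color change, reset before frame characters
    /// (fewer escape sequences for the terminal to process).
    Coalesced,
}

/// Pick a strategy from whether stdout is attached to a terminal.
fn choose_strategy(stdout_is_tty: bool) -> StyleStrategy {
    if stdout_is_tty {
        StyleStrategy::Coalesced
    } else {
        StyleStrategy::PerCell
    }
}

fn main() {
    let strategy = choose_strategy(io::stdout().is_terminal());
    // Report on stderr so the message does not end up in a pipe on stdout.
    eprintln!("using {:?}", strategy);
}
```

Keeping the decision in a pure function of a `bool` makes it testable without a real terminal. Note that a pipe into `less -R` is not a TTY, so that case would take the `PerCell` path under this assumption.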
No, it's about adding some complexity to optimize a case that some dismiss as unimportant. More code = maintenance cost.
On the other hand, performance cost of comparing a few variables is so small that I'd guess even shaving some work from printf-equivalent and sending the data via pipe would already be a win — much less going into rendering in the terminal.
I'm going to close this. If anybody feels that `hexyl` is (still) too slow when writing to a terminal, please let me know.
We've been benchmarking the performance of the tool without considering the rendering performance of the terminal. Specifically, I'm thinking about how we turn colors on and off again for every single hex pair and textual character. Ideally we wouldn't turn colors off if the next printed hex/char uses the same color.
I'm not really sure how to programmatically measure the terminal performance (and of course performance would change for different terminals), but it's worth at least trying to measure. Optimizing our color usages would be more overhead on our side and therefore slow down our benchmark (though perhaps not significantly) but if it produces faster rendering it might be worth it.
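One crude way to approach this programmatically is to time how long it takes to push a fixed payload through stdout when stdout is the terminal. The sketch below assumes that write-and-flush time is a usable proxy for rendering time. That is only an approximation, since a terminal emulator may accept bytes before it has finished painting them. The `time_write` helper and the payload are hypothetical and chosen only for illustration.

```rust
use std::io::{self, Write};
use std::time::{Duration, Instant};

/// Write `data` to `sink`, flush, and return the elapsed wall-clock time.
fn time_write<W: Write>(mut sink: W, data: &[u8]) -> io::Result<Duration> {
    let start = Instant::now();
    sink.write_all(data)?;
    sink.flush()?;
    Ok(start.elapsed())
}

fn main() -> io::Result<()> {
    // Stand-in payload: 10,000 short lines, each with two colored hex pairs.
    let line = b"\x1b[36m41\x1b[0m \x1b[36m42\x1b[0m\n";
    let payload: Vec<u8> = line.repeat(10_000);

    let elapsed = time_write(io::stdout().lock(), &payload)?;

    // Report on stderr so the result is not mixed into the measured stream.
    eprintln!("wrote {} bytes in {:?}", payload.len(), elapsed);
    Ok(())
}
```

Running the same binary once in a terminal and once with stdout redirected to `/dev/null` gives a rough terminal versus no-terminal comparison, which could then be repeated for different emulators.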
At the very least, we could investigate not printing the style suffix for each character, under the assumption that the style prefix for the next character will suffice (and then just printing the suffix prior to printing a frame character).