sharkdp / binocle

a graphical tool to visualize binary data
Apache License 2.0
1.12k stars 32 forks source link

Entropy: improve performance by keeping colorgradient static. #32

Closed siedentop closed 3 years ago

siedentop commented 3 years ago

Thanks for the repo!

I ran cargo flamegraph on the code when using the Entropy visualization. This showed one very obvious area of improvement. csscolor::parse::parse was parsing hex numbers as part of creating the colograde::magma() color gradient.

I think the easiest solution is just to make colorgrade::magma() a static and re-use it. What do you think?

Aside: Is there any work in making the UI into a separate thread from the calculations? The mouse-over is just sluggish when I change the color scheme.

Thanks for reviewing!

siedentop commented 3 years ago

Update: Faster UI can be found in this PR: https://github.com/sharkdp/binocle/pull/34

PING: @sharkdp

sharkdp commented 3 years ago

Hi Christoph, thank you for the feedback!

I ran cargo flamegraph on the code when using the Entropy visualization. This showed one very obvious area of improvement. csscolor::parse::parse was parsing hex numbers as part of creating the colograde::magma() color gradient.

I think the easiest solution is just to make colorgrade::magma() a static and re-use it. What do you think?

Maybe we could be even faster by computing (and caching) the actual colors for a set of (discretized) entropy values? We do something very similar in the ColorGradient style:

https://github.com/sharkdp/binocle/blob/4f995ef1f541c1c64a9f069604a4bbb71614e28c/src/style.rs#L77-L81

Aside: Is there any work in making the UI into a separate thread from the calculations? The mouse-over is just sluggish when I change the color scheme.

Not yet, no. I knew I was asking for trouble though :smile:. Running rendering and UI in separate threads is definitely the way to go.

sharkdp commented 3 years ago

By the way: there is a ticket for slow entropy computation here: #7. I have a few ideas on how to speed it up, but most of this comes at the cost of accuracy (like computing only one entropy value per N byte chunk).

siedentop commented 3 years ago

Maybe we could be even faster by computing (and caching) the actual colors for a set of (discretized) entropy values? We do something very similar in the ColorGradient style

Ah yes! I'll try that instead. That should be better. Although, performance wise it won't make as much of a difference (because with the current changes, all the compute is happening in the entropy calculation, I believe). But it is the cleaner approach (and probably just a tad faster).


Not yet, no. I knew I was asking for trouble though 😄. Running rendering and UI in separate threads is definitely the way to go.

PR #34 addresses this. While I think it makes the logic a bit more complicated and isn't in the spirit of "immediate mode" in egui, it is the approach suggested by the Pixel library (get_frame method):

Get a mutable byte slice for the pixel buffer. The buffer is not cleared for you; it will retain the previous frame's contents until you clear it yourself.

In consequence, I would add multiple threads on-top of the "only paint if changed". But just using separate threads would also be appropriate, IMO.

[I will cross-post this on #34.]

siedentop commented 3 years ago

I have implemented your suggestion of caching the discretized entropy value.

I was not able to set up a benchmark (without a massive overhaul of the crate), and did not measure performance. Both implementations feel the same to me.

I prefer the original version, since the logic is simpler. But feel free to decide which version you prefer and choose that one.

siedentop commented 3 years ago

Some performance measurements with tracing.

All data is with the test file as supplied and Zoom set to 1x. And all other settings left at default. Of course, Entropy was selected. This is done on an M1 Macbook Air connected to a 5K monitor.

NOTE: First measurements done running in debug 🤦🏼‍♂️. Left for posterity, but should not be considered.

I am looking at the time of the Binocle::draw annotated function. The RedrawRequested part will always take 16ms (for a consistent frame rate). There is virtually no jitter.

Version Commit Debug @ 1x Release @ 1x Release @ 4x
master 4f995ef1f541c1c64a9f069604a4bbb71614e28c 219 ms 9.2 ms 60 ms (!)
Static Gradient 8ecfe64f716d6062e5769bdbd8f20f1ba8fa39e8 160 ms 6.7 ms 13 ms
Discretized + Cached Color 6624c037ba1192bf18626cd1c2bd6ec7b605de1b 155 ms 6.6 ms 11.8 ms

This is only possible due to the tracing support: https://github.com/sharkdp/binocle/pull/35

sharkdp commented 3 years ago

Thank you for the benchmark results and the update. Still need to figure out why it's slower when zoomed in.