sharkdp / hexyl

A command-line hex viewer
Apache License 2.0

Beat xxd #66

Closed: mike239x closed this issue 4 years ago

mike239x commented 5 years ago

I did a bit of benchmarking, and I can't help but notice that xxd is faster than hexyl. On my machine, on a file of about 700 MB:

$ time xxd myfile > /dev/null

real    0m43.245s
user    0m42.950s
sys 0m0.272s

$ time hexyl --color=never --no-squeezing --border=none myfile > /dev/null

real    1m10.967s
user    1m1.371s
sys 0m9.592s

It would be nice to beat xxd in speed... I have no idea how to do it, though.
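
For reference, a test file of roughly that size can be generated with dd. This is just a minimal sketch, assuming GNU dd and using myfile as the placeholder name from the commands above:

$ dd if=/dev/urandom of=myfile bs=1M count=700    # ~700 MB of incompressible test data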

sharkdp commented 5 years ago

Thank you for the feedback.

I agree, it would be nice, but not much more than that. I don't really see a problem with the current speed, as I don't think that performance (at the current level) is critical for a hexdump tool. hexyl processes around 10 MiB of data per second, and it produces output much faster than terminal emulators can render it (in Terminator, hexyl is a factor of 5 slower when writing to the TTY than when redirecting the output).
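
To see the terminal bottleneck directly, one can compare writing to the TTY with redirecting to /dev/null; a small sketch, assuming a test file named data:

$ time hexyl data > /dev/null    # measures hexyl itself
$ time hexyl data                # additionally measures the terminal emulator rendering the output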

In which real-world use case would we really need it to be faster?

mike239x commented 4 years ago

I tried to find a "real-world use case" but failed. I would say it is more of an ideological thing... something along the lines of "software shouldn't get slower with time, but faster".

I'll take a look at the source code in my free time; maybe (though unlikely) I'll find a way to improve it :)

sharkdp commented 4 years ago

"software shouldn't get slower with time, but faster"

I would agree. But hexyl is about adding additional functionality (the colorized output). It's not trying to be a 1:1 replacement for xxd.

remexre commented 4 years ago

Real-world use case -- I've got a pretty big file that's mostly zeroes, with a k or so of nonzero data. Reading from /tmp, hexyl takes 55.791s, hexdump -C takes 1.091s, xxd >/dev/null takes 40.528s.

sharkdp commented 4 years ago

@remexre Thank you.

If someone wants to work on this, here is a reproducible benchmark (I'm using hyperfine):

#!/bin/bash

dd if=/dev/zero    bs=10M count=1 >  data
dd if=/dev/urandom bs=1k  count=1 >> data

hyperfine --warmup 3 \
    'hexyl data' \
    'hexyl --no-squeezing data' \
    'hexdump -C data' \
    'hexdump -C --no-squeezing data' \
    'xxd data' \
    --export-markdown results.md

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `hexyl data` | 1.037 ± 0.023 | 1.014 | 1.078 | 63.1 |
| `hexyl --no-squeezing data` | 1.289 ± 0.022 | 1.261 | 1.319 | 78.4 |
| `hexdump -C data` | 0.016 ± 0.001 | 0.016 | 0.018 | 1.0 |
| `hexdump -C --no-squeezing data` | 1.921 ± 0.014 | 1.902 | 1.943 | 116.8 |
| `xxd data` | 0.707 ± 0.008 | 0.701 | 0.729 | 43.0 |

Apparently, hexdump's "squeezing" mode is really good.
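
As a rough illustration (a sketch, not measured output): hexdump -C collapses runs of identical 16-byte lines into a single * line, so the 10 MiB of zeros in the test file above shrinks to one line of zeros plus a *, followed by the 1 KiB of random data starting at offset 0x00a00000.

$ hexdump -C data | wc -l                   # on the order of 70 lines with squeezing
$ hexdump -C --no-squeezing data | wc -l    # roughly 655,000 lines without it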

sharkdp commented 4 years ago

see #73

fmillion commented 2 years ago

> Real-world use case -- I've got a pretty big file that's mostly zeroes, with a k or so of nonzero data. Reading from /tmp, hexyl takes 55.791s, hexdump -C takes 1.091s, xxd >/dev/null takes 40.528s.

Old issue, but here's another use case for posterity. I want to compare two disk images, and I want to see not only where the data differs, but also what the differing data is, in a hexdump format. To do that, I like using tools like this to produce a plaintext version of the data that can then be diffed. Storing the huge files isn't an issue (the dumps can be piped directly into diff, or the text files can be stored on a compressed filesystem), as sketched below.
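
A minimal sketch of that workflow, assuming bash process substitution and two hypothetical image files; --color=never and --no-squeezing are the hexyl flags already used earlier in this thread:

$ diff <(hexyl --color=never --no-squeezing image1.img) \
       <(hexyl --color=never --no-squeezing image2.img)

Process substitution avoids storing the full text dumps on disk; dropping --no-squeezing makes them much smaller when the images are mostly identical filler, at the cost of collapsing the repeated lines.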