sharkdp / hexyl

A command-line hex viewer
Apache License 2.0
8.92k stars 227 forks

Support raw output #219

Open jess-sol opened 3 months ago

jess-sol commented 3 months ago

Not sure if this makes sense for Hexyl to support, but since it already has -s and -c, it'd be really nice to be able to output raw binary data, so that a large binary can quickly be trimmed down to a small one after locating the wanted portion. If there's interest in a --raw flag, I'd be happy to implement it.

sharkdp commented 2 months ago

That sounds like an interesting idea! I think it would be a nice feature to have, if it can be cleanly integrated into the code base.

We should maybe also research what other tools (hexdump, xxd) do. I think they provide options to turn their output back into binary, which might be even more powerful because you can build pipelines and integrate other tools as well.

jess-sol commented 2 months ago

So I did a bit more digging on how other tools do it.

xxd supports this through the -r option:

-r | -revert
  Reverse operation: convert (or patch) hex dump into binary. If not writing to stdout, xxd writes into its output file without truncating it. Use the combination -r -p to read plain hexadecimal
  dumps without line number information and without a particular column layout. Additional whitespace and line breaks are allowed anywhere. Use the combination -r -b to read a bits dump instead
  of a hex dump.

Basically, the expectation is that you feed the output of a previous xxd command into xxd -r to get the original file back. Formatting options given to the first invocation must also be given to the second, for example:

xxd input | xxd -r
xxd -b input | xxd -br
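
To make the round trip concrete, here is a minimal Python sketch of the plain-hex variant (roughly what `xxd -p` followed by `xxd -r -p` does). It only illustrates the idea and is not how xxd is implemented; the helper names `to_plain_hex` and `from_plain_hex` and the default line width are assumptions made for this example.

```python
def to_plain_hex(data: bytes, width: int = 30) -> str:
    """Render bytes as a plain hex dump with `width` bytes per line."""
    hex_str = data.hex()
    step = width * 2  # two hex digits per byte
    return "\n".join(hex_str[i:i + step] for i in range(0, len(hex_str), step))


def from_plain_hex(dump: str) -> bytes:
    """Turn a plain hex dump back into the original bytes."""
    # bytes.fromhex skips ASCII whitespace between bytes (Python 3.7+),
    # so line breaks and spaces in the dump are accepted.
    return bytes.fromhex(dump)


original = bytes(range(256))
assert from_plain_hex(to_plain_hex(original)) == original
```

Because the dump is plain text, it can be edited or filtered by other tools before being converted back, which is what makes the reverse operation useful in pipelines.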

hexdump, on the other hand, has a very powerful output formatting syntax. It provides a way to split the input into groups of bytes and to consume/format some number of groups. This is how you'd output the raw binary:

hexdump -ve '1/1 "%c"' input # iteration count / byte count
hexdump -ve '"%c"' input # 1/1 elided

Hexdump output formatting is powerful (see some examples in a SUSE blog post), though these days it seems more practical for most uses to write a Python script than a hexdump format file.

I think having a --raw output would be the simplest code-wise. It would just require bailing out early and copying the reader to stdout after skip/take. I could see an argument for adding a custom output format option similar to hexdump's (though ideally simplified). In that case it'd make sense to collect some use cases that Hexyl would want to support and design towards that.
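
As a rough illustration of the skip/take-then-copy idea, here is a minimal sketch in Python (chosen only for brevity; it is not Hexyl code). The function name `copy_range`, its parameters, and the chunk size are all hypothetical and do not correspond to anything in the Hexyl code base.

```python
import io
from typing import BinaryIO, Optional


def copy_range(reader: BinaryIO, writer: BinaryIO, skip: int = 0,
               length: Optional[int] = None, chunk_size: int = 64 * 1024) -> int:
    """Discard `skip` bytes, then copy up to `length` raw bytes (all if None).

    Returns the number of bytes written.
    """
    # Skip by reading and discarding, so non-seekable inputs (pipes) work too.
    remaining = skip
    while remaining > 0:
        chunk = reader.read(min(chunk_size, remaining))
        if not chunk:
            return 0  # input ended before the skip offset was reached
        remaining -= len(chunk)

    # Copy the requested range unchanged, one chunk at a time.
    written = 0
    while length is None or written < length:
        want = chunk_size if length is None else min(chunk_size, length - written)
        chunk = reader.read(want)
        if not chunk:
            break  # input ended before `length` bytes were available
        writer.write(chunk)
        written += len(chunk)
    return written


src = io.BytesIO(b"0123456789")
dst = io.BytesIO()
copy_range(src, dst, skip=2, length=4)
assert dst.getvalue() == b"2345"
```

For real files one would pass something like `open(path, "rb")` as the reader and `sys.stdout.buffer` as the writer, so the bytes reach stdout without any text encoding applied.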