Other sizes of data (group size and Endianness)

ACleverDisguise commented 3 years ago

I frequently have to dump data files (ADC output, for example) that don't just have byte-oriented data. It would be nice to be able to specify data width in the dump so I get the hex data grouped in the natural data size instead of having to do the little-endian two-step and mentally group indistinguishable bytes by 2 or 4 or whatever. Something like:

--word-size=1 (uint8_t, default)
--word-size=2 (uint16_t)
--word-size=4 (uint32_t)
--word-size=8 (uint64_t)
--word-size=16 (uint128_t)

That covers the common-ish types. If you want to be really brave you could do weird crap like 3-byte or 17-byte words, but that is likely low return on investment.

Not all such data is little-endian, so an extra flag for those cases where word-size > 1 would be:

--little-endian (default)
--big-endian

Also, the interpretation could be signed or unsigned:

--signed
--unsigned (default)

Of course with this you'd drop the byte-oriented colouration (but maybe with --signed you'd highlight negative numbers in red or something).
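To make the interplay of the proposed flags concrete, here is a minimal Python sketch of how word size, byte order, and signedness change the value read from the same bytes. The `decode_words` helper and its parameters are hypothetical names chosen for illustration, not part of hexyl; the sketch assumes any trailing bytes that do not fill a whole word are simply ignored.

```python
def decode_words(data: bytes, word_size: int,
                 byteorder: str = "little", signed: bool = False) -> list:
    """Interpret data as consecutive integers of word_size bytes each.

    Hypothetical helper for illustration only. Trailing bytes that do
    not fill a complete word are dropped.
    """
    usable = len(data) - len(data) % word_size
    return [
        int.from_bytes(data[i:i + word_size], byteorder, signed=signed)
        for i in range(0, usable, word_size)
    ]


sample = bytes([0xFE, 0xFF, 0x01, 0x00])

print(decode_words(sample, 2, "little", signed=False))  # [65534, 1]
print(decode_words(sample, 2, "little", signed=True))   # [-2, 1]
print(decode_words(sample, 2, "big", signed=False))     # [65279, 256]
print(decode_words(sample, 2, "big", signed=True))      # [-257, 256]
```

The same four bytes yield four different readings, which is why the byte order and signedness need to be selectable once the word size is larger than 1.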

sharkdp commented 3 years ago

Thank you for the feedback.

It's not entirely clear to me what the output would look like.

Say I choose --word-size=2 (uint16_t) and the input contains 0xAB 0xCD 0x12 0x34. Would you like to see

CDAB 3412

for --little-endian and

ABCD 1234

for --big-endian?
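The two renderings in question can be reproduced with a short Python sketch. The `group_hex` function and its parameters are hypothetical names for illustration and do not reflect how hexyl is implemented; a trailing group shorter than the word size is rendered from whatever bytes remain.

```python
def group_hex(data: bytes, word_size: int = 1, little_endian: bool = True) -> str:
    """Render data as uppercase hex, grouped into words of word_size bytes.

    Hypothetical helper for illustration only.
    """
    words = []
    for i in range(0, len(data), word_size):
        chunk = data[i:i + word_size]
        if little_endian:
            # In little-endian data the least significant byte comes first,
            # so reverse each word to print the most significant byte first.
            chunk = chunk[::-1]
        words.append(chunk.hex().upper())
    return " ".join(words)


data = bytes([0xAB, 0xCD, 0x12, 0x34])
print(group_hex(data, word_size=2, little_endian=True))   # CDAB 3412
print(group_hex(data, word_size=2, little_endian=False))  # ABCD 1234
```

In the big-endian case the bytes appear in file order and only the whitespace between them changes; in the little-endian case the bytes within each word are also reversed.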

ACleverDisguise commented 3 years ago

That's pretty much exactly what I was picturing, yes.

sharkdp commented 3 years ago

This looks similar to xxd's -groupsize option, if I'm not mistaken:

       -g bytes | -groupsize bytes
              Separate the output of every <bytes> bytes (two hex characters or  eight
              bit-digits  each)  by  a whitespace.  Specify -g 0 to suppress grouping.
              <Bytes> defaults to 2 in normal mode, 4 in little-endian mode and  1  in
              bits mode.  Grouping does not apply to postscript or include style.

I recently came across this when reading this blog post which makes use of -g to inspect ELF64 executables.

ACleverDisguise commented 3 years ago

It is similar to -g and -e in xxd, yes, but I'm not a huge fan of their nomenclature and their rather bizarre default assumptions. (Like the bizarre assumption that "normal" is big-endian, which hasn't been "normal" for decades now.) I can understand, perhaps, that you might want to keep it compatible for easier transition for users, though, so I'm only going to express a mild preference for breaking free from it.

sharkdp commented 1 year ago

@RinHizakura If you find the time, could you maybe summarize what is and what is not possible with your new option in #170? (released today)

RinHizakura commented 1 year ago

The new option --group-bytes will provide the functionality to group multiple octets as a unit, which means that several bytes will be shown together without whitespace. It is quite similar to the option -groupsize in xxd, however, the possible group size should only be 1, 2, 4, or 8 currently.

On the other hand, this could only be shown in the big-endian format. The little-endian dump is not supported now.

sharkdp commented 1 year ago

> The new option --group-bytes will provide the functionality to group multiple octets as a unit, which means that several bytes will be shown together without whitespace. It is quite similar to the option -groupsize in xxd, however, the possible group size should only be 1, 2, 4, or 8 currently.

I think this limitation is fine for now. 16 would probably be nice, but I understand that it probably interferes with --panels.

> On the other hand, this could only be shown in the big-endian format. The little-endian dump is not supported now.

Right. I agree with @ACleverDisguise that this would be a really nice feature to have. So let's keep this ticket open for now.

sharkdp commented 1 year ago

I think the main functionality requested in this ticket is now supported, now that #189 by @RinHizakura has also been merged.

sharkdp / hexyl

Other sizes of data (group size and Endianness) #104