landley / toybox

toybox
http://landley.net/toybox
BSD Zero Clause License

xxd -p -c 0 #452

Closed: dlegaultbbry closed this issue 1 year ago

dlegaultbbry commented 1 year ago

It seems that this should just print the value as hex on a single long line, but I'm unable to get it working (hash.bin holds the raw SHA-256 digest of the empty string):

$ cat hash.bin | toybox xxd -p -c 0
e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852
b855

Looking at the code, it seems like this line is the culprit: https://github.com/landley/toybox/blob/master/toys/other/xxd.c#L85
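For reference, here is a minimal C sketch of what `xxd -p -c N` is expected to emit: two lowercase hex digits per byte, a newline after every N bytes, and `-c 0` meaning one unwrapped line. The function name `hex_plain()` is hypothetical; this is not toybox's code. Note that the reported output breaks after 60 hex digits (just before `b855`), which is 30 bytes, the usual default `-p` line width, so it looks as if `-c 0` were being treated as "use the default".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: hex-encode buf the way "xxd -p -c cols" is expected
 * to.  Each byte becomes two lowercase hex digits, a newline follows every
 * `cols` bytes, and cols == 0 means "never wrap" (one long line).  A final
 * partial line also gets a newline; empty input produces no output.
 * out must have room for 3 * len + 2 chars.  Returns strlen(out). */
size_t hex_plain(const unsigned char *buf, size_t len, size_t cols, char *out)
{
  static const char digits[] = "0123456789abcdef";
  size_t n = 0;

  for (size_t i = 0; i < len; i++) {
    out[n++] = digits[buf[i] >> 4];
    out[n++] = digits[buf[i] & 15];
    if (cols && (i + 1) % cols == 0) out[n++] = '\n';  /* end of a full row */
  }
  if (len && (!cols || len % cols)) out[n++] = '\n';   /* unwrapped or partial last row */
  out[n] = '\0';
  return n;
}
```

With `cols` set to 30, the 32-byte digest comes out as `...7852` followed by `b855` on a second line, matching the report. Feeding it the 68-byte ASCII line that `sha256sum` prints (64 hex digits, two spaces, `-`, newline) instead yields 136 hex digits plus a newline: 137 bytes starting with `6533`, the hex codes of the characters `e` and `3`, which is what `wc` counts in the next comment.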

landley commented 1 year ago

$ echo -n | sha256sum | toybox xxd -p -c 0 | wc
1 1 137

And the output starts with 6533 not e3b0...

How do I reproduce this?

dlegaultbbry commented 1 year ago

An alternate method that works on my OSX machine (where xxd is not toybox's version):

$ echo -n "" | openssl dgst -sha256 -binary | xxd -p -c 0
e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855

Unfortunately, sha256sum generates hex text, which is not what I want here.

enh-google commented 1 year ago

yeah, "works for me" on linux too:

/tmp/toybox$ echo -n | sha256sum | ./toybox xxd -p -c 0 
6533623063343432393866633163313439616662663463383939366662393234323761653431653436343962393334636134393539393162373835326238353520202d0a
/tmp/toybox$ echo -n | sha256sum | xxd -p -c 0 
6533623063343432393866633163313439616662663463383939366662393234323761653431653436343962393334636134393539393162373835326238353520202d0a
/tmp/toybox$ 

i think you're using openssl wrong? you need the -r flag to get output like the sha256sum command:

/tmp/toybox$ echo -n "" | openssl dgst -sha256 -r | xxd -p -c 0
65336230633434323938666331633134396166626634633839393666623932343237616534316534363439623933346361343935393931623738353262383535202a737464696e0a
/tmp/toybox$ 

without that i get the same (wrong) answer as you:

/tmp/toybox$ echo -n "" | openssl dgst -sha256 -binary | xxd -p -c 0
e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855

dlegaultbbry commented 1 year ago

It's not the wrong answer; compare the SHA-256 test vector for the empty string here: https://en.wikipedia.org/wiki/SHA-2#Test_vectors

The gist is that I want sha256("") as raw binary, which xxd then converts to hex.

In any case, if toybox seems to work on linux, then I'll chase what is going wrong on QNX.

enh-google commented 1 year ago

you're confusing binary and ascii:

~$ python3
Python 3.11.4 (main, Jun  7 2023, 10:13:09) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import hashlib
>>> m = hashlib.sha256()
>>> m.update(b"")
>>> m.digest()
b"\xe3\xb0\xc4B\x98\xfc\x1c\x14\x9a\xfb\xf4\xc8\x99o\xb9$'\xaeA\xe4d\x9b\x93L\xa4\x95\x99\x1bxR\xb8U"
>>> 
~$ echo -n | sha256sum
e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855  -
~$ 
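The distinction being drawn here: `sha256sum` prints the digest as 64 ASCII hex characters, while `openssl dgst -binary` (and Python's `digest()`) yields the 32 raw bytes. Getting from the first to the second means un-hexing, which is roughly what `xxd -r -p` does. Below is a minimal sketch with the hypothetical name `unhex()`; skipping whitespace and stopping at the first other non-hex character is my own simplification, not a statement of xxd's exact parsing rules.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hexval(int c)
{
  if (c >= '0' && c <= '9') return c - '0';
  c = tolower(c);
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  return -1;
}

/* Hypothetical helper: turn ASCII hex text into raw bytes, loosely like
 * "xxd -r -p".  Whitespace between digits is skipped; decoding stops at the
 * first other non-hex character (such as the "-" sha256sum appends for
 * stdin) or once outsz bytes are written.  A dangling odd nibble is dropped.
 * Returns the number of bytes written to out. */
size_t unhex(const char *s, unsigned char *out, size_t outsz)
{
  size_t n = 0;
  int hi = -1;  /* pending high nibble, or -1 if none */

  for (; *s && n < outsz; s++) {
    if (isspace((unsigned char)*s)) continue;
    int v = hexval((unsigned char)*s);
    if (v < 0) break;
    if (hi < 0) {
      hi = v;
    } else {
      out[n++] = (unsigned char)(hi << 4 | v);
      hi = -1;
    }
  }
  return n;
}
```

Running the `sha256sum` line through it recovers the same 32 bytes Python shows, and un-hexing `65 33` gives back the two characters `e3`, which is why hex-dumping the text output starts with `6533`.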

landley commented 1 year ago

Ah, you have to un-hex the output to get the stuff you want to hex dump...

$ echo -n | sha256sum | toybox xxd -r
xxd: -: seek failed: Invalid seek

Sigh. Todo item thrown on heap.

ANYWAY: the point is, the objection was the word wrap in xxd -p -c 0, and we're not seeing it. The line you pointed at prints a newline as do_xxd() exits, i.e. when it's done with an input source, so it puts a newline between each input source listed on the command line. It can't break a line in the middle: the function doesn't process any more input after that line, so any further output has to come from the next file...

I suppose there could be FILE * output buffer flush shenanigans? The output of putchar() being emitted before the output of printf()?

enh-google commented 1 year ago

> Sigh. Todo item thrown on heap.

i think that's "correct" for values of "correct" that include "toybox's error messages tend toward the inscrutable"...

~$ echo -n | sha256sum | xxd -r
xxd: Sorry, cannot seek backwards.
~$ 

(it's just confused by the - for "my input was stdin" at the end of the sha256sum output. i don't think there's a way to just get the hash, other than piping through awk or cut or whatever.)

dlegaultbbry commented 1 year ago

I'll figure out the word wrap issue and let you know once I've dug into where it goes wrong. Could be some QNX-ism doing funky things, which wouldn't be a first.

@enh-google this line looks mighty funky after a few bytes:

b"\xe3\xb0\xc4B\x98\xfc\x1c\x14\x9a\xfb\xf4\xc8\x99o\xb9$'\xaeA\xe4d\x9b\x93L\xa4\x95\x99\x1bxR\xb8U"

compared to the expected:

e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855

It looks OK at the beginning and then falls apart, with more characters per 'byte'. Maybe formatting got the best of it somehow.

enh-google commented 1 year ago

> @enh-google this line looks mighty funky after a few bytes

python prints byte arrays as an escaped C string, and some of the bytes in that hash are printable ascii.
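To see why the two strings agree: Python shows each byte that is printable ASCII as the character itself and everything else as a `\xNN` escape, so `\xc4B` is the two bytes `c4 42`, and `$'` is `24 27`. A simplified C sketch of that rendering follows; the name `pyescape()` is hypothetical, and it deliberately skips Python's special cases (`\t`, `\n`, `\r`, backslash and quote escaping), none of which occur in this particular digest.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: render bytes roughly the way Python's repr() shows a
 * bytes object (without the b"..." wrapper).  Printable ASCII (0x20..0x7e)
 * is copied as-is; everything else becomes \xNN with lowercase hex.
 * Simplification: Python also writes \t, \n, \r, \\ and the active quote
 * character as escapes, which this sketch does not do.
 * out must hold 4 * len + 1 chars.  Returns strlen(out). */
size_t pyescape(const unsigned char *buf, size_t len, char *out)
{
  static const char digits[] = "0123456789abcdef";
  size_t n = 0;

  for (size_t i = 0; i < len; i++) {
    unsigned char c = buf[i];
    if (c >= 0x20 && c < 0x7f) {
      out[n++] = (char)c;          /* printable: shown as the character */
    } else {
      out[n++] = '\\';             /* everything else: \xNN */
      out[n++] = 'x';
      out[n++] = digits[c >> 4];
      out[n++] = digits[c & 15];
    }
  }
  out[n] = '\0';
  return n;
}
```

Of the 32 digest bytes, 10 happen to be printable (`B`, `o`, `$`, `'`, `A`, `d`, `L`, `x`, `R`, `U`), which is why the Python form looks irregular while the byte values are identical.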

landley commented 1 year ago

I understood the message, but I thought xxd -r run with no arguments should operate on stdin, so I thought it was complaining it couldn't operate on nonseekable input? I didn't know it could take random bits of its input as "go open /dev/watchdog". (Not a regular xxd user...)

I added -b (brief) to toybox md5/sha*sum so you CAN get just the hash. Alas I have not gone through the "push this to everyone else" dance yet, in part because the cut -DF dance is... https://lists.gnu.org/archive/html/coreutils/2023-08/msg00009.html might still be addressed by https://lists.gnu.org/archive/html/coreutils/2023-08/msg00060.html but honestly who knows?

landley commented 1 year ago

And testing it:

$ echo -n | toybox sha256sum -b | toybox xxd -r
xxd: -: seek failed: Invalid seek
$ echo -n | toybox sha256sum -b
e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855

No trailing dash, still saying invalid seek...

landley commented 1 year ago

I miss "there should be one obvious way to do it". The move from python 2 to python 3 largely drove me out of python because it all just got gratuitously complicated and broke backwards compatibility for no reason. There was a lovely Linux Weekly News article comparing the python 3 adoption to the Kübler-Ross stages of grief... and google says "site:lwn.net python kubler ross" has zero hits. Luckily I linked to it from my blog, so I can still find it: https://lwn.net/Articles/669768/

enh-google commented 1 year ago

> And testing it:
>
> $ echo -n | toybox sha256sum -b | toybox xxd -r
> xxd: -: seek failed: Invalid seek
> $ echo -n | toybox sha256sum -b
> e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
>
> No trailing dash, still saying invalid seek...

you need -p with -r for -p-style input.

$ echo -n | toybox sha256sum -b | xxd -r -p | xxd
00000000: e3b0 c442 98fc 1c14 9afb f4c8 996f b924  ...B.........o.$
00000010: 27ae 41e4 649b 934c a495 991b 7852 b855  '.A.d..L....xR.U
$ echo -n | toybox sha256sum -b | toybox xxd -r -p | xxd
00000000: e3b0 c442 98fc 1c14 9afb f4c8 996f b924  ...B.........o.$
00000010: 27ae 41e4 649b 934c a495 991b 7852 b855  '.A.d..L....xR.U
~/aosp-main-with-phones/bionic$ 

i don't know why "real" xxd doesn't give a better error message, but toybox's doesn't because i made my usual fscanf() mistake:

    if (!FLAG(p) && fscanf(fp, "%llx: ", &pos) == 1) {

does not check that we actually parsed ": ". replace the : with a %c and check for == 2 && ch == ':' and complain about bad input format otherwise? (let me know if you'd like me to send the patch; normally i would, but it seems like you're actively poking about atm...)
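Why the bare `== 1` test passes on plain hex input: scanf-family functions count only successful conversions, so a literal `:` that fails to match does not lower the return value. On `e3b0...` the `%llx` consumes the leading hex digits, the call returns 1, and that number is presumably then treated as a file offset to seek to, hence `seek failed` on a pipe. (On the full 64-digit hash the value can't even fit in an `unsigned long long`, which is undefined behavior for scanf.) A minimal sketch of the suggested `%c` check, written against `sscanf()` on a line buffer for testability rather than toybox's `fscanf()` on the `FILE *`; `parse_offset()` is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: accept an xxd-style "OFFSET: ..." line only if the
 * hex offset is immediately followed by ':'.  Returns 1 and stores the
 * offset on success, 0 for input that is not in that format (e.g. plain
 * "-p" hex), so the caller can report "bad input format" instead of seeking. */
int parse_offset(const char *line, unsigned long long *pos)
{
  char ch = 0;

  /* %c does not skip whitespace, so ch is exactly the char after the digits. */
  if (sscanf(line, "%llx%c", pos, &ch) == 2 && ch == ':') return 1;
  return 0;
}
```

Unlike the original `": "` format, this does not consume the space after the colon, so the caller would need to skip that itself before parsing the hex bytes.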

enh-google commented 1 year ago

> I miss "there should be one obvious way to do it".

yeah, to be clear --- i don't love xxd, and i actively hate its "decades of accreted cruft" ui that means anything unusual (like using -p) means i have to re-read the man page, but afaik none of the other hexdumps have -r. and given how useful -r is, xxd is sadly the only hexdump that could be my "one stop shop". (except even there, od is still sometimes useful because -t lets you do some interesting stuff. and although i've never used hexdump(1)'s even greater formatting control, others do, including in the AOSP build.)

i think my "dream" hexdump would be hexdump(1) with -C as the default (so i didn't need the hd symlink) and xxd's -r and od's -t functionality [and i guess hexdump(1)'s -e for the people who use that]. but i feel like the last thing we need is yet another hex dump! (plan 9's https://9p.io/magic/man2html/1/xd was the obvious step forward from od, but otherwise quite disappointing.)

landley commented 1 year ago

Happy to take a patch to this one, since it's not my baby and I'm less likely to notice breaking stuff. :)

landley commented 1 year ago

I have a partial hexdump.c as ranted in https://landley.net/notes-2023.html#17-07-2023 and I've 90% convinced myself to just do it... modulo I've never used the advanced hexdump formatting features either, I always used it as -C before finding the hd short name. (The first big program I wrote in commodore 64 basic was a hex editor, showing stuff in a format similar to hd.)

Alas I've had too many tabs open recently and done a bit of swap thrashing, trying to close and check them in. One of them is of course the Linux From Scratch build under toybox (so not small), and another is running the test suite under mkroot (so VERY not small), but neither of those should BLOCK the other stuff...
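Since the `hd` / `hexdump -C` layout keeps coming up as the preferred default, here is a rough C sketch of one canonical line: an 8-digit offset, 16 bytes in two groups of eight, then a `|...|` column with non-printable bytes shown as `.`. The name `hd_line()` is hypothetical; how a short final line is padded is my assumption, and hexdump's `*` suppression of repeated lines is not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format up to 16 bytes as one "hexdump -C" style line:
 *   OOOOOOOO  xx xx xx xx xx xx xx xx  xx xx xx xx xx xx xx xx  |ascii...|
 * Missing bytes in a short line are padded with spaces (an assumption about
 * the exact padding).  out must hold at least 80 chars.  Returns strlen(out). */
size_t hd_line(unsigned long offset, const unsigned char *buf, size_t len, char *out)
{
  size_t n = (size_t)sprintf(out, "%08lx ", offset);

  if (len > 16) len = 16;
  for (size_t i = 0; i < 16; i++) {
    if (i == 0 || i == 8) out[n++] = ' ';          /* gap before each group of 8 */
    if (i < len) {
      n += (size_t)sprintf(out + n, "%02x ", (unsigned)buf[i]);
    } else {
      memcpy(out + n, "   ", 3);                   /* keep the ASCII column aligned */
      n += 3;
    }
  }
  out[n++] = ' ';
  out[n++] = '|';
  for (size_t i = 0; i < len; i++)                 /* printable ASCII or '.' */
    out[n++] = (buf[i] >= 0x20 && buf[i] < 0x7f) ? (char)buf[i] : '.';
  out[n++] = '|';
  out[n] = '\0';
  return n;
}
```

For the empty-string digest, the ASCII column comes out the same as in the `xxd` output posted earlier (`...B.........o.$` and `'.A.d..L....xR.U`); only the hex grouping and separators differ.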

enh-google commented 1 year ago

> Happy to take a patch to this one, since it's not my baby and I'm less likely to notice breaking stuff. :)

ack. i'll look at that tonight.

> I have a partial hexdump.c as ranted in https://landley.net/notes-2023.html#17-07-2023 and I've 90% convinced myself to just do it... modulo I've never used the advanced hexdump formatting features either, I always used it as -C before finding the hd short name. (The first big program I wrote in commodore 64 basic was a hex editor, showing stuff in a format similar to hd.)

yeah, i saw that post and agree with the "meh, the deduplication savings are probably less than the abstraction overhead" general feeling (along with knowing what you mean about it being annoying).

interestingly, my notes say we needed -n#, -s#, -C, -e, -f FILE, and -v for hexdump ... but code search shows no users in AOSP. i can see some internal uses, with -C -n 128, -s 128 -n 8 -e 'blah %d', and -v -e '1/4 "%08x"' -e '"\n"' being representative examples of the kinds of things being done. ("representative" in the sense of "if you can do those, you can do everything".)
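As I read hexdump(1)'s format language, `-v -e '1/4 "%08x"' -e '"\n"'` applies `%08x` to one 4-byte unit at a time (read in host byte order) and follows each with a newline, while `-v` turns off the `*` collapsing of repeated lines. A minimal C sketch of just that one case; the name `dump_words()` is hypothetical, and it ignores a trailing partial unit, which real hexdump handles in its own way.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper mirroring hexdump -v -e '1/4 "%08x"' -e '"\n"':
 * each whole 4-byte unit is read in host byte order and printed as eight
 * hex digits followed by a newline.  Trailing bytes that don't fill a unit
 * are ignored here.  out must hold 9 * (len / 4) + 1 chars.
 * Returns strlen(out). */
size_t dump_words(const unsigned char *buf, size_t len, char *out)
{
  size_t n = 0;

  out[0] = '\0';
  for (size_t i = 0; i + 4 <= len; i += 4) {
    uint32_t w;
    memcpy(&w, buf + i, sizeof w);  /* host byte order, like hexdump's units */
    n += (size_t)sprintf(out + n, "%08lx\n", (unsigned long)w);
  }
  return n;
}
```

On a little-endian machine the bytes `e3 b0 c4 42` print as `42c4b0e3`, which is the usual surprise with hexdump's multi-byte units compared to `-C` or xxd.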

enh-google commented 1 year ago

(http://lists.landley.net/pipermail/toybox-landley.net/2023-August/029697.html adds a clear error message for users who make this mistake.)