Closed dlegaultbbry closed 1 year ago
$ echo -n | sha256sum | toybox xxd -p -c 0 | wc
1 1 137
And the output starts with 6533 not e3b0...
How do I reproduce this?
Alternate method which works on my OSX machine (xxd is not toybox's version here)
echo -n "" | openssl dgst -sha256 -binary | xxd -p -c 0
e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
Unfortunately sha256sum generates hex output which is not what is desired here
yeah, "works for me" on linux too:
/tmp/toybox$ echo -n | sha256sum | ./toybox xxd -p -c 0
6533623063343432393866633163313439616662663463383939366662393234323761653431653436343962393334636134393539393162373835326238353520202d0a
/tmp/toybox$ echo -n | sha256sum | xxd -p -c 0
6533623063343432393866633163313439616662663463383939366662393234323761653431653436343962393334636134393539393162373835326238353520202d0a
/tmp/toybox$
i think you're using openssl wrong? you need the -r flag to get output like the sha256sum command:
/tmp/toybox$ echo -n "" | openssl dgst -sha256 -r | xxd -p -c 0
65336230633434323938666331633134396166626634633839393666623932343237616534316534363439623933346361343935393931623738353262383535202a737464696e0a
/tmp/toybox$
without that i get the same (wrong) answer as you:
/tmp/toybox$ echo -n "" | openssl dgst -sha256 -binary | xxd -p -c 0
e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
It's not the wrong answer, look at the openssl sha256 empty string test vector here: https://en.wikipedia.org/wiki/SHA-2#Test_vectors
The gist is that I want the sha256("") as binary which is then converted to hex by xxd.
In any case, if toybox seems to work on linux, then I'll chase what is going wrong on QNX.
you're confusing binary and ascii:
~$ python3
Python 3.11.4 (main, Jun 7 2023, 10:13:09) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import hashlib
>>> m = hashlib.sha256()
>>> m.update(b"")
>>> m.digest()
b"\xe3\xb0\xc4B\x98\xfc\x1c\x14\x9a\xfb\xf4\xc8\x99o\xb9$'\xaeA\xe4d\x9b\x93L\xa4\x95\x99\x1bxR\xb8U"
>>>
~$ echo -n | sha256sum
e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 -
~$
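To make the binary-versus-ASCII distinction concrete, here is a minimal Python sketch (the helper name `digest_forms` is made up for illustration). It hexes the same SHA-256 result twice: once from the 32 raw bytes, and once from the 64-character text line that sha256sum prints, including its trailing two spaces, dash, and newline.

```python
import hashlib

def digest_forms(data: bytes = b""):
    """Return (raw 32-byte digest, 64-character ASCII hex text) for data."""
    h = hashlib.sha256(data)
    return h.digest(), h.hexdigest()

raw, text = digest_forms(b"")

# Hex of the raw bytes: what `openssl dgst -sha256 -binary | xxd -p` shows.
print(raw.hex())

# Hex of the ASCII text plus sha256sum's "  -\n" suffix: what
# `sha256sum | xxd -p` shows. It starts 6533... because 'e' is 0x65
# and '3' is 0x33.
sum_line = (text + "  -\n").encode("ascii")
print(sum_line.hex())
```

The second dump is 136 hex characters, which with xxd's trailing newline accounts for the 137 reported by wc.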
Ah, you have to un-hex the output to get the stuff you want to hex dump...
$ echo -n | sha256sum | toybox xxd -r
xxd: -: seek failed: Invalid seek
Sigh. Todo item thrown on heap.
ANYWAY: the point is, the objection was the wordwrap in xxd -p -c0 and we're not seeing it. The line you pointed at prints a newline as do_xxd() exits, meaning it's done with an input source. I.E. that line puts a newline between each input source listed on the command line. That shouldn't be able to break a line in the middle because the function doesn't continue to process the rest of the input after that line, so it can only emit more output after that for a new file...
I suppose there could be FILE * output buffer flush shenanigans? The output of putchar() being emitted before the output of printf()?
i think that's "correct" for values of "correct" that include "toybox's error messages tend toward the inscrutable"...
~$ echo -n | sha256sum | xxd -r
xxd: Sorry, cannot seek backwards.
~$
(it's just confused by the - for "my input was stdin" at the end of the sha256sum output. i don't think there's a way to just get the hash, other than piping through awk or cut or whatever.)
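As a rough Python equivalent of the "pipe through awk or cut" idea, the sketch below keeps only the first whitespace-separated field of a sha256sum-style line and decodes its hex digits back into bytes. The function name `unhex_sum_line` is hypothetical, and this only illustrates the idea; it says nothing about how xxd itself parses its input.

```python
def unhex_sum_line(line: str) -> bytes:
    """Decode the hash field of a 'HASH  NAME' line into raw bytes."""
    fields = line.split()
    if not fields:
        raise ValueError("empty input")
    # fields[0] is the hex digest; any later field (here "-") is the name.
    return bytes.fromhex(fields[0])

line = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855  -\n"
decoded = unhex_sum_line(line)
print(len(decoded))
print(decoded.hex())
```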
I'll figure out the word wrap issue and let you know once I deep dive into where it goes wrong. Could be some QNXism doing funky things which wouldn't be a first.
@enh-google this line looks mighty funky after a few bytes:
b"\xe3\xb0\xc4B\x98\xfc\x1c\x14\x9a\xfb\xf4\xc8\x99o\xb9$'\xaeA\xe4d\x9b\x93L\xa4\x95\x99\x1bxR\xb8U"
compared to the expected:
e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
It looks ok at the beginning and then falls apart with more characters per 'byte'. Maybe formatting got the best of it somehow.
python prints byte arrays as an escaped C string, and some of the bytes in that hash are printable ascii.
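A small sketch of why that output looks uneven: each byte is either shown as a `\xNN` escape or, if it happens to be printable ASCII, as the character itself. The helper `show_bytes` is made up for this illustration.

```python
DIGEST_HEX = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
digest = bytes.fromhex(DIGEST_HEX)

def show_bytes(data: bytes) -> list:
    """Pair each byte's hex value with how Python's bytes repr renders it."""
    # repr(bytes([b])) looks like b'...' or b"..."; strip the b and quotes.
    return [f"{b:02x} -> {repr(bytes([b]))[2:-1]}" for b in data]

for row in show_bytes(digest[:4]):
    print(row)

# The bytes are the same either way; only the rendering differs.
print(digest.hex() == DIGEST_HEX)
```

The fourth byte, 0x42, is the ASCII code for `B`, which is why the repr reads `\xc4B` rather than `\xc4\x42`.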
I understood the message, but I thought xxd -r run with no arguments should operate on stdin, so I thought it was complaining it couldn't operate on nonseekable input? I didn't know it could take random bits of its input as "go open /dev/watchdog". (Not a regular xxd user...)
I added -b (brief) to toybox md5/sha*sum so you CAN get just the hash. Alas I have not gone through the "push this to everyone else" dance yet, in part because the cut -DF dance is... https://lists.gnu.org/archive/html/coreutils/2023-08/msg00009.html might still be addressed by https://lists.gnu.org/archive/html/coreutils/2023-08/msg00060.html but honestly who knows?
And testing it:
$ echo -n | toybox sha256sum -b | toybox xxd -r
xxd: -: seek failed: Invalid seek
$ echo -n | toybox sha256sum -b
e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
No trailing dash, still saying invalid seek...
I miss "there should be one obvious way to do it". The move from python 2 to python 3 largely drove me out of python because it all just got gratuitously complicated and broke backwards compatibility for no reason. There was a lovely linux weekly news article comparing the python 3 adoption to the kubler ross stages of grief... and google says "site:lwn.net python kubler ross" has zero hits. Luckily I linked to it from my blog so can still find it: https://lwn.net/Articles/669768/
you need -p with -r for -p-style input.
$ echo -n | toybox sha256sum -b | xxd -r -p | xxd
00000000: e3b0 c442 98fc 1c14 9afb f4c8 996f b924 ...B.........o.$
00000010: 27ae 41e4 649b 934c a495 991b 7852 b855 '.A.d..L....xR.U
$ echo -n | toybox sha256sum -b | toybox xxd -r -p | xxd
00000000: e3b0 c442 98fc 1c14 9afb f4c8 996f b924 ...B.........o.$
00000010: 27ae 41e4 649b 934c a495 991b 7852 b855 '.A.d..L....xR.U
~/aosp-main-with-phones/bionic$
i don't know why "real" xxd doesn't give a better error message, but toybox's doesn't because i made my usual fscanf() mistake:
if (!FLAG(p) && fscanf(fp, "%llx: ", &pos) == 1) {
does not check that we actually parsed ": ". replace the : with a %c and check for == 2 && ch == ':' and complain about bad input format otherwise? (let me know if you'd like me to send the patch; normally i would, but it seems like you're actively poking about atm...)
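The stricter check described above can be sketched in Python rather than C. This is not toybox code: `parse_offset` and its error text are hypothetical, and the regular expression only approximates what `%llx` accepts. The point is that a line must have hex digits followed by a literal colon before it is treated as an offset.

```python
import re

# Hypothetical stand-in for the "offset, then ':'" check; not toybox code.
_OFFSET = re.compile(r"\s*([0-9a-fA-F]+):")

def parse_offset(line: str) -> int:
    """Return the leading hex offset, or raise if no ':' follows it."""
    m = _OFFSET.match(line)
    if m is None:
        raise ValueError("bad input format")
    return int(m.group(1), 16)

print(parse_offset("00000010: 27ae 41e4 649b 934c"))

try:
    # Plain hex with no "offset:" prefix is rejected instead of being
    # misread as one enormous offset to seek to.
    parse_offset("e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855\n")
except ValueError as err:
    print("error:", err)
```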
yeah, to be clear --- i don't love xxd, and i actively hate its "decades of accreted cruft" ui that means anything unusual (like using -p) means i have to re-read the man page, but afaik none of the other hexdumps have -r. and given how useful -r is, xxd is sadly the only hexdump that could be my "one stop shop". (except even there, od is still sometimes useful because -t lets you do some interesting stuff. and although i've never used hexdump(1)'s even greater formatting control, others do, including in the AOSP build.)
i think my "dream" hexdump would be hexdump(1) with -C as the default (so i didn't need the hd symlink) and xxd's -r and od's -t functionality [and i guess hexdump(1)'s -e for the people who use that]. but i feel like the last thing we need is yet another hex dump! (plan 9's https://9p.io/magic/man2html/1/xd was the obvious step forward from od, but otherwise quite disappointing.)
Happy to take a patch to this one, since it's not my baby and I'm less likely to notice breaking stuff. :)
I have a partial hexdump.c as ranted in https://landley.net/notes-2023.html#17-07-2023 and I've 90% convinced myself to just do it... modulo I've never used the advanced hexdump formatting features either, I always used it as -C before finding the hd short name. (The first big program I wrote in commodore 64 basic was a hex editor, showing stuff in a format similar to hd.)
Alas I've had too many tabs open recently and done a bit of swap thrashing, trying to close and check them in. One of them is of course the Linux From Scratch build under toybox (so not small), and another is running the test suite under mkroot (so VERY not small), but neither of those should BLOCK the other stuff...
ack. i'll look at that tonight.
yeah, i saw that post and agree with the "meh, the deduplication savings are probably less than the abstraction overhead" general feeling (along with knowing what you mean about it being annoying).
interestingly, my notes say we needed -n#, -s#, -C, -e, -f FILE, and -v for hexdump ... but code search shows no users in AOSP. i can see some internal uses, with -C -n 128, -s 128 -n 8 -e 'blah %d', and -v -e '1/4 "%08x"' -e '"\n"' being representative examples of the kinds of things being done. ("representative" in the sense of "if you can do those, you can do everything".)
(http://lists.landley.net/pipermail/toybox-landley.net/2023-August/029697.html adds a clear error message for users who make this mistake.)
It seems that this should just print the value as hex in a single long line, but I'm unable to get this working (hash.bin = empty string sha256 digest value).
cat hash.bin | toybox xxd -p -c 0
e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852
b855
Looking at the code, it seems like this line is the culprit: https://github.com/landley/toybox/blob/master/toys/other/xxd.c#L85
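For reference, the difference between the expected and observed output can be modelled with a short Python sketch. `plain_hex` is a hypothetical helper, not toybox's implementation, and it assumes that `-c 0` means "never wrap" and that plain mode otherwise wraps at 30 bytes per line (which matches the 60 hex characters on the first line above).

```python
def plain_hex(data: bytes, cols: int = 30) -> str:
    """Plain hex dump: cols bytes per line, or one long line if cols <= 0."""
    text = data.hex()
    if cols <= 0:
        return text + "\n"
    width = cols * 2  # two hex characters per byte
    return "".join(text[i:i + width] + "\n" for i in range(0, len(text), width))

digest = bytes.fromhex(
    "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
)

print(plain_hex(digest, 0), end="")   # the single line that was expected
print(plain_hex(digest, 30), end="")  # wraps after 30 bytes, as observed
```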