uutils / coreutils

Cross-platform Rust rewrite of the GNU coreutils
https://uutils.github.io/
MIT License

cksum: gets confused by base64 that happens to consist entirely of hexadecimal digits #6572

Open BenWiederhake opened 3 months ago

BenWiederhake commented 3 months ago

The Base64 alphabet has, as the name suggests, 64 letters. 22 of these letters (0-9, a-f, A-F) look like hexadecimal digits. That means that a random string of 8 Base64 letters (which encodes 6 bytes = 48 bits) has a chance of (22/64)^8 ~= 2^-12.3 of being a valid hexadecimal string. So for any hash whose output length is a multiple of 24 bits (e.g. SHA384 or Blake2b-48), the Base64 digest has no padding and might also look like a hexadecimal digest, just one of a different length. This can cause all kinds of shenanigans with cksum, which has to detect/guess the encoding from the sums-file.
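The counting argument above can be checked with a short Python sketch. The names `hex_lookalikes` and `log2_all_hex_probability` are made up for illustration, and the calculation assumes every Base64 letter of a digest is uniformly random and independent.

```python
import math
import string

# Standard Base64 alphabet: A-Z, a-z, 0-9, '+', '/'.
BASE64_ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

# Letters of that alphabet that are also hexadecimal digits (0-9, a-f, A-F).
hex_lookalikes = [c for c in BASE64_ALPHABET if c in string.hexdigits]


def log2_all_hex_probability(num_letters: int) -> float:
    """log2 of the chance that num_letters random Base64 letters are all hex digits."""
    return num_letters * math.log2(len(hex_lookalikes) / len(BASE64_ALPHABET))


if __name__ == "__main__":
    print(len(BASE64_ALPHABET), len(hex_lookalikes))
    print(round(log2_all_hex_probability(8), 1))
```

For 8 letters this gives an exponent of about -12.3, so roughly one in every five thousand 48-bit digests would be affected.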

In particular, here is a case where it goes wrong:

$ echo -n esq > foo.dat # The bytestring b"esq" is very special
$ cksum --algo=blake2b --length=48 --base64 foo.dat | tee foo.sums # Because the base64 *looks* like it's hexadecimal!
BLAKE2b-48 (foo.dat) = fc1f97C4
$ cksum --check foo.sums # GNU cksum takes no issue with this.
foo.dat: OK
$ cargo run -q cksum --check foo.sums # But uutils gets confused by this.
foo.dat: FAILED
cksum: WARNING: 1 computed checksum did NOT match
[$? = 1]
$ cargo run -q hashsum --b2sum --bits 48 --check foo.sums # hashsum *also* gets confused by this.
foo.dat: FAILED
hashsum: WARNING: 1 computed checksum did NOT match
[$? = 1]
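To see why this particular digest is ambiguous, here is a minimal Python sketch that tries both encodings on the string from the sums-file. It uses only the standard library. The helper names are hypothetical, and `decode_with_known_length` is just one plausible tie-breaker (pick the decoding whose byte length matches the 48 bits announced by the `BLAKE2b-48` tag); it is not a description of what GNU or uutils actually do.

```python
import base64


def possible_decodings(digest_text: str) -> dict:
    """Return every encoding under which digest_text parses, mapped to the decoded bytes."""
    decodings = {}
    try:
        decodings["hex"] = bytes.fromhex(digest_text)
    except ValueError:
        pass  # not valid hexadecimal
    try:
        decodings["base64"] = base64.b64decode(digest_text, validate=True)
    except ValueError:
        pass  # not valid Base64 (binascii.Error is a subclass of ValueError)
    return decodings


def decode_with_known_length(digest_text: str, digest_bytes: int):
    """Hypothetical tie-breaker: keep only the decoding with the expected byte length."""
    matches = [
        (name, raw)
        for name, raw in possible_decodings(digest_text).items()
        if len(raw) == digest_bytes
    ]
    if len(matches) != 1:
        raise ValueError(f"cannot decode {digest_text!r} as a {digest_bytes}-byte digest")
    return matches[0]


if __name__ == "__main__":
    for name, raw in possible_decodings("fc1f97C4").items():
        print(name, len(raw), raw.hex())
    print(decode_with_known_length("fc1f97C4", 6)[0])
```

The string parses as 4 bytes of hexadecimal and as 6 bytes of Base64. Only the Base64 reading has the 6 bytes that a 48-bit digest needs, so a checker that looks only at the characters can pick the wrong one.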

There are probably more bugs like this.

Note that this is not specific to blake2b: with SHA384, the Base64 digest has 64 letters, so the chance is (22/64)^64 ~= 2^-98.6, and it would probably require around 2^99 attempts to find a file that hashes to a digest that triggers this bug. For reference, the Bitcoin mining community computes about 2^60 hashes per second according to some sketchy website, which is good enough for this thought experiment. So it would require about 17734 years to find that file. Okay, never mind, this bug doesn't realistically affect SHA384. (But theoretically it does.)
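The back-of-the-envelope estimate above can be reproduced with a few lines of Python. This is only a sketch: the function names are invented, the default hash rate is the rounded 2^60 figure from the text, and a year is assumed to be 365.25 days.

```python
import math


def log2_expected_attempts(digest_bytes: int) -> float:
    """log2 of the expected number of random digests tried before one is all-hex in Base64."""
    if digest_bytes % 3 != 0:
        # Otherwise the Base64 form ends in '=' padding, which is never a hex digit.
        raise ValueError("only digests whose length is a multiple of 3 bytes are affected")
    base64_letters = digest_bytes // 3 * 4
    return base64_letters * math.log2(64 / 22)


def years_to_find(log2_attempts: float, log2_hashes_per_second: float = 60.0) -> float:
    """Years needed for 2**log2_attempts hashes at 2**log2_hashes_per_second hashes/s."""
    seconds = 2.0 ** (log2_attempts - log2_hashes_per_second)
    return seconds / (365.25 * 24 * 3600)


if __name__ == "__main__":
    print(round(log2_expected_attempts(48), 1))  # SHA384 has 48 bytes of output
    print(round(years_to_find(99)))
```

With the rounded inputs (2^99 attempts at 2^60 hashes per second) the result is in the low tens of thousands of years, the same ballpark as the figure quoted above. The exact number depends on the hash rate plugged in.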

Found while reading #6500 (probably unrelated though).

CC @sylvestre, because you seem to be interested in this kind of bug.

tertsdiepraam commented 3 months ago

Great find! Is this a GNU issue or just for our implementation?

BenWiederhake commented 3 months ago

I'm not entirely sure what you mean? The GNU behavior seems self-consistent. We differ from GNU behavior and aren't self-consistent (uutils fails to verify a sums-file it generated itself). So I'd say that this is a bug in uutils.