Open asottile-sentry opened 8 months ago
Turns out macOS tar does not link against libzstd or whatever it's called, but instead shells out to zstd a bit like this:
$ zstd -q -dd < cpython-3.12.1+20240107-x86_64-apple-darwin-pgo+lto-full.tar.zst > /dev/null
Oddly enough, running that command directly on the command line is fine. For some reason it sometimes fails when tar invokes it. Looking at the end of that file, there are about 9757 null bytes. If I were to guess, what is happening is that zstd decompresses that, but tar sees it as trash at the end that it's not interested in reading. Then when tar shuts down there might be some bytes left in the pipe that are never picked up, which causes zstd to fail with an error that tar then picks up on.
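The guess above (a reader that stops early makes the writer on the other end of the pipe fail) can be sketched in isolation. This is only an illustration of the general pipe behaviour, not a reproduction of what macOS tar and zstd actually do: the child process here is a hypothetical stand-in for the decompressor, and the payload is deliberately much larger than a pipe buffer so the failure does not depend on timing.

```python
import subprocess
import sys

# Hypothetical stand-in for the decompressor: a child process that writes
# 10 MB of NUL bytes to stdout, far more than a pipe buffer can hold.
WRITER = "import sys; sys.stdout.buffer.write(bytes(10_000_000))"


def run_with_early_close(read_bytes: int = 512) -> int:
    """Read a little from the child's stdout, close the pipe, return its exit status."""
    proc = subprocess.Popen(
        [sys.executable, "-c", WRITER],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,  # hide the child's broken-pipe traceback
    )
    proc.stdout.read(read_bytes)  # the "reader" takes only what it wants
    proc.stdout.close()           # then goes away while data is still pending
    return proc.wait()


if __name__ == "__main__":
    print("writer exit status:", run_with_early_close())
```

With only a few kilobytes of trailing data, whether the writer finishes before the reader closes the pipe would presumably come down to scheduling, which would at least be consistent with the failure being intermittent.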
I have absolutely no clue what could be causing this.
If the zstd compression layer were buggy, I'd like to think someone would have filed against the zstandard Python package (which I maintain). We declare the length of the file input before compressing, and the zstd C API should raise an error if we have not fed in exactly that length of data when the compression stream ends. Similarly, zstd would fail checksum validation on decompression. So I don't think it is in the zstd layer.
Now, I do see the tar archive has 9757 null bytes at the end. That's a little fishy and I could easily see how a tar reader could get confused by that. Why sometimes is a big mystery though: I don't like non-deterministic software.
We're using the Python stdlib tarfile module for creating that archive. Code at https://github.com/indygreg/python-build-standalone/blob/b9b7ac270272401a5c598e779105cbcf4235e7b5/pythonbuild/utils.py#L325. Other tar archives also seem to have large amounts of trailing NULLs. Not sure what makes this one special.
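For what it's worth, trailing NULs are expected from the stdlib tarfile module in general: on close it writes two zero-filled 512-byte end-of-archive blocks and then pads the output to a multiple of `tarfile.RECORDSIZE` (10240 bytes). A minimal sketch that builds a tiny in-memory archive and counts the NULs at its end; the member name and payload are made up for illustration, and this is not the project's actual archive-building code:

```python
import io
import tarfile


def build_archive(payload: bytes) -> bytes:
    """Build an uncompressed tar archive holding one (hypothetical) member."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tf:
        info = tarfile.TarInfo(name="hello.txt")  # illustrative member name
        info.size = len(payload)
        tf.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def trailing_nuls(data: bytes) -> int:
    """Count NUL bytes at the very end of the data.

    This includes the block padding of the last member as well as the
    end-of-archive blocks and the record padding.
    """
    return len(data) - len(data.rstrip(b"\0"))


if __name__ == "__main__":
    archive = build_archive(b"hello")
    print("archive size:", len(archive))
    print("trailing NULs:", trailing_nuls(archive))
```

Since the padding is always less than one record, a count like 9757 falls within what tarfile would normally produce, which fits with other archives showing the same thing.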
I think the non-determinism piece is key. That smells like faulty memory or some kind of software bug related to non-deterministic execution (such as ASLR - although I'm not sure if macOS exactly has ASLR).
I'm forced to treat this bug as non-actionable / low-priority unless multiple people can reproduce.
I can definitely reproduce this on two of my Macs.
this is a perplexing one...
sometimes this fails:
doesn't seem to fail with gtar, so at least I have a workaround
also doesn't seem to fail for any of the other archives I've tried (admittedly I haven't tried that many)