indygreg / python-build-standalone

Produce redistributable builds of Python
Mozilla Public License 2.0

bsd tar is *sometimes* unhappy with `cpython-3.12.1+20240107-x86_64-apple-darwin-pgo+lto-full.tar.zst` #208

Open asottile-sentry opened 8 months ago

asottile-sentry commented 8 months ago

this is a perplexing one...

$ wget https://github.com/indygreg/python-build-standalone/releases/download/20240107/cpython-3.12.1+20240107-x86_64-apple-darwin-pgo+lto-full.tar.zst
$ sha256sum cpython-3.12.1+20240107-x86_64-apple-darwin-pgo+lto-full.tar.zst 
bf2b176b0426d7b4d4909c1b19bbb25b4893f9ebdc61e32df144df2b10dcc800  cpython-3.12.1+20240107-x86_64-apple-darwin-pgo+lto-full.tar.zst
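
The same checksum can also be verified from Python, which is handy when scripting repeated download-and-extract runs. This is a minimal sketch. The helper name `sha256_of` is hypothetical, and the expected digest is the one `sha256sum` printed above:

```python
import hashlib
from pathlib import Path

# Digest reported by sha256sum earlier in this issue.
EXPECTED_SHA256 = "bf2b176b0426d7b4d4909c1b19bbb25b4893f9ebdc61e32df144df2b10dcc800"
ARCHIVE = Path("cpython-3.12.1+20240107-x86_64-apple-darwin-pgo+lto-full.tar.zst")


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 of a file, reading it in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__" and ARCHIVE.exists():
    actual = sha256_of(ARCHIVE)
    print("OK" if actual == EXPECTED_SHA256 else f"MISMATCH: {actual}")
```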

sometimes this fails:

$ rm -rf pythons && mkdir -p pythons/cp312-cp312 && tar -C pythons/cp312-cp312 --strip-components=2 -xvf cpython-3.12.1+20240107-x86_64-apple-darwin-pgo+lto-full.tar.zst python/install > log 2>&1 || (tail -5 log)
$ rm -rf pythons && mkdir -p pythons/cp312-cp312 && tar -C pythons/cp312-cp312 --strip-components=2 -xvf cpython-3.12.1+20240107-x86_64-apple-darwin-pgo+lto-full.tar.zst python/install > log 2>&1 || (tail -5 log)
x lib/tk8.6/xmfbox.tcl
x share/man/man1/python3.1
x share/man/man1/python3.12.1
tar: Child process exited with status 1
tar: Error exit delayed from previous errors.
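
Because the failure is intermittent, one way to quantify it is to run the same extraction many times and count non-zero exit statuses. This is a minimal sketch. It assumes the archive sits in the current directory and that `/usr/bin/tar` is the bsdtar reported further down. The names `count_failures` and `extract_cmd` are hypothetical, and the tar arguments mirror the command above:

```python
import os
import shutil
import subprocess
import tempfile
from collections.abc import Callable, Sequence

ARCHIVE = "cpython-3.12.1+20240107-x86_64-apple-darwin-pgo+lto-full.tar.zst"


def count_failures(make_cmd: Callable[[str], Sequence[str]], runs: int) -> int:
    """Run make_cmd(dest) `runs` times, each into a fresh directory; return how many runs failed."""
    failures = 0
    for _ in range(runs):
        dest = tempfile.mkdtemp()
        try:
            result = subprocess.run(
                list(make_cmd(dest)), stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
            )
            if result.returncode != 0:
                failures += 1
        finally:
            shutil.rmtree(dest, ignore_errors=True)
    return failures


def extract_cmd(dest: str) -> list[str]:
    # Same flags and member selection as the bsdtar invocation in this issue.
    return ["/usr/bin/tar", "-C", dest, "--strip-components=2", "-xvf", ARCHIVE, "python/install"]


if __name__ == "__main__" and os.path.exists(ARCHIVE):
    print(f"{count_failures(extract_cmd, 20)} of 20 runs failed")
```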

doesn't seem to fail with gtar, so at least I have a workaround

also doesn't seem to fail for any of the other archives I've tried (admittedly I haven't tried that many)

$ sw_vers 
ProductName:            macOS
ProductVersion:         13.5.2
BuildVersion:           22G91
$ tar --version
bsdtar 3.5.3 - libarchive 3.5.3 zlib/1.2.11 liblzma/5.0.5 bz2lib/1.0.8 
$ zstd --version
*** Zstandard CLI (64-bit) v1.5.5, by Yann Collet ***
$ uname -a
Darwin FJJ4YYCWYX.local 22.6.0 Darwin Kernel Version 22.6.0: Wed Jul  5 22:22:05 PDT 2023; root:xnu-8796.141.3~6/RELEASE_ARM64_T6000 arm64 arm Darwin

mitsuhiko commented 7 months ago

It turns out macOS tar does not link against libzstd (or whatever it's called) but instead shells out to the zstd binary, roughly like this:

$ zstd -q -dd < cpython-3.12.1+20240107-x86_64-apple-darwin-pgo+lto-full.tar.zst > /dev/null

https://github.com/libarchive/libarchive/blob/2039275a708e1371b0529954c60638722f9613b0/libarchive/archive_read_support_filter_zstd.c#L147

Oddly enough, running that command by itself on the command line works fine; it only sometimes fails when tar invokes it. Looking at the end of the decompressed tar, there are about 9757 null bytes. If I had to guess, zstd decompresses those, but tar treats them as trailing garbage it isn't interested in reading. Then, when tar shuts down, some bytes may still be sitting unread in the pipe, which causes zstd to fail with an error that tar then picks up on.
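
The hypothesized mechanism can be illustrated without tar or zstd. If a reader closes its end of a pipe while the writer still has output pending, the writer's next write fails with `EPIPE`, and the writer exits with a non-zero status. This is a minimal sketch. The child is a Python stand-in for `zstd`, not zstd itself. Python ignores `SIGPIPE` by default, so the child raises `BrokenPipeError` instead of being killed by the signal. The sizes are arbitrary, and `read_prefix_then_close` is a hypothetical name:

```python
import subprocess
import sys

# Stand-in for the decompressor: writes n NUL bytes to stdout, then exits.
CHILD = "import sys; n = int(sys.argv[1]); sys.stdout.buffer.write(b'\\0' * n); sys.stdout.buffer.flush()"


def read_prefix_then_close(total_bytes: int, read_bytes: int = 512) -> int:
    """Read only the first `read_bytes` of the child's output, close the pipe, return its exit status."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD, str(total_bytes)],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,  # hide the child's BrokenPipeError traceback
    )
    assert proc.stdout is not None
    proc.stdout.read(read_bytes)  # like a reader that stops at the end-of-archive marker
    proc.stdout.close()           # abandon whatever is still unread in the pipe
    return proc.wait()


if __name__ == "__main__":
    # Far more pending output than a pipe buffer holds: the child is still blocked
    # writing when the read end closes, so its write fails and it exits non-zero.
    print("large leftover:", read_prefix_then_close(8 * 1024 * 1024))
    # A small leftover usually fits in the pipe buffer, so the child tends to finish
    # and exit 0; the outcome depends on timing.
    print("small leftover:", read_prefix_then_close(9757 + 512))
```

Whether tar actually stops reading before zstd has finished writing is exactly what this thread has not established. The sketch only shows that the outcome can depend on timing, which would be one way to get a failure that happens only "sometimes".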

indygreg commented 7 months ago

I have absolutely no clue what could be causing this.

If the zstd compression layer were buggy, I'd like to think someone would have filed a bug against the zstandard Python package (which I maintain). We declare the length of the input before compressing it, and the zstd C API should raise an error if the data we feed in is not exactly that length when the compression stream ends. Similarly, zstd would fail checksum validation on decompression. So I don't think the problem is in the zstd layer.
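
The frame header of a `.tar.zst` file shows whether it declares its decompressed size and carries a content checksum. The Zstandard format specification (RFC 8878) defines the header layout. This is a minimal, stdlib-only sketch that parses only the first frame's header. `parse_frame_header` and `FrameHeader` are hypothetical names, and skippable frames are not handled:

```python
import struct
from typing import NamedTuple, Optional

ZSTD_MAGIC = 0xFD2FB528


class FrameHeader(NamedTuple):
    content_size: Optional[int]  # None when the frame does not declare it
    has_checksum: bool
    dictionary_id: int


def parse_frame_header(data: bytes) -> FrameHeader:
    """Parse the header of the zstd frame at the start of `data` (RFC 8878, section 3.1.1)."""
    if len(data) < 5:
        raise ValueError("too short for a zstd frame header")
    (magic,) = struct.unpack_from("<I", data, 0)
    if magic != ZSTD_MAGIC:
        raise ValueError(f"not a zstd frame: magic {magic:#010x}")
    descriptor = data[4]
    fcs_flag = descriptor >> 6               # Frame_Content_Size_flag
    single_segment = bool(descriptor & 0x20)  # Single_Segment_flag
    has_checksum = bool(descriptor & 0x04)    # Content_Checksum_flag
    dict_id_flag = descriptor & 0x03          # Dictionary_ID_flag

    pos = 5
    if not single_segment:
        pos += 1  # skip the Window_Descriptor byte
    dict_id_size = (0, 1, 2, 4)[dict_id_flag]
    dictionary_id = int.from_bytes(data[pos : pos + dict_id_size], "little")
    pos += dict_id_size

    fcs_size = (1 if single_segment else 0, 2, 4, 8)[fcs_flag]
    if fcs_size == 0:
        return FrameHeader(None, has_checksum, dictionary_id)
    content_size = int.from_bytes(data[pos : pos + fcs_size], "little")
    if fcs_size == 2:
        content_size += 256  # the 2-byte field is stored with an offset of 256
    return FrameHeader(content_size, has_checksum, dictionary_id)
```

A frame header is at most 18 bytes including the magic number, so `parse_frame_header(Path(p).read_bytes()[:18])` is enough for a file on disk. The result only shows what the frame declares. It does not show that decompression succeeds.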

Now, I do see the tar archive has 9757 null bytes at the end. That's a little fishy, and I could easily see how a tar reader could get confused by it. Why it only fails sometimes is a big mystery, though: I don't like non-deterministic software.
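
The trailing-NUL count can be measured on the decompressed stream without holding it all in memory. This is a minimal sketch. It reads in chunks and tracks the length of the current run of NUL bytes. The script name `count_nuls.py` and the function name are hypothetical:

```python
import sys
from typing import BinaryIO


def trailing_nul_count(stream: BinaryIO, chunk_size: int = 1 << 16) -> int:
    """Return how many consecutive NUL bytes the stream ends with."""
    run = 0
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        stripped = chunk.rstrip(b"\0")
        if stripped:
            # This chunk contains a non-NUL byte, so the run restarts after it.
            run = len(chunk) - len(stripped)
        else:
            # The whole chunk is NULs: the current run simply continues.
            run += len(chunk)
    return run


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: zstd -dc archive.tar.zst | python3 count_nuls.py -
    #    or: python3 count_nuls.py archive.tar
    if sys.argv[1] == "-":
        print(trailing_nul_count(sys.stdin.buffer))
    else:
        with open(sys.argv[1], "rb") as f:
            print(trailing_nul_count(f))
```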

We're using the Python stdlib tarfile module for creating that archive. Code at https://github.com/indygreg/python-build-standalone/blob/b9b7ac270272401a5c598e779105cbcf4235e7b5/pythonbuild/utils.py#L325. Other tar archives also seem to have large amounts of trailing NULLs. Not sure what makes this one special.
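
Some trailing NULs are expected from `tarfile` itself. When CPython's `tarfile` closes an archive opened for writing, it emits two 512-byte zero blocks as the end-of-archive marker. It then pads the output to a multiple of `tarfile.RECORDSIZE` (20 × 512 = 10240 bytes). Together with the zero padding of the last member's data to a 512-byte boundary, this can leave several kilobytes of NULs, which is consistent with the 9757 seen here. The in-memory check below uses an arbitrary member name and payload; `build_archive` is a hypothetical helper:

```python
import io
import tarfile


def build_archive(payload: bytes) -> bytes:
    """Write a one-member uncompressed tar archive in memory and return its bytes."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tf:
        info = tarfile.TarInfo("hello.txt")
        info.size = len(payload)
        tf.addfile(info, io.BytesIO(payload))
    # Leaving the `with` block calls close(), which writes the end-of-archive blocks
    # and the record padding; buf stays open because it was passed in as fileobj.
    return buf.getvalue()


if __name__ == "__main__":
    data = build_archive(b"x" * 100)
    trailing = len(data) - len(data.rstrip(b"\0"))
    print(len(data), len(data) % tarfile.RECORDSIZE, trailing)  # expected: 10240 0 9628
```

Here 512 header bytes, 100 payload bytes padded to 512, and 1024 bytes of end marker total 2048, so 8192 bytes of record padding follow. A reader that stops at the two zero blocks would leave that padding unread, which is the situation speculated about earlier in this thread.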

I think the non-determinism piece is key. That smells like faulty memory or some kind of software bug related to non-deterministic execution (such as ASLR - although I'm not sure if macOS exactly has ASLR).

I'm forced to treat this bug as non-actionable / low-priority unless multiple people can reproduce.

mitsuhiko commented 7 months ago

I can definitely reproduce this on two of my Macs.