landley / toybox

toybox
http://landley.net/toybox
BSD Zero Clause License
2.41k stars 335 forks source link

tar --sort=name #408

Closed enh-google closed 1 year ago

enh-google commented 1 year ago

(copy & pasted from internal-only bug http://b/265836099.)

The Android kernel build system now uses tar from toybox (aosp/2151397). For reproducible builds, tar needs to traverse files and directories in a sorted order.

https://reproducible-builds.org/docs/archives/ suggests two ways to do this:

For GNU tar, use the --sort argument. For non-GNU tars, use find ... | sort ... | tar ....

However, for the Android kernel build, #2 is not a complete solution. Both the wrapper build system (build.sh / Kleaf) and the internal build system (Kbuild) calls tar commands. We can fix all cases in build.sh / Kleaf, but we can't fix all those in Kbuild because they are from upstream Linux tree. Even if we fix them now, we can't prevent future upstream patches to break this, unless we circulate the message throughout the Linux community and ask all reviewers to reject patches that calls tar in a non-reproducible way.

Hence, I want to ask for support of --sort for the toybox tar. Then, we can inject default arguments like this:

aosp/2360346

... that is, I can replace the tar in PATH to be a shell script that calls the real tar binary with some pre-defined arguments.

landley commented 1 year ago

Ack. Taking a look...

landley commented 1 year ago

Sigh, https://android-review.googlesource.com/c/kernel/build/+/2389693 is because there's no xpclose_both(TT.pid, 0) in the exit path, so we're not waiting on the gzip child process to exit before exiting ourselves. (And if gzip exits with error code we probably should to...)

landley commented 1 year ago

My tar tree is dirty with --wildcard changes and I'm trying to find a graceful way to either merge or park them...

landley commented 1 year ago

re: the hermetic wrapper script: you guys seem to be going to surprising lengths to reinvent "alias". Is there something I'm missing?

Would you like me to tweak the binary in some way to help with this? I note that https://github.com/landley/toybox/blob/master/toys/example/logpath.c#L59 is already doing a "run next instance of this command out of the $PATH" thing, and I could presumably genericize it if you really needed some sort of

next tar "$@" --potato --ocelot

That skips the first instance in the $PATH...

enh-google commented 1 year ago

Sigh, https://android-review.googlesource.com/c/kernel/build/+/2389693 is because there's no xpclose_both(TT.pid, 0) in the exit path, so we're not waiting on the gzip child process to exit before exiting ourselves. (And if gzip exits with error code we probably should to...)

oh, nice ... i'd left filing a bug about that for today, since i had no useful repro steps (and although it had to be something along those lines, i'm told that the problem goes away with strace -f).

enh-google commented 1 year ago

re: the hermetic wrapper script: you guys seem to be going to surprising lengths to reinvent "alias". Is there something I'm missing?

iirc that was my initial suggestion, but they wanted to solve the problem once and for all, no matter how many other scripts might also have the problem (without having to remember to source their aliases from every script).

landley commented 1 year ago

On 1/18/23 09:41, enh-google wrote:

re: the hermetic wrapper script: you guys seem to be going to surprising
lengths to reinvent "alias". Is there something I'm missing?

iirc that was my initial suggestion, but they wanted to solve the problem once and for all, no matter how many other scripts might also have the problem (without having to remember to |source| their aliases from every script).

Extra calls to dirname and readlink aside, hardwiring ../../../../.. into a path in a solution makes my teeth hurt. (As a symlink, it's fighting filesystem with filesystem. Putting it in a shell script, not so much.)

The reason I haven't already done it is the logical way to do a wrapper script is something like:

!/bin/bash --norc

exec next command --modifiers "$@"

Run with PATH="$WRAPDIR:$PATH" but that "exec" zaps the "which file did we run out of the $PATH last time" info so next can't trace through $PATH to find this instance so it can reliably run the one after it. Which means we're just abritrarily skipping the first instance in the $PATH assuming that was you, so you can't wrap the same command twice.

If you don't exec I could teach next to check /proc/$PPID/exe but then you're accumulating gratuitous wrapper processes for the lifetime of the thing you've wrapped? Eh, not fatal, just icky...

Oh, duh, I ALREADY SOLVED this in my record-commands plumbing: the wrapper modifies $PATH for the child to chop off the initial $WRAPDIR: segment. You can have as many levels of $WRAPDIR on the front as you like. (Modulo if any of them get skipped things get weird, because if you add a new $WRAPDIR: that doesn't wrap this command, then the one in the second layer strips the first layer off and then calls itself again, so it terminates but the --modifiers stutter.)

Hmmm. "Solved" is kinda strong there. In record-commands' case it would just log the command twice which is harmless-ish, and when I've got a C binary that's changing the command line arguments, we don't have the extra exec handoff so it can find itself in the $PATH via /proc/self/exe.

(Teaching next to recognize a shell script where the first nonblank noncomment line is "exec next command" would be ../../.. levels of magic brittle. Creating a directory with /system/next/command text files to have the new command line we rewrite to feels overengineered and loses shell syntax evaluation flexibility...)

You know that whole dunning-kruger U-curve where you get the wrong answer at the start and NO answer at the end? I may have passed the sweet spot here...

No atime doesn't help.

There's no way #!/path/to/next ends well for at least 3 reasons.

Ungh, the shell script COULD be:

/bin/mksh

WHODIS="$(readlink /proc/self/exe)" exec next command --thingy "$@"

I'm not happy about it (extra readlink call on top of the extra sh call), but it would solve the problem? If mksh has a --norc equivalent, I can't find it in their man page, but it's not WORSE than what your guys were already doing?

Rob

enh-google commented 1 year ago

rearranging the tree is already such a disruptive exercise that although all that ../../ stuff is ugly, i can't think of many occasions where it's caused trouble. (and they've all been for configuration files rather than commands --- a broken symlink to a command fails quite nicely. broken symlinks to configuration files sometimes gets interpreted as "just the defaults then?".)

landley commented 1 year ago

As for the actual topic of this, I'm trying to implement the DIRTREE_BREADTH logic so it can do breadth first traversal, allowing a per-directory sort callback to be inserted at the right place before descending into the children.

Kind of DIRTREE_SAVE on a per-directory level, without populating the whole tree up front and losing the dirtree_parent() file descriptors so I'd have to start potentially worrying about PATH_MAX again. populate directory, sort, process-and-descend, then free saved directory contents when ascending back out of it...

landley commented 1 year ago

Try 9ba775e805bd

enh-google commented 1 year ago

github CI is red with two test failures:

tar: had errors
FAIL: tar replace dir with file
echo -ne '' | "/home/runner/work/toybox/toybox/generated/testdir/tar" -xf test.tar && cat one/two/three
--- expected    2023-01-30 00:01:50.655051718 +0000
+++ actual  2023-01-30 00:01:50.659051827 +0000
@@ -1 +1 @@
-hello
+exited with signal (or returned 137)

and

double free or corruption (out)
FAIL: tar tsort
echo -ne '' | $TAR -c sub2 --sort=name | tar t
--- expected    2023-01-30 00:01:50.867057489 +0000
+++ actual  2023-01-30 00:01:50.875057707 +0000
@@ -1,4 +1 @@
-sub2/
-sub2/djelibeybi
-sub2/ephebe
-sub2/klatch
+��Ae�U/

https://github.com/landley/toybox/actions/runs/4039282702/jobs/6943861265

macOS only has the latter tar failure, but it also has a tail failure that i think is unrelated flake we've seen before:

FAIL: tail -F
echo -ne '' | tail -s .1 -F walrus 2>/dev/null & sleep .2; echo hello > walrus;
sleep .2; truncate -s 0 walrus; sleep .2; echo potato >> walrus; sleep .2;
echo hello >> walrus; sleep .2; rm walrus; sleep .2; echo done > walrus;
  sleep .5; kill %1
--- expected    2023-01-30 00:03:39.000000000 +0000
+++ actual  2023-01-30 00:03:40.000000000 +0000
@@ -1,4 +1,3 @@
 hello
 potato
-hello
 done
landley commented 1 year ago

On 1/30/23 09:32, enh-google wrote:

github CI is red with two test failures:

|tar: had errors FAIL: tar replace dir with file echo -ne '' | "/home/runner/work/toybox/toybox/generated/testdir/tar" -xf test.tar && cat one/two/three --- expected 2023-01-30 00:01:50.655051718 +0000 +++ actual 2023-01-30 00:01:50.659051827 +0000 @@ -1 +1 @@ -hello +exited with signal (or returned 137) |

Alas ndk-25b is still missing --static support for ASAN ("attempted static link of dynamic object libclang_rt.asan-x86_64-android.so"), which is why I've been using the crappy gcc one (which this passed).

Once upon a time I was trying to get "mkroot.sh dynamic" to harvest the shared libraries out of the toolchain, I should circle back to that...

and

|double free or corruption (out)

I blogged about how the lifetime rules were a pain to get right with a three stage recursion cycle doing handoffs.

I'll try to reproduce it here...

macOS only has the latter tar failure, but it also has a tail failure that i think is unrelated flake we've seen before:

I remember going over that, but don't remember what specifically the result was? Alas, I just convinced dreamhost to fix lists.landley.net's robots.txt and the google cache hasn't repopulated enough for web archive search to find anything yet, and thunderbird's built-in search is sufficiently terrible that running grep on the mbox files is usually better...

Oh right, -f and -F are different: http://lists.landley.net/pipermail/toybox-landley.net/2021-June/028559.html

And this is the mac kernel not honoring/propagating O_APPEND to readers, which conceptually can't be fixed in userspace.

I suspect that test needs some kind of "not reliable on macos" annotation, which kind of implies the test suite infrastructure could use an upgrade so we're not spelling out "Darwin" each time? And there's a whole PILE of tests that don't pass on FreeBSD which I need to go through and annotate or fix now I've got a VM for that, but I'm trying to close accumulated tabs and flush dirty trees...

(You'd think having fewer competing demands on my time would mean I was catching up instead of just falling behind more slowly. You can sing "scope creep" to that toot sweets song from chitty chitty bang bang...)

Anyway: yeah, tail -F flaking on mac in the "truncate" case is a known issue.

Rob

landley commented 1 year ago

It does help to check in the lib/dirtree.c changes that go along with the commit...

enh-google commented 1 year ago

On Tue, Jan 31, 2023 at 12:38 AM Rob Landley @.***> wrote:

On 1/30/23 09:32, enh-google wrote:

github CI is red with two test failures:

|tar: had errors FAIL: tar replace dir with file echo -ne '' | "/home/runner/work/toybox/toybox/generated/testdir/tar" -xf test.tar && cat one/two/three --- expected 2023-01-30 00:01:50.655051718 +0000 +++ actual 2023-01-30 00:01:50.659051827 +0000 @@ -1 +1 @@ -hello +exited with signal (or returned 137) |

Alas ndk-25b is still missing --static support for ASAN ("attempted static link of dynamic object libclang_rt.asan-x86_64-android.so"), which is why I've been using the crappy gcc one (which this passed).

you know you can just install clang on debian? they're usually a bit behind the times, but not too bad, and still probably better than using gcc?

Once upon a time I was trying to get "mkroot.sh dynamic" to harvest the shared libraries out of the toolchain, I should circle back to that...

and

|double free or corruption (out)

I blogged about how the lifetime rules were a pain to get right with a three stage recursion cycle doing handoffs.

I'll try to reproduce it here...

macOS only has the latter tar failure, but it also has a tail failure that i think is unrelated flake we've seen before:

I remember going over that, but don't remember what specifically the result was? Alas, I just convinced dreamhost to fix lists.landley.net's robots.txt and the google cache hasn't repopulated enough for web archive search to find anything yet, and thunderbird's built-in search is sufficiently terrible that running grep on the mbox files is usually better...

Oh right, -f and -F are different: http://lists.landley.net/pipermail/toybox-landley.net/2021-June/028559.html

And this is the mac kernel not honoring/propagating O_APPEND to readers, which conceptually can't be fixed in userspace.

I suspect that test needs some kind of "not reliable on macos" annotation, which kind of implies the test suite infrastructure could use an upgrade so we're not spelling out "Darwin" each time? And there's a whole PILE of tests that don't pass on FreeBSD which I need to go through and annotate or fix now I've got a VM for that, but I'm trying to close accumulated tabs and flush dirty trees...

(You'd think having fewer competing demands on my time would mean I was catching up instead of just falling behind more slowly. You can sing "scope creep" to that toot sweets song from chitty chitty bang bang...)

Anyway: yeah, tail -F flaking on mac in the "truncate" case is a known issue.

Rob

— Reply to this email directly, view it on GitHub https://github.com/landley/toybox/issues/408#issuecomment-1409965276, or unsubscribe https://github.com/notifications/unsubscribe-auth/AMVLEWBORYPW6EABYJPU5CLWVDFRHANCNFSM6AAAAAAT6RD4EI . You are receiving this because you authored the thread.Message ID: @.***>

enh-google commented 1 year ago

that seems to have made github's CI happy, but see the change i just sent you for a fix for the _FORTIFY_SOURCE build error Jarno mentioned yesterday...

On Tue, Jan 31, 2023 at 1:56 AM Rob Landley @.***> wrote:

It does help to check in the lib/dirtree.c changes that go along with the commit...

— Reply to this email directly, view it on GitHub https://github.com/landley/toybox/issues/408#issuecomment-1410066684, or unsubscribe https://github.com/notifications/unsubscribe-auth/AMVLEWBDJUGUNH65NJD35BDWVDOUPANCNFSM6AAAAAAT6RD4EI . You are receiving this because you authored the thread.Message ID: @.***>

landley commented 1 year ago

Answer to this got long so I posted it to the list instead of here.

Let me know if there's more I need to do on this issue...

enh-google commented 1 year ago

thanks. i'll close this for now, get the prebuilts updated and see what the kernel folks say. i'm guessing anything they say will make sense as smaller bug reports on the mailing list anyway. (i only sent this here because it was a more sizable feature request.)