First of all, I'm really not sure which side this issue is actually on, nor whether this is a "supported" scenario at all.
In systemd when we test everything with ASan/UBSan, we wrap certain uninstrumented binaries (that load instrumented stuff) with a simple shell script to make them work with ASan without having to resort to setting LD_PRELOAD=<path-to-libasan.so> globally, since that leads to all sorts of issues. The wrapper is pretty straightforward - we move the original binary, suffix it with ".orig", and then place a shell script at the original location. For example, for /bin/su we'd move it to /bin/su.orig and place this script at /bin/su:
#!/bin/bash
# Preload the ASan runtime DSO, otherwise ASan will complain
export LD_PRELOAD="/usr/lib64/clang/15.0.4/lib/libclang_rt.asan-x86_64.so"
# Disable LSan to speed things up, since we don't care about leak reports
# from 'external' binaries
export ASAN_OPTIONS=detect_leaks=0
# Set argv[0] to the original binary name without the ".orig" suffix
exec -a "$0" -- "/bin/su.orig" "$@"
This works pretty well and allows us to test systemd with ASan/UBSan without having to rebuild the whole system.
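For illustration, the wrapping step described above could be sketched as a small helper. This is only a sketch, not the actual systemd test-suite code: the function name `wrap_with_asan` and its two arguments (the binary path and the path to the ASan runtime DSO) are hypothetical, and the generated wrapper simply mirrors the script shown earlier.

```shell
# Hypothetical helper (not part of systemd): wrap_with_asan <binary> <asan-runtime-dso>
# Moves <binary> to <binary>.orig and writes a wrapper script in its place.
wrap_with_asan() {
    local bin="$1" asan_rt="$2"

    # Refuse to wrap twice, otherwise the wrapper itself would end up as ".orig"
    if [ -e "$bin.orig" ]; then
        echo "$bin is already wrapped" >&2
        return 1
    fi

    mv -- "$bin" "$bin.orig"

    # Unquoted heredoc: $bin and $asan_rt are expanded now,
    # while \$0 and \$@ stay literal for the generated wrapper.
    cat >"$bin" <<EOF
#!/bin/bash
export LD_PRELOAD="$asan_rt"
export ASAN_OPTIONS=detect_leaks=0
exec -a "\$0" -- "$bin.orig" "\$@"
EOF
    chmod +x "$bin"
}
```

With this, the `su` example would be `wrap_with_asan /bin/su /usr/lib64/clang/15.0.4/lib/libclang_rt.asan-x86_64.so`.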
However, a couple of days back, I noticed a strange failure in one of the newer tests in the ppc64le job:
[90437.173819] testsuite-58.sh[68]: ++ su testuser -s /bin/sh -c 'XDG_RUNTIME_DIR=/run/user/$UID exec "$@"' -- sh mktemp --directory /tmp/test-repart.XXXXXXXXXX
[90440.809111] testsuite-58.sh[69]: ERROR: ld.so: object '/usr/lib64/clang/14.0.0/lib/libclang_rt.asan-powerpc64le.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
[90440.809419] testsuite-58.sh[69]: /usr/bin/su: error while loading shared libraries: libpam.so.0: cannot open shared object file: Error 24
where su is the wrapper above, calling /bin/su.orig.
After several hours of playing around, it looks like preloading ASan for the su binary causes an infinite execve() loop, but only on ppc64le; on x86_64 it works as it should:
x86_64
# cat /bin/su
#!/bin/bash
# Preload the ASan runtime DSO, otherwise ASan will complain
export LD_PRELOAD="/usr/lib64/clang/15.0.4/lib/libclang_rt.asan-x86_64.so"
# Disable LSan to speed things up, since we don't care about leak reports
# from 'external' binaries
export ASAN_OPTIONS=detect_leaks=0
# Set argv[0] to the original binary name without the ".orig" suffix
exec -a "$0" -- "/bin/su.orig" "$@"
# su.orig --version
su.orig from util-linux 2.38.1
# strace -e execve,openat -- su --help
execve("/usr/bin/su", ["su", "--help"], 0x7ffeeacbc840 /* 31 vars */) = 0
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib64/libtinfo.so.6", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
...
openat(AT_FDCWD, "/usr/bin/su", O_RDONLY) = 3
execve("/bin/su.orig", ["/usr/bin/su", "--help"], 0x5616c4cc9ee0 /* 32 vars */) = 0
openat(AT_FDCWD, "/usr/lib64/clang/15.0.4/lib/libclang_rt.asan-x86_64.so", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib64/libpam.so.0", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib64/libpam_misc.so.0", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib64/libstdc++.so.6", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib64/libgcc_s.so.1", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib64/libm.so.6", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib64/libaudit.so.1", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib64/libeconf.so.0", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib64/libcap-ng.so.0", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/proc/sys/kernel/cap_last_cap", O_RDONLY) = 3
openat(AT_FDCWD, "/proc/self/cmdline", O_RDONLY) = 4
openat(AT_FDCWD, "/proc/self/environ", O_RDONLY) = 4
...
openat(AT_FDCWD, "/usr/share/locale/en_US/LC_MESSAGES/util-linux.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/en.UTF-8/LC_MESSAGES/util-linux.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/en.utf8/LC_MESSAGES/util-linux.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/en/LC_MESSAGES/util-linux.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
Usage:
su [options] [-] [<user> [<argument>...]]
...
However, if I drop the LD_PRELOAD=... line from the su script on ppc64le, everything goes back to normal:
# cat /bin/su
#!/bin/bash
# Disable LSan to speed things up, since we don't care about leak reports
# from 'external' binaries
export ASAN_OPTIONS=detect_leaks=0
# Set argv[0] to the original binary name without the ".orig" suffix
exec -a "$0" -- "/bin/su.orig" "$@"
# strace -e execve,openat -- su --help
execve("/usr/bin/su", ["su", "--help"], 0x7fffee2d44b0 /* 46 vars */) = 0
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib64/libtinfo.so.6", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/dev/tty", O_RDWR|O_NONBLOCK) = 3
...
openat(AT_FDCWD, "/usr/bin/su", O_RDONLY) = 3
execve("/bin/su.orig", ["/usr/bin/su", "--help"], 0x100127e45f0 /* 46 vars */) = 0
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib64/libpam.so.0", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib64/libpam_misc.so.0", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib64/libaudit.so.1", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib64/libeconf.so.0", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib64/libm.so.6", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib64/libcap-ng.so.0", O_RDONLY|O_CLOEXEC) = 3
...
openat(AT_FDCWD, "/usr/share/locale/en.utf8/LC_MESSAGES/util-linux.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/en/LC_MESSAGES/util-linux.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
Usage:
su [options] [-] [<user> [<argument>...]]
...
The other binaries we wrap this way seem to be working fine-ish (i.e. no "infinite" loop), but with LD_PRELOAD= set the execve() call to the ".orig" binary is doubled:
ls (with wrapper, but without LD_PRELOAD=):
# strace -e execve -- ls -l ~
execve("/usr/bin/ls", ["ls", "-l", "/root"], 0x7fffcc26a798 /* 46 vars */) = 0
execve("/bin/ls.orig", ["/usr/bin/ls", "-l", "/root"], 0x1000ad545f0 /* 46 vars */) = 0
total 256
-rw-------. 1 root root 18565 Nov 21 12:18 anaconda-ks.cfg
-rw-r--r--. 1 root root 6 Nov 21 12:18 NETBOOT_METHOD.TXT
-rw-------. 1 root root 18582 Nov 21 12:23 original-ks.cfg
-rw-r--r--. 1 root root 9 Nov 21 12:18 RECIPE.TXT
+++ exited with 0 +++
ls (with wrapper and LD_PRELOAD=):
# strace -e execve -- ls -l ~
execve("/usr/bin/ls", ["ls", "-l", "/root"], 0x7fffdcb4f378 /* 46 vars */) = 0
execve("/bin/ls.orig", ["/usr/bin/ls", "-l", "/root"], 0x100092845f0 /* 47 vars */) = 0
execve("/bin/ls.orig", ["/usr/bin/ls", "-l", "/root"], 0x7fffead45308 /* 47 vars */) = 0
total 256
-rw-------. 1 root root 18565 Nov 21 12:18 anaconda-ks.cfg
-rw-r--r--. 1 root root 6 Nov 21 12:18 NETBOOT_METHOD.TXT
-rw-------. 1 root root 18582 Nov 21 12:23 original-ks.cfg
-rw-r--r--. 1 root root 9 Nov 21 12:18 RECIPE.TXT
+++ exited with 0 +++
Same with stat:
# Without LD_PRELOAD=
# strace -e execve -- stat ~
execve("/usr/bin/stat", ["stat", "/root"], 0x7fffdb495aa0 /* 46 vars */) = 0
execve("/bin/stat.orig", ["/usr/bin/stat", "/root"], 0x1003f7045f0 /* 46 vars */) = 0
# With LD_PRELOAD=
# strace -e execve -- stat ~
execve("/usr/bin/stat", ["stat", "/root"], 0x7ffff9d33f70 /* 46 vars */) = 0
execve("/bin/stat.orig", ["/usr/bin/stat", "/root"], 0x10004a145f0 /* 47 vars */) = 0
execve("/bin/stat.orig", ["/usr/bin/stat", "/root"], 0x7fffcd4e52b0 /* 47 vars */) = 0
etc. Not sure if that's expected, but at least the binaries seem to be working fine.
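To check other wrapped binaries for the same doubling without reading every trace by eye, the execve() calls could be counted from a saved strace log. This is a rough sketch: `count_execve` is a hypothetical helper, and it assumes the log was written with something like `strace -f -e trace=execve -o /tmp/ls.trace ls -l ~`.

```shell
# Hypothetical helper: count_execve <strace-log> <path>
# Prints how many execve() calls in the log target <path>.
count_execve() {
    # -F: match the string literally, -c: print the number of matching lines.
    # grep exits non-zero when the count is 0, so mask that with "|| true".
    grep -cF -- "execve(\"$2\"," "$1" || true
}
```

For the `ls` trace with LD_PRELOAD= above, `count_execve /tmp/ls.trace /bin/ls.orig` would print 2, and 1 for the trace without it. For the `su` case it is probably wise to run strace under `timeout`, since the loop does not seem to end on its own.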
This was originally spotted on CentOS 8 Stream with compiler-rt-14.0.0-3.module_el8.7.0+1149+a59781f0.ppc64le, but it's reproducible on Fedora Rawhide with compiler-rt-15.0.4-1.fc38.ppc64le as well.