martinpitt / umockdev

Mock hardware devices for creating unit tests and bug reporting
https://launchpad.net/umockdev
GNU Lesser General Public License v2.1
308 stars 55 forks source link

All tests hang on alpha #200

Open matoro opened 1 year ago

matoro commented 1 year ago

Hi, in both of the two extant distros supporting alpha (Debian & Gentoo) all tests in the test suite timeout. I tried increasing the meson timeout past the 150-second limit but the tests hung indefinitely.

Test log from my run (Gentoo):

>>> Test phase: dev-util/umockdev-0.17.13
 * .alpha: running multilib-minimal_abi_src_test
meson test --print-errorlogs -C /var/tmp/portage/dev-util/umockdev-0.17.13/work/umockdev-0.17.13-.alpha --num-processes 1 --no-suite fails-valgrind
ninja: Entering directory `/var/tmp/portage/dev-util/umockdev-0.17.13/work/umockdev-0.17.13-.alpha'
ninja: no work to do.
umockdev / umockdev time out (After 150 seconds)
1/4 umockdev / umockdev     TIMEOUT        150.10s   killed by signal 15 SIGTERM
>>> GI_TYPELIB_PATH=/var/tmp/portage/dev-util/umockdev-0.17.13/work/umockdev-0.17.13-.alpha MALLOC_CHECK_=3 MALLOC_PERTURB_=218 G_DEBUG=fatal-warnings,fatal-criticals,gc-friendly LD_LIBRARY_PATH=/var/tmp/portage/dev-util/umockdev-0.17.13/work/umockdev-0.17.13-.alpha PATH=/var/tmp/portage/dev-util/umockdev-0.17.13/work/umockdev-0.17.13-.alpha:/var/tmp/portage/dev-util/umockdev-0.17.13/temp/python3.11/bin:/usr/lib/portage/python3.10/ebuild-helpers/xattr:/usr/lib/portage/python3.10/ebuild-helpers:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/bin TOP_SRCDIR=/var/tmp/portage/dev-util/umockdev-0.17.13/work/umockdev-0.17.13 /var/tmp/portage/dev-util/umockdev-0.17.13/work/umockdev-0.17.13/src/umockdev-wrapper /var/tmp/portage/dev-util/umockdev-0.17.13/work/umockdev-0.17.13-.alpha/test-umockdev

umockdev / ioctl-tree time out (After 150 seconds)
2/4 umockdev / ioctl-tree   TIMEOUT        150.07s   killed by signal 15 SIGTERM
>>> GI_TYPELIB_PATH=/var/tmp/portage/dev-util/umockdev-0.17.13/work/umockdev-0.17.13-.alpha MALLOC_CHECK_=3 G_DEBUG=fatal-warnings,fatal-criticals,gc-friendly MALLOC_PERTURB_=47 LD_LIBRARY_PATH=/var/tmp/portage/dev-util/umockdev-0.17.13/work/umockdev-0.17.13-.alpha PATH=/var/tmp/portage/dev-util/umockdev-0.17.13/work/umockdev-0.17.13-.alpha:/var/tmp/portage/dev-util/umockdev-0.17.13/temp/python3.11/bin:/usr/lib/portage/python3.10/ebuild-helpers/xattr:/usr/lib/portage/python3.10/ebuild-helpers:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/bin TOP_SRCDIR=/var/tmp/portage/dev-util/umockdev-0.17.13/work/umockdev-0.17.13 /var/tmp/portage/dev-util/umockdev-0.17.13/work/umockdev-0.17.13/src/umockdev-wrapper /var/tmp/portage/dev-util/umockdev-0.17.13/work/umockdev-0.17.13-.alpha/test-ioctl-tree

umockdev / umockdev-run time out (After 150 seconds)
3/4 umockdev / umockdev-run TIMEOUT        150.06s   killed by signal 15 SIGTERM
>>> GI_TYPELIB_PATH=/var/tmp/portage/dev-util/umockdev-0.17.13/work/umockdev-0.17.13-.alpha MALLOC_CHECK_=3 G_DEBUG=fatal-warnings,fatal-criticals,gc-friendly MALLOC_PERTURB_=130 LD_LIBRARY_PATH=/var/tmp/portage/dev-util/umockdev-0.17.13/work/umockdev-0.17.13-.alpha PATH=/var/tmp/portage/dev-util/umockdev-0.17.13/work/umockdev-0.17.13-.alpha:/var/tmp/portage/dev-util/umockdev-0.17.13/temp/python3.11/bin:/usr/lib/portage/python3.10/ebuild-helpers/xattr:/usr/lib/portage/python3.10/ebuild-helpers:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/bin TOP_SRCDIR=/var/tmp/portage/dev-util/umockdev-0.17.13/work/umockdev-0.17.13 /var/tmp/portage/dev-util/umockdev-0.17.13/work/umockdev-0.17.13/src/umockdev-wrapper /var/tmp/portage/dev-util/umockdev-0.17.13/work/umockdev-0.17.13-.alpha/test-umockdev-run

umockdev / static-code time out (After 150 seconds)
4/4 umockdev / static-code  TIMEOUT        150.05s   killed by signal 15 SIGTERM
>>> GI_TYPELIB_PATH=/var/tmp/portage/dev-util/umockdev-0.17.13/work/umockdev-0.17.13-.alpha MALLOC_CHECK_=3 G_DEBUG=fatal-warnings,fatal-criticals,gc-friendly MALLOC_PERTURB_=131 LD_LIBRARY_PATH=/var/tmp/portage/dev-util/umockdev-0.17.13/work/umockdev-0.17.13-.alpha PATH=/var/tmp/portage/dev-util/umockdev-0.17.13/work/umockdev-0.17.13-.alpha:/var/tmp/portage/dev-util/umockdev-0.17.13/temp/python3.11/bin:/usr/lib/portage/python3.10/ebuild-helpers/xattr:/usr/lib/portage/python3.10/ebuild-helpers:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/bin TOP_SRCDIR=/var/tmp/portage/dev-util/umockdev-0.17.13/work/umockdev-0.17.13 /var/tmp/portage/dev-util/umockdev-0.17.13/work/umockdev-0.17.13/src/umockdev-wrapper /var/tmp/portage/dev-util/umockdev-0.17.13/work/umockdev-0.17.13/tests/test-static-code

Summary of Failures:

1/4 umockdev / umockdev     TIMEOUT        150.10s   killed by signal 15 SIGTERM
2/4 umockdev / ioctl-tree   TIMEOUT        150.07s   killed by signal 15 SIGTERM
3/4 umockdev / umockdev-run TIMEOUT        150.06s   killed by signal 15 SIGTERM
4/4 umockdev / static-code  TIMEOUT        150.05s   killed by signal 15 SIGTERM

Ok:                 0
Expected Fail:      0
Fail:               0
Unexpected Pass:    0
Skipped:            0
Timeout:            4

Downstream Gentoo bug: https://bugs.gentoo.org/805083

Debian: https://buildd.debian.org/status/fetch.php?pkg=umockdev&arch=alpha&ver=0.17.13-1&stamp=1663045653&raw=1 (build history at https://buildd.debian.org/status/logs.php?pkg=umockdev&arch=alpha )

I have real hardware available that I can provide access to if it would be helpful for debugging. Thanks!

martinpitt commented 1 year ago

SSH access would be useful indeed, as there are no Debian porter boxes for alpha. my SSH key

As a first diagnosis, can you please run a single test in verbose mode, to see how far that gets? (in the build tree):

LD_LIBRARY_PATH=. LD_PRELOAD=libumockdev-preload.so ./test-umockdev

Does this simplest possible test case work?

LD_LIBRARY_PATH=. LD_PRELOAD=libumockdev-preload.so ./test-umockdev -p /umockdev-testbed/empty
matoro commented 1 year ago

SSH access would be useful indeed, as there are no Debian porter boxes for alpha. my SSH key

As a first diagnosis, can you please run a single test in verbose mode, to see how far that gets? (in the build tree):

LD_LIBRARY_PATH=. LD_PRELOAD=libumockdev-preload.so ./test-umockdev

Does this simplest possible test case work?

LD_LIBRARY_PATH=. LD_PRELOAD=libumockdev-preload.so ./test-umockdev -p /umockdev-testbed/empty

Nope, just tried - both commands just hang with no output. They don't respond to SIGINT either, have to kill them.

I've added you as a user - should be able to log in to martinpitt [at] alphadev [dot] matoro [dot] tk (IPv6 only). Let me know if you're able to take a look - thank you so much!!

martinpitt commented 1 year ago

@matoro Thanks! I can log in, but umockdev's build dependencies are missing. I started a tmux session if you want to attach/have a look. First error:

meson.build:1:0: ERROR: Could not execute Vala compiler: valac

Could you install these? Or is there some kind of build chroot on that box which I can access without root?

matoro commented 1 year ago

@matoro Thanks! I can log in, but umockdev's build dependencies are missing. I started a tmux session if you want to attach/have a look. First error:

meson.build:1:0: ERROR: Could not execute Vala compiler: valac

Could you install these? Or is there some kind of build chroot on that box which I can access without root?

Oops, sorry - gobject-introspection is installed now. Vala was actually already installed, but Gentoo does not symlink the plain valac command, so you have to set VALAC=valac-0.56, see our ebuild: https://github.com/gentoo/gentoo/blob/master/dev-util/umockdev/umockdev-9999.ebuild

martinpitt commented 1 year ago

Initial notes:

LD_LIBRARY_PATH=. LD_PRELOAD=libumockdev-preload.so ./test-umockdev -p /umockdev-testbed/empty

This locks up hard, not even Control-C or Control-\ help. Needs to be SIGKILLed. This makes gdb unusable as well, and strace isn't installed. @matoro, installing strace would be much appreciated!

UMOCKDEV_DEBUG=all LD_PRELOAD=libumockdev-preload.so ls

that works fine.

UMOCKDEV_DEBUG=all LD_LIBRARY_PATH=. ./umockdev-run ls

that segfaults immediately. This once again means that I can't run gdb inside of umockdev-run, and running it outside is rather pointless as the only backtrace that it gives me is for the Process.raise (Process.term_sig (child_status)) (i.e. passing on ls'es signal).

Same conundrum with a core dump -- dumping "outside" isn't useful, and dumping "inside" with

UMOCKDEV_DEBUG=all LD_LIBRARY_PATH=. ./umockdev-run -- bash -c 'ulimit -c unlimited; ls' 

crashes before bash loads properly. So it seems this needs some fine-grained printf debugging..

The closest that I can get to is the _access("/tmp/umockdev.T0BOU1/disabled", F_OK) call in the preload's path_exists(). No idea why this would hang. Replacing it with

struct statx st;
res = _statx(AT_FDCWD, path, 0, STATX_MODE, &st);

doesn't help either. Disabling the whole path_exists() check also doesn't help; this smells like a memory corruption that happened before already.

martinpitt commented 1 year ago

@matoro , can you please install strace and valgrind? This is a really hairy one, I'm afraid..

matoro commented 1 year ago

@matoro , can you please install strace and valgrind? This is a really hairly one, I'm afraid..

Normally I have strace installed, 5.19 is just blocked by https://github.com/strace/strace/issues/220 at the moment. But I just made a local repo and installed 5.18 from it for now.

Valgrind unfortunately requires platform-specific code, so there is of course no support for most of the more obscure platforms: https://valgrind.org/info/platforms.html

Let me know if there is anything else that would help the investigation. Thank you so much for looking at this!

martinpitt commented 1 year ago

Ah right, no valgrind.. Perhaps there's something simpler like electric-fence. I ran with strace, but this is inconclusive. With printf debugging and

pkill -9 -e chatter; rm -rf /tmp/umockdev.*; meson compile && G_MESSAGES_DEBUG=all UMOCKDEV_DEBUG=all LD_LIBRARY_PATH=. ./umockdev-run ./chatter /dev/null

it keeps hanging when calling the real glibc functions like access() and open(), and I don't know how to sensibly debug that. OTOH, when calling ls instead of chatter, it crashes much earlier. It's likely that there's some memory corruption somewhere, but I don't know how to figure this out. Valgrind on x86 and other architectures does not find anything.

If someone has another idea how to debug this, I'm all ears -- but there's only so much time I can throw at this, sorry.