libguestfs / libguestfs

library and tools for accessing and modifying virtual machine disk images. PLEASE DO NOT USE GITHUB FOR ISSUES OR PULL REQUESTS. See the website for how to file a bug or contact us.
http://libguestfs.org
GNU General Public License v2.0

1.52.0: c-api related tests fail (`libguestfs: trace: blockdev_setrw = -1 (error)`) #139

Closed dvzrv closed 2 months ago

dvzrv commented 2 months ago

Hi! :wave:

When rebuilding the libguestfs package for Arch Linux I am now seeing a lot of failing tests in the c-api test suite. I see hundreds of `libguestfs: trace: blockdev_setrw = -1 (error)` lines. I am a bit clueless as to why this happens all of a sudden.

Attachments: test-suite.log; logs specific to tests/c-api: tests.log

dvzrv commented 2 months ago

The working version (after ignoring issues due to #136) of the package has been built using

{
  "builddate": 1709986878,
  "builddir": "/build",
  "buildenv": [
    "!distcc",
    "color",
    "!ccache",
    "check",
    "!sign"
  ],
  "buildtool": "devtools",
  "buildtoolver": "1:1.1.1-1-any",
  "installed": [
    "acl-2.3.2-1-x86_64",
    "aom-3.8.1-1-x86_64",
    "archlinux-keyring-20240208-1-any",
    "argon2-20190702-5-x86_64",
    "attr-2.5.2-1-x86_64",
    "audit-4.0-1-x86_64",
    "augeas-1.14.1-1-x86_64",
    "autoconf-2.72-1-any",
    "automake-1.16.5-2-any",
    "avahi-1:0.8+r194+g3f79789-1-x86_64",
    "base-devel-1-1-any",
    "bash-5.2.026-2-x86_64",
    "bash-completion-2.11-3-any",
    "binutils-2.42-2-x86_64",
    "bison-3.8.2-6-x86_64",
    "brotli-1.1.0-1-x86_64",
    "btrfs-progs-6.7.1-1-x86_64",
    "bzip2-1.0.8-5-x86_64",
    "ca-certificates-20220905-1-any",
    "ca-certificates-mozilla-3.98-1-x86_64",
    "ca-certificates-utils-20220905-1-any",
    "cairo-1.18.0-2-x86_64",
    "capstone-5.0.1-2-x86_64",
    "cdrtools-3.02a09-5-x86_64",
    "coreutils-9.4-3-x86_64",
    "cpio-2.15-1-x86_64",
    "cryptsetup-2.7.1-1-x86_64",
    "curl-8.6.0-3-x86_64",
    "dav1d-1.4.0-1-x86_64",
    "db5.3-5.3.28-4-x86_64",
    "dbus-1.14.10-2-x86_64",
    "dbus-broker-35-2-x86_64",
    "dbus-broker-units-35-2-x86_64",
    "debootstrap-1.0.134-1-any",
    "debugedit-5.0-5-x86_64",
    "device-mapper-2.03.23-3-x86_64",
    "dhcpcd-10.0.6-1-x86_64",
    "diffutils-3.10-1-x86_64",
    "docbook-xml-4.5-9-any",
    "docbook-xsl-1.79.2-7-any",
    "dosfstools-4.2-3-x86_64",
    "dtc-1.7.0-4-x86_64",
    "duktape-2.7.0-6-x86_64",
    "e2fsprogs-1.47.0-1-x86_64",
    "edk2-ovmf-202311-1-any",
    "erlang-nox-26.2.2-1-x86_64",
    "exfatprogs-1.2.2-1-x86_64",
    "expat-2.6.1-1-x86_64",
    "f2fs-tools-1.16.0-2-x86_64",
    "fakeroot-1.34-1-x86_64",
    "file-5.45-1-x86_64",
    "filesystem-2024.01.19-1-any",
    "findutils-4.9.0-3-x86_64",
    "flex-2.6.4-5-x86_64",
    "fontconfig-2:2.15.0-2-x86_64",
    "freetype2-2.13.2-1-x86_64",
    "fribidi-1.0.13-2-x86_64",
    "fuse-common-3.16.2-1-x86_64",
    "fuse2-2.9.9-4-x86_64",
    "fuse3-3.16.2-1-x86_64",
    "gawk-5.3.0-1-x86_64",
    "gc-8.2.6-1-x86_64",
    "gcc-13.2.1-5-x86_64",
    "gcc-libs-13.2.1-5-x86_64",
    "gd-2.3.3-7-x86_64",
    "gdbm-1.23-2-x86_64",
    "gdk-pixbuf2-2.42.10-2-x86_64",
    "gettext-0.22.4-1-x86_64",
    "ghostscript-10.03.0-1-x86_64",
    "giflib-5.2.2-1-x86_64",
    "glib2-2.78.4-1-x86_64",
    "glib2-docs-2.78.4-1-x86_64",
    "glibc-2.39-1-x86_64",
    "gmp-6.3.0-1-x86_64",
    "gnu-free-fonts-20120503-8-any",
    "gnupg-2.4.4-1-x86_64",
    "gnutls-3.8.3-1-x86_64",
    "go-2:1.22.1-1-x86_64",
    "gobject-introspection-1.78.1-1-x86_64",
    "gobject-introspection-runtime-1.78.1-1-x86_64",
    "gperf-3.1-5-x86_64",
    "gpgme-1.23.2-1-x86_64",
    "gpm-1.20.7.r38.ge82d1a6-5-x86_64",
    "gptfdisk-1.0.10-1-x86_64",
    "graphite-1:1.3.14-3-x86_64",
    "graphviz-10.0.1-1-x86_64",
    "grep-3.11-1-x86_64",
    "groff-1.23.0-5-x86_64",
    "grub-2:2.12-1-x86_64",
    "gsfonts-20200910-3-any",
    "gtk-doc-1.34.0-1-any",
    "gts-0.7.6.121130-2-x86_64",
    "guile-3.0.9-1-x86_64",
    "gzip-1.13-2-x86_64",
    "harfbuzz-8.3.0-2-x86_64",
    "hicolor-icon-theme-0.17-3-any",
    "hivex-1.3.23-3-x86_64",
    "hwdata-0.380-1-any",
    "iana-etc-20231228-1-any",
    "icu-74.2-1-x86_64",
    "ijs-0.35-6-x86_64",
    "iniparser-4.1-5-x86_64",
    "iproute2-6.7.0-1-x86_64",
    "iptables-1:1.8.10-1-x86_64",
    "iputils-20240117-1-x86_64",
    "jansson-2.14-2-x86_64",
    "java-environment-common-3-5-any",
    "java-runtime-common-3-5-any",
    "jbig2dec-0.20-1-x86_64",
    "jbigkit-2.1-7-x86_64",
    "jdk-openjdk-21.0.2.u13-3-x86_64",
    "jfsutils-1.1.15-8-x86_64",
    "json-c-0.17-1-x86_64",
    "json-glib-1.8.0-1-x86_64",
    "kbd-2.6.4-1-x86_64",
    "keyutils-1.6.3-2-x86_64",
    "kmod-32-1-x86_64",
    "krb5-1.21.2-2-x86_64",
    "lcms2-2.16-1-x86_64",
    "libaio-0.3.113-3-x86_64",
    "libarchive-3.7.2-1-x86_64",
    "libassuan-2.5.6-1-x86_64",
    "libavif-1.0.4-1-x86_64",
    "libbpf-1.3.0-1-x86_64",
    "libcap-2.69-3-x86_64",
    "libcap-ng-0.8.4-1-x86_64",
    "libconfig-1.7.3-2-x86_64",
    "libcups-1:2.4.7-2-x86_64",
    "libdaemon-0.14-5-x86_64",
    "libdatrie-0.2.13-4-x86_64",
    "libde265-1.0.15-1-x86_64",
    "libedit-20230828_3.1-1-x86_64",
    "libelf-0.191-1-x86_64",
    "libevent-2.1.12-4-x86_64",
    "libewf-20140608-6-x86_64",
    "libffi-3.4.6-1-x86_64",
    "libgcrypt-1.10.3-1-x86_64",
    "libgirepository-1.78.1-1-x86_64",
    "libgpg-error-1.48-1-x86_64",
    "libheif-1.17.6-3-x86_64",
    "libice-1.1.1-2-x86_64",
    "libidn-1.42-1-x86_64",
    "libidn2-2.3.7-1-x86_64",
    "libinih-57-1-x86_64",
    "libisl-0.26-1-x86_64",
    "libjpeg-turbo-3.0.2-2-x86_64",
    "libksba-1.6.6-1-x86_64",
    "libldap-2.6.6-2-x86_64",
    "libldm-0.2.5-2-x86_64",
    "libmnl-1.0.5-1-x86_64",
    "libmpc-1.3.1-1-x86_64",
    "libnbd-1.18.2-1-x86_64",
    "libnet-2:1.3-1-x86_64",
    "libnetfilter_conntrack-1.0.9-1-x86_64",
    "libnfnetlink-1.0.2-1-x86_64",
    "libnftnl-1.2.6-1-x86_64",
    "libnghttp2-1.60.0-1-x86_64",
    "libnghttp3-1.2.0-1-x86_64",
    "libnl-3.9.0-1-x86_64",
    "libnsl-2.0.1-1-x86_64",
    "libp11-kit-0.25.3-1-x86_64",
    "libpaper-2.1.3-1-x86_64",
    "libpcap-1.10.4-1-x86_64",
    "libpciaccess-0.18-1-x86_64",
    "libpng-1.6.43-1-x86_64",
    "libpsl-0.21.2-1-x86_64",
    "librsvg-2:2.57.1-1-x86_64",
    "libsasl-2.1.28-4-x86_64",
    "libseccomp-2.5.5-2-x86_64",
    "libsecret-0.21.4-1-x86_64",
    "libslirp-4.7.0-1-x86_64",
    "libsm-1.2.4-1-x86_64",
    "libssh-0.10.6-2-x86_64",
    "libssh2-1.11.0-1-x86_64",
    "libsysprof-capture-45.2-1-x86_64",
    "libtasn1-4.19.0-1-x86_64",
    "libthai-0.1.29-3-x86_64",
    "libtiff-4.6.0-2-x86_64",
    "libtirpc-1.3.4-1-x86_64",
    "libtool-2.4.7+4+g1ec8fa28-7-x86_64",
    "libtraceevent-1:1.8.2-1-x86_64",
    "libtracefs-1.8.0-1-x86_64",
    "libunistring-1.1-2-x86_64",
    "libunwind-1.7.2-1-x86_64",
    "liburcu-0.14.0-1-x86_64",
    "liburing-2.5-1-x86_64",
    "libusb-1.0.27-1-x86_64",
    "libutempter-1.2.1-4-x86_64",
    "libverto-0.3.2-4-x86_64",
    "libvirt-1:10.1.0-1-x86_64",
    "libwebp-1.3.2-1-x86_64",
    "libx11-1.8.7-1-x86_64",
    "libxau-1.0.11-2-x86_64",
    "libxcb-1.16.1-1-x86_64",
    "libxcrypt-4.4.36-1-x86_64",
    "libxdmcp-1.1.5-1-x86_64",
    "libxdp-1.4.2-1-x86_64",
    "libxext-1.3.6-1-x86_64",
    "libxft-2.3.8-1-x86_64",
    "libxml2-2.12.5-1-x86_64",
    "libxpm-3.5.17-1-x86_64",
    "libxrender-0.9.11-1-x86_64",
    "libxslt-1.1.39-1-x86_64",
    "libxt-1.3.0-1-x86_64",
    "libyaml-0.2.5-2-x86_64",
    "libyuv-r2426+464c51a0-1-x86_64",
    "libzip-1.10.1-1-x86_64",
    "linux-6.7.9.arch1-1-x86_64",
    "linux-api-headers-6.7-1-any",
    "llvm-libs-17.0.6-2-x86_64",
    "lrzip-0.651-2-x86_64",
    "lsof-4.99.3-1-x86_64",
    "lsscsi-0.32-1-x86_64",
    "lua-5.4.6-3-x86_64",
    "lvm2-2.03.23-3-x86_64",
    "lz4-1:1.9.4-2-x86_64",
    "lzo-2.10-5-x86_64",
    "lzop-1.04-3-x86_64",
    "m4-1.4.19-3-x86_64",
    "make-4.4.1-2-x86_64",
    "mdadm-4.3-2-x86_64",
    "mkinitcpio-38-4-any",
    "mkinitcpio-busybox-1.36.1-1-x86_64",
    "mpfr-4.2.1-2-x86_64",
    "mtools-1:4.0.43-1-x86_64",
    "multipath-tools-0.9.8-1-x86_64",
    "ncurses-6.4_20230520-1-x86_64",
    "ndctl-78-1-x86_64",
    "netpbm-10.86.40-1-x86_64",
    "nettle-3.9.1-1-x86_64",
    "nilfs-utils-2.2.9-2-x86_64",
    "npth-1.7-1-x86_64",
    "nspr-4.35-2-x86_64",
    "nss-3.98-1-x86_64",
    "ntfs-3g-2022.10.3-1-x86_64",
    "numactl-2.0.18-1-x86_64",
    "ocaml-5.1.0-1-x86_64",
    "ocaml-augeas-0.6-3-x86_64",
    "ocaml-compiler-libs-5.1.0-1-x86_64",
    "ocaml-findlib-1.9.6-3-x86_64",
    "oniguruma-6.9.9-1-x86_64",
    "openjpeg2-2.5.2-1-x86_64",
    "openssh-9.6p1-3-x86_64",
    "openssl-3.2.1-1-x86_64",
    "p11-kit-0.25.3-1-x86_64",
    "pacman-6.0.2-9-x86_64",
    "pacman-contrib-1.10.4-1-x86_64",
    "pacman-mirrorlist-20231001-1-any",
    "pam-1.6.0-4-x86_64",
    "pambase-20230918-1-any",
    "pango-1:1.52.1-1-x86_64",
    "parted-3.6-1-x86_64",
    "patch-2.7.6-10-x86_64",
    "pciutils-3.11.1-1-x86_64",
    "pcre2-10.42-2-x86_64",
    "perl-5.38.2-1-x86_64",
    "perl-inc-latest-0.500-11-any",
    "perl-libintl-perl-1.33-2-x86_64",
    "perl-module-build-0.4234-2-any",
    "php-8.3.3-1-x86_64",
    "pinentry-1.2.1-3-x86_64",
    "pixman-0.43.4-1-x86_64",
    "pkgconf-2.1.0-2-x86_64",
    "polkit-124-2-x86_64",
    "poppler-data-0.4.12-1-any",
    "popt-1.19-1-x86_64",
    "procps-ng-4.0.4-2-x86_64",
    "psmisc-23.7-1-x86_64",
    "python-3.11.8-1-x86_64",
    "python-lxml-4.9.2-3-x86_64",
    "python-mako-1.3.2-1-any",
    "python-markdown-3.5.2-1-any",
    "python-markupsafe-2.1.4-1-x86_64",
    "python-pygments-2.17.2-1-any",
    "qemu-base-8.2.2-1-x86_64",
    "qemu-common-8.2.2-1-x86_64",
    "qemu-img-8.2.2-1-x86_64",
    "qemu-system-x86-8.2.2-1-x86_64",
    "qemu-system-x86-firmware-8.2.2-1-x86_64",
    "rav1e-0.7.1-1-x86_64",
    "readline-8.2.010-1-x86_64",
    "reiserfsprogs-3.6.27-4-x86_64",
    "rsync-3.2.7-6-x86_64",
    "ruby-3.0.6-1-x86_64",
    "ruby-abbrev-0.1.0-4-any",
    "ruby-base64-0.1.1-4-any",
    "ruby-benchmark-0.2.0-4-any",
    "ruby-bigdecimal-3.1.2-4-x86_64",
    "ruby-bundledgems-3.0.6-1-x86_64",
    "ruby-bundler-2.5.4-1-any",
    "ruby-cgi-0.3.6-1-x86_64",
    "ruby-csv-3.2.5-4-any",
    "ruby-date-3.2.2-4-x86_64",
    "ruby-delegate-0.2.0-4-any",
    "ruby-did_you_mean-1.6.1-4-any",
    "ruby-digest-3.1.1-1-x86_64",
    "ruby-drb-2.1.0-5-any",
    "ruby-english-0.7.1-5-any",
    "ruby-erb-4.0.2-2-x86_64",
    "ruby-etc-1.3.0-6-x86_64",
    "ruby-fcntl-1.0.1-4-x86_64",
    "ruby-fiddle-1.1.0-4-x86_64",
    "ruby-fileutils-1.6.0-4-any",
    "ruby-find-0.1.1-4-any",
    "ruby-forwardable-1.3.2-6-any",
    "ruby-getoptlong-0.1.1-3-any",
    "ruby-io-console-0.5.11-3-x86_64",
    "ruby-io-nonblock-0.1.0-3-x86_64",
    "ruby-io-wait-0.2.3-4-x86_64",
    "ruby-ipaddr-1.2.4-3-any",
    "ruby-irb-1.4.2-1-any",
    "ruby-json-2.7.1-1-x86_64",
    "ruby-logger-1.5.1-3-any",
    "ruby-minitest-5.20.0-1-any",
    "ruby-mutex_m-0.1.1-3-any",
    "ruby-net-http-0.2.2-2-any",
    "ruby-open-uri-0.2.0-3-any",
    "ruby-power_assert-2.0.3-1-any",
    "ruby-psych-4.0.6-1-x86_64",
    "ruby-racc-1.6.0-3-x86_64",
    "ruby-rake-13.0.6-1-any",
    "ruby-rdoc-6.4.0-4-any",
    "ruby-reline-0.3.1-2-any",
    "ruby-rexml-3.2.6-1-any",
    "ruby-ruby2_keywords-0.0.5-1-any",
    "ruby-stdlib-3.0.6-1-x86_64",
    "ruby-stringio-3.0.2-4-x86_64",
    "ruby-test-unit-3.6.1-1-any",
    "ruby-time-0.2.0-4-any",
    "ruby-tmpdir-0.1.2-3-any",
    "ruby-uri-0.12.1-1-any",
    "rubygems-3.3.25-1-any",
    "rust-1:1.76.0-2-x86_64",
    "seabios-1.16.3-1-any",
    "sed-4.9-3-x86_64",
    "shadow-4.14.6-1-x86_64",
    "shared-mime-info-2.4-1-x86_64",
    "sleuthkit-4.12.1-1-x86_64",
    "snappy-1.1.10-1-x86_64",
    "sqlite-3.45.1-1-x86_64",
    "squashfs-tools-4.6.1-1-x86_64",
    "strace-6.7-1-x86_64",
    "sudo-1.9.15.p5-1-x86_64",
    "supermin-5.3.2-1-x86_64",
    "svt-av1-1.8.0-1-x86_64",
    "sysfsutils-2.1.1-1-x86_64",
    "syslinux-6.04.pre2.r11.gbf6db5b4-4-x86_64",
    "systemd-255.4-2-x86_64",
    "systemd-libs-255.4-2-x86_64",
    "systemd-sysvcompat-255.4-2-x86_64",
    "tar-1.35-2-x86_64",
    "texinfo-7.1-2-x86_64",
    "thin-provisioning-tools-1.0.12-1-x86_64",
    "tpm2-tss-4.0.1-1-x86_64",
    "tzdata-2024a-1-x86_64",
    "util-linux-2.39.3-2-x86_64",
    "util-linux-libs-2.39.3-2-x86_64",
    "vala-0.56.15-1-x86_64",
    "vde2-2.3.3-5-x86_64",
    "vim-9.1.0151-2-x86_64",
    "vim-runtime-9.1.0151-2-x86_64",
    "virtiofsd-1.10.1-1-x86_64",
    "wget-1.21.4-1-x86_64",
    "which-2.21-6-x86_64",
    "wolfssl-5.6.6-1-x86_64",
    "x265-3.5-3-x86_64",
    "xcb-proto-1.16.0-1-any",
    "xfsprogs-6.6.0-1-x86_64",
    "xorgproto-2023.2-1-any",
    "xxhash-0.8.2-1-x86_64",
    "xz-5.6.0-1-x86_64",
    "yajl-2.1.0-6-x86_64",
    "yara-4.3.2-1-x86_64",
    "zlib-1:1.3.1-1-x86_64",
    "zstd-1.5.5-1-x86_64"
  ],
  "options": [
    "strip",
    "docs",
    "!libtool",
    "!staticlibs",
    "emptydirs",
    "zipman",
    "purge",
    "debug",
    "lto"
  ],
  "packager": "Juergen Hoetzel <juergen@archlinux.org>",
  "pkgarch": "x86_64",
  "pkgbase": "libguestfs",
  "pkgbuild_sha256sum": "9037f32b28f41f45d0b4f43664954e83327251cc125174e84081a794bdfb1645",
  "pkgname": "libguestfs",
  "pkgver": "1.52.0-3",
  "schema_version": 2,
  "startdir": "/startdir"
}

My assumption is that something in blockdev may have changed with util-linux 2.40.0.

The first commit to blockdev.c past the 2.39.3 release is https://github.com/util-linux/util-linux/commit/4832fd9f36fbb7a12771b8e8df1e749ff14cc462 (see commit history: https://github.com/util-linux/util-linux/commits/master/disk-utils/blockdev.c).

dvzrv commented 2 months ago

Tried a rebuild with a util-linux that has https://github.com/util-linux/util-linux/commit/b3e9aadce221fef040e8a059650e351234c8a762 applied on top of 2.40, but this leads to the same errors.

Maybe you have an idea?

rwmjones commented 2 months ago

The appliance crashed fairly early on in the tests:

 49/553 test_btrfs_rescue_super_recover_0
libguestfs: trace: feature_available "btrfs"
libguestfs: trace: feature_available = 1
libguestfs: trace: blockdev_setrw "/dev/sda"
libguestfs: error: appliance closed the connection unexpectedly.
This usually means the libguestfs appliance crashed.

As suggested later on, try doing:

  export LIBGUESTFS_DEBUG=1 LIBGUESTFS_TRACE=1

and running just those tests again (make -C tests check TESTS=c-api/tests), which should provide more information.
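When the resulting log is long, it helps to find the first test that lost the appliance, since later failures may only be fallout. A minimal sketch, assuming the log layout quoted above (a ` N/M test_name` header followed later by the "appliance closed the connection unexpectedly" error); the function name `first_crashed_test` is made up for illustration:

```shell
#!/bin/sh
# Print the name of the test that was running when the appliance first
# went away. Assumes each test starts with a header such as
# " 49/553 test_name" (hypothetical helper, not part of libguestfs).
first_crashed_test() {
    awk '
        # Remember the most recent " N/M test_name" header.
        $1 ~ /^[0-9]+\/[0-9]+$/ && $2 ~ /^test_/ { current = $2 }
        # Report it at the first crash message and stop reading.
        /appliance closed the connection unexpectedly/ { print current; exit }
    ' "$1"
}
```

It could be called as `first_crashed_test tests.log`, where `tests.log` is whatever file the test output was captured in.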

dvzrv commented 2 months ago

To be on the safe side I have applied https://github.com/libguestfs/libguestfs/commit/7211aac047a10457650dad1da02383cfb8d24abb instead of disabling individual tests (SKIP_TEST_BTRFS_QGROUP_ASSIGN_0=1, SKIP_TEST_BTRFS_QGROUP_CREATE_0=1, SKIP_TEST_BTRFS_QGROUP_DESTROY_0=1, SKIP_TEST_BTRFS_QGROUP_REMOVE_0=1, SKIP_TEST_BTRFS_QGROUP_SHOW_0=1).

I ran with LIBGUESTFS_DEBUG=1 LIBGUESTFS_TRACE=1 make -C $pkgname-$pkgver/tests -k check TESTS=c-api/tests but the latter tests still fail unfortunately:

tests.log

dvzrv commented 2 months ago

One relevant bit is that one of the btrfs-related tests still fails:

commandrvf: btrfs subvolume snapshot -i 0/1000 /sysroot/dir/test3 /sysroot/dir/test6
ERROR: cannot snapshot '/sysroot/dir/test3': Invalid argument
guestfsd: error: /dir/test3: /dir/test6: ERROR: cannot snapshot '/sysroot/dir/test3': Invalid argument
guestfsd: => btrfs_subvolume_snapshot (0x142) took 0.08 secs
libguestfs: trace: btrfs_subvolume_snapshot = -1 (error)
libguestfs: error: btrfs_subvolume_snapshot: /dir/test3: /dir/test6: ERROR: cannot snapshot '/sysroot/dir/test3': Invalid argument
FAIL: test_btrfs_subvolume_snapshot_0
libguestfs: trace: get_verbose
libguestfs: trace: get_verbose = 1

In test_drop_caches_0 we see a kernel panic in the QEMU VM, and after that all follow-up tests appear to die on `libguestfs: trace: blockdev_setrw = -1 (error)`.

rwmjones commented 2 months ago

[  262.047812] BUG: kernel NULL pointer dereference, address: 0000000000000600
[  262.048173] #PF: supervisor read access in kernel mode
[  262.048293] #PF: error_code(0x0000) - not-present page
[  262.048476] PGD 0 P4D 0 
[  262.048947] Oops: 0000 [#2] PREEMPT SMP NOPTI
[  262.049212] CPU: 0 PID: 161 Comm: guestfsd Tainted: G      D W          6.8.5-arch1-1 #1 5f12b795066ab8d27a5fe9971245067df4fb99ed
[  262.049658] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Arch Linux 1.16.3-1-1 04/01/2014
[  262.049935] RIP: 0010:memcg_page_state+0x9/0x30
[  262.050713] Code: c3 cc cc cc cc eb f9 e9 05 b8 ff ff 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f 00 0f 1f 44 00 00 <48> 8b 87 00 06 00 00 48 63 f6 31 d2 48 8b 04 f0 48 85 c0 48 0f 48
[  262.051117] RSP: 0018:ffffb02fc0897c48 EFLAGS: 00000246
[  262.051862] RAX: 00000000fffff33f RBX: ffffb02fc0897d18 RCX: 0000000000000003
[  262.052017] RDX: 0000000000000001 RSI: 0000000000000033 RDI: 0000000000000000
[  262.052164] RBP: 0000000000000000 R08: ffffa0a44339a000 R09: 0000000000000000
[  262.052313] R10: 0000000000000002 R11: dead000000000100 R12: ffffa0a48ffdb780
[  262.053175] R13: ffffa0a441c24c00 R14: 0000000000000000 R15: ffffa0a4431dfa80
[  262.053385] FS:  00007a4b316de540(0000) GS:ffffa0a48de00000(0000) knlGS:0000000000000000
[  262.053567] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  262.053694] CR2: 0000000000000600 CR3: 0000000004184000 CR4: 0000000000750ef0
[  262.054561] PKRU: 55555554
[  262.054670] Call Trace:
[  262.054829]  <TASK>
[  262.054962]  ? __die+0x23/0x70
[  262.055077]  ? page_fault_oops+0x171/0x4e0
[  262.055173]  ? free_unref_page+0xf9/0x180
[  262.055272]  ? exc_page_fault+0x7f/0x180
[  262.055580]  ? asm_exc_page_fault+0x26/0x30
[  262.055713]  ? memcg_page_state+0x9/0x30
[  262.055808]  zswap_shrinker_count+0xb4/0x120
[  262.055934]  do_shrink_slab+0x3a/0x360
[  262.056028]  shrink_slab+0xc7/0x3c0
[  262.056118]  drop_slab+0x85/0x140
[  262.056760]  drop_caches_sysctl_handler+0x7e/0xd0
[  262.056867]  proc_sys_call_handler+0x1c0/0x2e0
[  262.056973]  vfs_write+0x29e/0x470
[  262.057059]  ksys_write+0x6f/0xf0
[  262.057136]  do_syscall_64+0x83/0x170
[  262.057220]  ? exc_page_fault+0x7f/0x180
[  262.057309]  entry_SYSCALL_64_after_hwframe+0x78/0x80

Your kernel has a rather serious bug :-(
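In a trace like this, frames prefixed with `?` are ones the unwinder could not confirm, so the unmarked frames give the dependable call path: here from the `drop_caches` sysctl write down into `zswap_shrinker_count`. As a rough sketch, the following pulls those frames out of a saved log; the helper name `oops_call_chain` is invented for this example, and it assumes the `[ timestamp ]` prefix format shown above:

```shell
#!/bin/sh
# Print the reliable frames (those not prefixed with "?") of the first
# "Call Trace:" in a kernel log, one function name per line.
# Hypothetical helper for illustration only.
oops_call_chain() {
    awk '
        /Call Trace:/ { in_trace = 1; next }
        in_trace {
            line = $0
            # Drop the "[  262.055808] " timestamp prefix.
            sub(/^\[ *[0-9.]+\] */, "", line)
            # The trace ends at </TASK> or at the first empty line.
            if (line ~ /^<\/TASK>/ || line == "") exit
            # Keep only "symbol+0xOFF/0xLEN" frames; "? symbol" and
            # "<TASK>" lines do not match and are skipped.
            if (line ~ /^[A-Za-z_][A-Za-z0-9_.]*\+0x/) {
                sub(/\+0x.*/, "", line)
                print line
            }
        }
    ' "$1"
}
```

The faulting function itself (`memcg_page_state` in the `RIP:` line) only appears as a `?` frame in the trace, so it is not printed by this sketch.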

rwmjones commented 2 months ago

You might see if this solves it:

export LIBGUESTFS_MEMSIZE=4096

dvzrv commented 2 months ago

I'm afraid this did not work as intended either:

LIBGUESTFS_MEMSIZE=4096 LIBGUESTFS_DEBUG=1 LIBGUESTFS_TRACE=1 make -C $pkgname-$pkgver/tests -k check TESTS=c-api/tests

tests.log

rwmjones commented 2 months ago

That just increased the memory given to the appliance in case that was causing an issue. Since it's not, it does look like a genuine kernel bug, triggered by writing to /proc/sys/vm/drop_caches (or as a side-effect of that).

dvzrv commented 2 months ago

@christian-heusel was kind enough to test other kernel versions on the host, and we realized that the kernel used in the tests is hardcoded to be the one from the default linux package.

Can this be adjusted somewhere? Maybe we could test against linux-lts instead until this issue is fixed.

rwmjones commented 2 months ago

Supermin will use the highest numbered kernel in /boot or /lib/modules and there's not really any way to change that. You can play around with environment variables to make it choose a different kernel, but that's not scalable for end users.

https://libguestfs.org/supermin.1.html#ENVIRONMENT-VARIABLES

Edit: After setting environment variables, make sure /var/tmp/.guestfs-$UID is deleted so it rebuilds the cache.
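For a one-off local experiment, the variables documented on that page (`SUPERMIN_KERNEL`, `SUPERMIN_KERNEL_VERSION`, `SUPERMIN_MODULES`) can be set together. The sketch below only prints the exports so they can be inspected first; the helper name `supermin_kernel_env` is made up, and the modules directory is assumed to live under `/lib/modules/<version>`:

```shell
#!/bin/sh
# Print the exports that point supermin at one specific kernel.
#   $1: path to the kernel image
#   $2: kernel version string (the directory name under /lib/modules)
# Hypothetical helper; both arguments must be supplied by the caller.
supermin_kernel_env() {
    printf 'export SUPERMIN_KERNEL=%s\n' "$1"
    printf 'export SUPERMIN_KERNEL_VERSION=%s\n' "$2"
    printf 'export SUPERMIN_MODULES=/lib/modules/%s\n' "$2"
}
```

A possible use, with an image path and version that are placeholders rather than values from this thread: `eval "$(supermin_kernel_env /boot/vmlinuz-linux-lts 6.6.27-1-lts)"`, followed by `rm -rf "/var/tmp/.guestfs-$(id -u)"` so the cached appliance is rebuilt.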

dvzrv commented 2 months ago

We are building in clean chroots and it appears as if just adding linux-lts instead of linux to the check dependencies is enough for the tests to pass!

I'll check if this rebuild also passes for Python 3.12 and then call it a day. Thanks for helping investigate this issue!

dvzrv commented 2 months ago

Argh... so this works with Python 3.11... however with Python 3.12 there seems to be another failing test. Will open another ticket for this.

I guess this one should remain open to figure out the kernel regression (or change in behavior?). According to @christian-heusel the kernel panic is also happening on mainline.

christian-heusel commented 2 months ago

I'm currently having a look at the kernel issue 👍🏻

rwmjones commented 2 months ago

Kernel bug reported in Fedora here: https://bugzilla.redhat.com/show_bug.cgi?id=2275252

christian-heusel commented 2 months ago

Yeah I'm currently bisecting it, it just takes time 😆 Should be done in about an hour or so 👍🏻

christian-heusel commented 2 months ago

See https://lore.kernel.org/all/3iccc6vjl5gminut3lvpl4va2lbnsgku5ei2d7ylftoofy3n2v@gcfdvtsq6dx2/

Toolybird commented 2 months ago

> triggered by writing to /proc/sys/vm/drop_caches (or as a side-effect of that)

Yeah, makes it seem like there might be a way to trigger it outside of libguestfs (i.e. for the kernel folks). Or maybe even directly with the c-api or guestfish.

Everything works fine on 6.7.9. On 6.8.x I need the following to get the checks to pass:

SKIP_TEST_BTRFS_RESCUE_CHUNK_RECOVER_0=1 \
SKIP_TEST_DROP_CACHES_0=1 \
SKIP_TEST_IS_ZERO_DEVICE_0=1 \
SKIP_TEST_ZERO_FREE_SPACE_0=1 \
SKIP_TEST_BTRFS_SUBVOLUME_SNAPSHOT_0=1 \
make check

The first 4 cause the kernel oops. The last one doesn't, which makes it seem like a separate new regression in 6.8.x.
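The variable names follow directly from the test names: `SKIP_` plus the test name in upper case, set to `1`. A small sketch that builds such a list from test names, assuming that naming pattern holds for every c-api test; the function name `skip_vars` is invented here:

```shell
#!/bin/sh
# Turn c-api test names into the matching SKIP_ variable assignments,
# e.g. test_drop_caches_0 -> SKIP_TEST_DROP_CACHES_0=1.
# Hypothetical helper; prints one assignment per line.
skip_vars() {
    for t in "$@"; do
        printf 'SKIP_%s=1\n' "$(printf '%s' "$t" | tr '[:lower:]' '[:upper:]')"
    done
}
```

The output can be passed through `env`, for example `env $(skip_vars test_drop_caches_0 test_is_zero_device_0) make check`.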

christian-heusel commented 2 months ago

> The first 4 cause the kernel oops. The last one didn't.. which makes it seem like a new regression in 6.8.x

Yes, the commits I have bisected to are all in v6.8+:

$ git tag --contains b5ba474f3f51 | grep -v next
v6.8
v6.8-rc1
v6.8-rc2
v6.8-rc3
v6.8-rc4
v6.8-rc5
v6.8-rc6
v6.8-rc7
v6.9-rc1
v6.9-rc2
v6.9-rc3
v6.9-rc4

Let's see which of the three really is the culprit 🤔 Sadly I could not narrow it down to a single commit due to compilation errors 😢 But yeah, the mail linked above has all the details 👍🏻

Toolybird commented 2 months ago

And of course, at the risk of stating the bleeding obvious, the 6.8.x kernel oops can be circumvented by disabling zswap functionality altogether (as a temporary measure):

LIBGUESTFS_APPEND=zswap.enabled=0 \
  make check

or, for finer-grained control:

LIBGUESTFS_APPEND=zswap.shrinker_enabled=0 \
  make check

christian-heusel commented 2 months ago

With the related fix now being on the way to the kernel I think we can close this 😊 👍🏻

rwmjones commented 2 months ago

Let's close this one, thanks everyone for investigating and fixing.