rauc / meta-rauc

Yocto/Open Embedded meta layer for RAUC, the embedded Linux update framework
MIT License

Setting RAUC_SLOT_rootfs[adaptive] ?= "block-hash-index": results in error: (rauc:818): GLib-CRITICAL **: 13:33:51.290: g_close(fd:3) failed with EBADF. The tracking of file descriptors got messed up #334

Closed CramBL closed 2 weeks ago

CramBL commented 1 month ago

If I set RAUC_SLOT_rootfs[adaptive] ?= "block-hash-index" I get this error.

logfile of failure:

DEBUG: Executing shell function do_bundle
rauc-Message: 13:33:46.254: Debug log domains: 'rauc'
(rauc:818): rauc-DEBUG: 13:33:46.255: bundle start
(rauc:818): rauc-DEBUG: 13:33:46.255: input directory: /app/yocto/build/tmp/work/bifrost_machine-poky-linux-gnueabi/bifrost-update-bundle/1.0/bundle
(rauc:818): rauc-DEBUG: 13:33:46.255: output bundle: /app/yocto/build/tmp/work/bifrost_machine-poky-linux-gnueabi/bifrost-update-bundle/1.0/build/bundle.raucb
Creating 'verity' format bundle

(rauc:818): GLib-CRITICAL **: 13:33:51.290: g_close(fd:3) failed with EBADF. The tracking of file descriptors got messed up
WARNING: exit code 133 from a shell command.
Trace/breakpoint trap (core dumped)

Without it there's no problem, and I can transfer the bundles and rauc install etc.

If I set RAUC_SLOT_rootfs[adaptive] = "" it is also OK.

jluebbe commented 1 month ago

I've never seen this so far... Which versions (RAUC, meta-rauc, glib on the host, distribution on the host) do you use?

CramBL commented 1 month ago

Thanks for the quick response, and thanks for Rauc and all the good work you do :+1:

I hope the information below is enough, and not too noisy :smile: let me know if there's anything else you want to know.

We build through docker/docker-compose on Ubuntu 22.04. My machine and our build server both fail with the same error.

Glibc

Adding

do_bundle:prepend() {
    bbplain "$(ldd --version)"
    bbplain "$(rauc --version)"
}

Gives

DEBUG: Executing shell function do_bundle
| ldd (Ubuntu GLIBC 2.35-0ubuntu3.7) 2.35
| Copyright (C) 2022 Free Software Foundation, Inc.
| This is free software; see the source for copying conditions.  There is NO
| warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
| Written by Roland McGrath and Ulrich Drepper.
| rauc 1.11.1

meta-rauc

Specified like this in our image.yml (we are using kas)

meta-rauc:
    url: "https://github.com/rauc/meta-rauc.git"
    commit: "760e9926239739385348501326e522973e5091af"
    path: "layers/meta-rauc"

Project specific rauc bundle bb-file

In the specific project we require a recipe from the layer where we set up most of the RAUC configuration (see further down), and our RAUC bundle bb-file looks like this:

require recipes-bundles/zynq-update-bundle/zynq-update-bundle.bb

DESCRIPTION = "Bifrost rootfs update bundle image"

RAUC_BUNDLE_DESCRIPTION = "RAUC Bifrost update bundle image"
RAUC_SLOT_rootfs = "bifrost-image"

RAUC_SLOT_rootfs[adaptive] = "block-hash-index"

Layer where we configure rauc and the kernel for a Zynq target

Rauc bundle bb-file

inherit bundle

DESCRIPTION = "Base rootfs update bundle image"

RAUC_BUNDLE_COMPATIBLE = "MyZynq"
RAUC_BUNDLE_VERSION = "v20240102"
RAUC_BUNDLE_DESCRIPTION ?= "RAUC Zynq update bundle image"
RAUC_BUNDLE_SLOTS = "rootfs"
RAUC_BUNDLE_FORMAT = "verity"
RAUC_SLOT_rootfs ?= "zynq-image"
RAUC_SLOT_rootfs[fstype] = "ext4"
RAUC_SLOT_rootfs[adaptive] = "block-hash-index"

RAUC_KEY_FILE = "${THISDIR}/files/development-1.key.pem"
RAUC_CERT_FILE = "${THISDIR}/files/development-1.cert.pem"

system.conf

[system]
compatible=MyZynq
bootloader=uboot
data-directory=/data/rauc
bundle-formats=-plain

[keyring]
path=/etc/rauc/ca.cert.pem

[slot.rootfs.0]
device=/dev/mmcblk0p2
type=ext4
bootname=A
resize=true

[slot.rootfs.1]
device=/dev/mmcblk0p3
type=ext4
bootname=B
resize=true

rauc.cfg

# https://rauc.readthedocs.io/en/latest/integration.html#kernel-configuration
CONFIG_SQUASHFS=y
CONFIG_BLK_DEV_LOOP=y
CONFIG_SQUASHFS_FILE_CACHE=y
CONFIG_SQUASHFS_DECOMP_SINGLE=y
CONFIG_SQUASHFS_ZLIB=y
CONFIG_SQUASHFS_FRAGMENT_CACHE_SIZE=3

# Verity support
CONFIG_MD=y
CONFIG_BLK_DEV_DM=y
CONFIG_DM_VERITY=y
CONFIG_CRYPTO_SHA256=y

# Streaming support
CONFIG_BLK_DEV_NBD=y

CramBL commented 1 month ago

After deleting and rebuilding a shared sstate-cache the error is now gone.

jluebbe commented 1 month ago

Thanks for the update. I'm going to close this, as I don't see a way to reproduce it with the current info. Please reopen if it reoccurs.

CramBL commented 1 month ago

This issue reappeared a few days later, and since then I have failed to get any build to work with adaptive streaming turned on.

DEBUG: Executing shell function do_bundle
rauc-Message: 00:20:03.029: Debug log domains: 'rauc'
(rauc:193430): rauc-DEBUG: 00:20:03.030: bundle start
(rauc:193430): rauc-DEBUG: 00:20:03.030: input directory: /app/yocto/build/tmp/work/test_machine_xilinx-poky-linux-gnueabi/test-template-ci-xilinx-update-bundle/1.0/bundle
(rauc:193430): rauc-DEBUG: 00:20:03.030: output bundle: /app/yocto/build/tmp/work/test_machine_xilinx-poky-linux-gnueabi/test-template-ci-xilinx-update-bundle/1.0/build/bundle.raucb
Creating 'verity' format bundle

(rauc:193430): GLib-CRITICAL **: 00:20:05.616: g_close(fd:3) failed with EBADF. The tracking of file descriptors got messed up
WARNING: exit code 133 from a shell command.
Trace/breakpoint trap (core dumped)

I have tried:

With no success. So now I will have to delve deeper into the issue, and I hope to get some aid from knowledgeable rauc devs.

For now, I am occupied with other things and I will have to disable adaptive streaming, but I will soon return to this issue and work on understanding it and finding a solution.

jluebbe commented 1 month ago

Could you retry with the newly released v1.12? Also, which version of GLib (!= glibc) are you using?

If you can get a backtrace from the core dump, that would also be helpful.

RobertBerger commented 2 weeks ago

I see the same problem on a different platform.

strace /workdir/build/3rd-party/phytec-regor-fresh/build/tmp/work/phyboard_regor_am335x_1-phytec-linux-gnueabi/phytec-headless-bundle/1.0/recipe-sysroot-native/usr/bin/rauc bundle --debug --cert="/workdir/build/3rd-party/phytec-regor-fresh/sources/poky/../../phytec-dev-ca/rauc-intermediate/development-1.cert.pem" --key="/workdir/build/3rd-party/phytec-regor-fresh/sources/poky/../../phytec-dev-ca/rauc-intermediate/private/development-1.key.pem" --intermediate=/workdir/build/3rd-party/phytec-regor-fresh/sources/poky/../../phytec-dev-ca/rauc-intermediate/ca.cert.pem /workdir/build/3rd-party/phytec-regor-fresh/build/tmp/work/phyboard_regor_am335x_1-phytec-linux-gnueabi/phytec-headless-bundle/1.0/sources/bundle /workdir/build/3rd-party/phytec-regor-fresh/build/tmp/work/phyboard_regor_am335x_1-phytec-linux-gnueabi/phytec-headless-bundle/1.0/build/bundle.raucb

...
close(3)                                = 0
openat(AT_FDCWD, "/workdir/build/3rd-party/phytec-regor-fresh/build/tmp/work/phyboard_regor_am335x_1-phytec-linux-gnueabi/phytec-headless-bundle/1.0/sources/bundle/.rauc-workdir/phytec-headless-image-phyboard-regor-am335x-1.rootfs.ubifs", O_RDONLY|O_CLOEXEC) = 3
lseek(3, 0, SEEK_END)                   = 98445312
lseek(3, 0, SEEK_SET)                   = 0
close(3)                                = 0
close(3)                                = -1 EBADF (Bad file descriptor)
ioctl(2, TCGETS, {B38400 opost isig icanon echo ...}) = 0
getpid()                                = 118760
write(2, "\n(rauc:118760): GLib-\33[1;35mCRIT"..., 128
(rauc:118760): GLib-CRITICAL **: 15:46:10.054: g_close(fd:3) failed with EBADF. The tracking of file descri) = 128
write(2, "ptors got messed up\n", 20ptors got messed up
)   = 20
--- SIGTRAP {si_signo=SIGTRAP, si_code=SI_KERNEL} ---
+++ killed by SIGTRAP (core dumped) +++

If I set RAUC_SLOT_rootfs[adaptive] = "" it builds.

ejoerns commented 2 weeks ago

@RobertBerger Thank you for providing the trace. This allowed me to identify the potentially responsible code.

I can confirm that it's a bug in the hash_index fd handling that results in a double-close of the same fd.

It does not hit us normally since it is the result of a previous error and requires at least glib 2.75.0 where the g_close() handling became much pickier (to reveal issues like we have): https://gitlab.gnome.org/GNOME/glib/-/merge_requests/2964
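To check whether a given GLib version falls on the strict side of that threshold, a version comparison along these lines could be used. This is only a sketch: `glib_has_strict_close` is a hypothetical helper name, not part of RAUC, meta-rauc, or GLib, and it relies on `sort -V` from GNU coreutils.

```shell
# Hypothetical helper (not part of RAUC or GLib): succeeds if the GLib
# version passed as $1 is at least 2.75.0, the release named above in
# which g_close() became strict about EBADF.
glib_has_strict_close() {
    min="2.75.0"
    # sort -V orders version strings; if the minimum sorts first
    # (or both are equal), the given version is >= the minimum.
    [ "$(printf '%s\n%s\n' "$min" "$1" | sort -V | head -n 1)" = "$min" ]
}
```

It could be called as `glib_has_strict_close "$(pkg-config --modversion glib-2.0)"`. Note that in a Yocto build the native `rauc` binary is likely linked against the GLib from the recipe's native sysroot rather than the host's, so the host `pkg-config` result may not be the relevant one.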

jluebbe commented 2 weeks ago

lseek(3, 0, SEEK_END) = 98445312

The size is not a multiple of 4KiB (only of 2KiB), which is detected by RAUC. During error handling, the FD is closed twice, which leads to an abort before the actual error message can be printed. We'll fix the double-close issue, but @RobertBerger and @CramBL should be able to avoid this issue by making the image a multiple of 4KiB (truncate -s %4096 <file>).
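As a rough sketch of that workaround, the check and the padding could be wrapped in a small shell function. `pad_to_4k` is a hypothetical name, and the function assumes GNU coreutils (`stat -c %s` and `truncate -s %SIZE`); it is not something RAUC or meta-rauc provides.

```shell
# Hypothetical helper (not part of RAUC or meta-rauc): pad an image file
# with zero bytes up to the next multiple of 4096 if it is not already
# aligned, then print the resulting size in bytes.
pad_to_4k() {
    img="$1"
    size=$(stat -c %s "$img") || return 1      # GNU stat: file size in bytes
    if [ $((size % 4096)) -ne 0 ]; then
        truncate -s %4096 "$img" || return 1   # GNU truncate: round size up
    fi
    stat -c %s "$img"
}
```

For the image in the trace above, 98445312 bytes would become 98447360 bytes (2048 bytes of padding). Whether trailing zero bytes are harmless depends on the filesystem image type, so producing an aligned size when the image is built is probably the cleaner fix.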

jluebbe commented 2 weeks ago

openat(AT_FDCWD, "….rootfs.ubifs", O_RDONLY|O_CLOEXEC) = 3

@RobertBerger: Note that while you can generate a hash index for adaptive updates for ubifs, it won't be used during installation (UBI volumes aren't block devices and would need special handling). In principle, it could be implemented, though. If you're interested in implementing that, I'd write down what would be needed.