Closed Uglymotha closed 3 weeks ago
mkdir /tmp/a
cd /tmp/a
xz -dc /boot/ugly-linux-main/initrd-6.11-ugly-linux-main |cpio -di
find . |cpio -H newc -o |xz -T0 --check=crc32 >/boot/ugly-linux-main/initrd-6.11-ugly-linux-main
systemctl reboot
texinfo libltdl-dev tk pp (libperl.so -> aarch64-linux-gnu/libperl.so.5...) gawk lzip build-essential bison flex
Found the culprit, in dev_event_nvlist(struct udev_device *dev):

    /* ... is /dev/sda. */
    struct udev_device *parent_dev = udev_device_get_parent(dev);
    if ((value = udev_device_get_sysattr_value(parent_dev, "size")) != NULL) {
        uint64_t numval = DEV_BSIZE;
        numval *= strtoull(value, NULL, 10);
        (void) nvlist_add_uint64(nvl, DEV_PARENT_SIZE, numval);
    }
    }

In certain cases, such as DM-CRYPT-PLAIN devices, there is no parent, so udev_device_get_parent() returns NULL and that NULL is passed straight to udev_device_get_sysattr_value(), which trips the libudev assertion. Changing the condition to

    if (parent_dev != NULL && (value = udev_device_get_sysattr_value(parent_dev, "size")) != NULL) {

fixes the issue. I will submit a PR for this.
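To illustrate the guard described above without depending on libudev, here is a minimal sketch. The `mock_dev` type and the `mock_*` helpers are hypothetical stand-ins for `struct udev_device`, `udev_device_get_parent()`, and `udev_device_get_sysattr_value()`; `SECTOR_BYTES` is assumed to play the role of `DEV_BSIZE` (512, since the sysfs `size` attribute counts 512-byte sectors). It is not the actual zed code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: stands in for DEV_BSIZE; sysfs "size" is in 512-byte sectors. */
#define SECTOR_BYTES 512ULL

/* Hypothetical stand-in for struct udev_device. */
struct mock_dev {
	const char *size_attr;   /* sysfs "size" value in sectors, or NULL */
	struct mock_dev *parent; /* NULL when the device has no parent */
};

/* Stand-in for udev_device_get_parent(): may return NULL. */
static struct mock_dev *mock_get_parent(struct mock_dev *dev)
{
	assert(dev != NULL);
	return dev->parent;
}

/* Stand-in for udev_device_get_sysattr_value(dev, "size").
 * The assert imitates the libudev assertion that aborted zed. */
static const char *mock_get_size_attr(struct mock_dev *dev)
{
	assert(dev != NULL);
	return dev->size_attr;
}

/* Returns 1 and stores the parent's size in bytes in *out when the
 * device has a parent with a "size" attribute; returns 0 otherwise. */
static int parent_size_bytes(struct mock_dev *dev, uint64_t *out)
{
	struct mock_dev *parent = mock_get_parent(dev);
	const char *value;

	/* Check the parent pointer before asking it for an attribute. */
	if (parent != NULL && (value = mock_get_size_attr(parent)) != NULL) {
		*out = SECTOR_BYTES * strtoull(value, NULL, 10);
		return 1;
	}
	return 0;
}
```

With a parentless device (the DM-CRYPT-PLAIN case), `parent_size_bytes()` returns 0 and leaves `*out` untouched instead of reaching the assertion.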
However, my troubleshooting raised a new question. DM_CRYPT_PLAIN devices seem to behave much like multipath devices: first an add is received for the device, followed by a change with the correct information (see the log below). Should this EC_DEV_STATUS be handled as an EC_DEV_ADD, just like for multipath devices?
Nov 3 18:04:02 santest zed[2553]: zed_udev_monitor: 0x7fd050002340, add, /dev/dm-4, disk
Nov 3 18:04:02 santest zed[2553]: zed_udev_monitor: /dev/dm-4 no devid source
Nov 3 18:04:02 santest zed[2553]: zed_udev_monitor: 0x7fd0500056d0, change, /dev/dm-4, disk
Nov 3 18:04:02 santest zed[2553]: #011class: EC_dev_status
Nov 3 18:04:02 santest zed[2553]: #011subclass: dev_dle
Nov 3 18:04:02 santest zed[2553]: #011dev_name: /dev/dm-4
Nov 3 18:04:02 santest zed[2553]: #011path: /devices/virtual/block/dm-4
Nov 3 18:04:02 santest zed[2553]: #011devid: dm-uuid-CRYPT-PLAIN-storage1
Nov 3 18:04:02 santest zed[2553]: #011phys_path: /dev/disk/by-uuid/3533779146875541629
Nov 3 18:04:02 santest zed[2553]: #011dev_size: 17179869184
Nov 3 18:04:02 santest zed[2553]: #011pool_guid: 3533779146875541629
Nov 3 18:04:02 santest zed[2553]: #011vdev_guid: 11766088279060322789
System information
Distribution Name | custom linux
Distribution Version | n/a
Kernel Version | 6.11.5
Architecture | x86_64
OpenZFS Version | 2.2.6
zed segfaults after assertion failure in udev:
Oct 29 16:57:07 rdsan01 zed[18154]: Assertion 'udev_device' failed at src/libudev/libudev-device.c:742, function udev_device_get_sysattr_value(). Aborting.
Oct 29 16:57:07 rdsan01 systemd[1]: zfs-zed.service: Main process exited, code=dumped, status=6/ABRT
Oct 29 16:57:07 rdsan01 systemd[1]: zfs-zed.service: Failed with result 'core-dump'.
Oct 29 16:57:07 rdsan01 systemd[1]: zfs-zed.service: Scheduled restart job, restart counter is at 7.
Oct 29 16:57:07 rdsan01 systemd[1]: zfs-zed.service: Start request repeated too quickly.
Oct 29 16:57:07 rdsan01 systemd[1]: zfs-zed.service: Failed with result 'core-dump'.
Describe how to reproduce the problem
This happens during udev triggering (udevadm trigger -s block).
Include any warning/errors/backtraces from the system logs
Process 30394 (zed) of user 0 dumped core.
Module libcap.so.2 without build-id.
Module libresolv.so.2 without build-id.
Module libkeyutils.so.1 without build-id.
Module libkrb5support.so.0 without build-id.
Module libgmp.so.10 without build-id.
Module ld-linux-x86-64.so.2 without build-id.
Module libuuid.so.1 without build-id.
Module libudev.so.1 without build-id.
Module libz.so.1 without build-id.
Module libgcc_s.so.1 without build-id.
Module libc.so.6 without build-id.
Module libunwind.so.8 without build-id.
Module libcom_err.so.2 without build-id.
Module libk5crypto.so.3 without build-id.
Module libkrb5.so.3 without build-id.
Module libgssapi_krb5.so.2 without build-id.
Module libtirpc.so.3 without build-id.
Module libnvpair.so.3 without build-id.
Module libcrypto.so.3 without build-id.
Module libm.so.6 without build-id.
Module libuutil.so.3 without build-id.
Module libblkid.so.1 without build-id.
Module libzfs_core.so.3 without build-id.
Module libzfs.so.4 without build-id.
Module zed without build-id.
Stack trace of thread 31364:
0 0x00007f17c40e9e7c __pthread_kill_implementation (libc.so.6 + 0x8de7c)
1 0x00007f17c409b3b2 raise (libc.so.6 + 0x3f3b2)
2 0x00007f17c40844ad abort (libc.so.6 + 0x284ad)
3 0x00007f17c3fca995 log_assert_failed.cold (libudev.so.1 + 0x8995)
4 0x00007f17c3ff0077 log_assert_failed_return (libudev.so.1 + 0x2e077)
5 0x00007f17c3fcbc9f udev_device_get_sysattr_value (libudev.so.1 + 0x9c9f)
6 0x0000561ddc78648e zed_udev_monitor (zed + 0xc48e)
7 0x00007f17c40e81b2 start_thread (libc.so.6 + 0x8c1b2)
8 0x00007f17c4162288 __clone3 (libc.so.6 + 0x106288)
Stack trace of thread 30394:
0 0x00007f17c415dfdb ioctl (libc.so.6 + 0x101fdb)
1 0x00007f17c4b2ca2c zpool_events_next (libzfs.so.4 + 0x45a2c)
2 0x0000561ddc786e7b zed_event_service (zed + 0xce7b)
3 0x0000561ddc784bd8 main (zed + 0xabd8)
4 0x00007f17c4085d7a __libc_start_call_main (libc.so.6 + 0x29d7a)
5 0x00007f17c4085e35 __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x29e35)
6 0x0000561ddc784561 _start (zed + 0xa561)
Stack trace of thread 31363:
0 0x00007f17c415dfdb ioctl (libc.so.6 + 0x101fdb)
1 0x00007f17c4b133dd zpool_refresh_stats (libzfs.so.4 + 0x2c3dd)
2 0x00007f17c4b26b65 zpool_open_silent (libzfs.so.4 + 0x3fb65)
3 0x00007f17c4b136d0 zpool_iter (libzfs.so.4 + 0x2c6d0)
4 0x0000561ddc78d1a1 zfs_slm_event (zed + 0x131a1)
5 0x0000561ddc78b09b zfs_agent_consumer_thread (zed + 0x1109b)
6 0x00007f17c40e81b2 start_thread (libc.so.6 + 0x8c1b2)
7 0x00007f17c4162288 __clone3 (libc.so.6 + 0x106288)
ELF object binary architecture: AMD x86-64
core.zed.0.e9cc196a28654a98a7139ee0d030939f.30394.1730291497000000.zip