kdave / btrfs-progs

Development of userspace BTRFS tools
GNU General Public License v2.0
527 stars 239 forks source link

UUID not consistently available in output of lsblk after formatting #734

Closed codefiles closed 4 months ago

codefiles commented 5 months ago

Out of btrfs, ext4, f2fs, and xfs, only btrfs will not consistently show the UUID in the output of lsblk given the following Bash script. Any insight would be greatly appreciated.

set -e

device=$1
fstype=$2

wipefs -a "$device"*

parted -s "$device" \
    mklabel gpt \
    mkpart '"EFI system partition"' fat32 1MiB 513MiB \
    set 1 esp on \
    mkpart '"root partition"' "$fstype" 513MiB 100%

udevadm settle
partitions=($(lsblk -n -o path "$device"))

mkfs.fat -F 32 ${partitions[1]}
mkfs."$fstype" ${partitions[2]}

udevadm settle
lsblk -n -o name,uuid "$device"
adam900710 commented 5 months ago

Mind to explain the "inconsistent" part?

I ran the script several time, it always goes expected:

Run one:

loop0
|-loop0p1 BBEA-3E8A
`-loop0p2 e193f6cd-c39d-447b-b5c0-ef57c2082161

Run2:

loop0
|-loop0p1 BDF3-552B
`-loop0p2 76444aa2-d656-4b01-b8ca-b44d3504e76c
adam900710 commented 5 months ago

And the UUID from lsblk matches the one from mkfs.btrfs or super dump.

codefiles commented 5 months ago

It appears the issue only occurs with physical hardware and not a virtual disk or loop device.

It is inconsistent in that the UUID is available in the output of lsblk sometimes but not always. In either case the output from mkfs.btrfs will contain the UUID.

Run with UUID shown in output of lsblk command:

sda    
├─sda1 074D-D146
└─sda2 66454024-4472-4cd7-a9df-967edfe47ad8

Run with no UUID shown in output of lsblk command:

sda    
├─sda1 FE35-F1E4
└─sda2 

Using udevadm monitor -p it can be observed that when the UUID is not present in the lsblk output that the udev change event for the format does not have the ID_FS_UUID property and the DEVLINKS property does not contain a /dev/disk/by-uuid/ path.

Relevant: https://github.com/archlinux/archinstall/issues/2305

adam900710 commented 5 months ago

It may be related to a bug of mkfs.btrfs, which goes several open(), write(), close() combinations. Since libuuid get triggered for each writable open() close() pair, it may lead to the problem.

But since it's Archlinux, and the bug is already fixed by commit b2a1be83b85f ("btrfs-progs: mkfs: keep file descriptors open during whole time"), thus I don't think it's the same.

BTW, I just reproduced it locally (even using the same loopback device), it's indeed with some randomness.

I'll take a deeper look into the problem.

codefiles commented 5 months ago

This change to the script seems to resolve the issue.

-mkfs."$fstype" ${partitions[2]}
+udevadm lock --device=${partitions[2]} mkfs."$fstype" ${partitions[2]}

Example from the udevadm man page:

Example 1. Format a File System

Take a lock on the backing block device while creating a file system, to ensure that systemd-udevd doesn't probe or announce the new superblock before it is comprehensively written:

# udevadm lock --device=/dev/sda1 mkfs.ext4 /dev/sda1

Maybe this could be fixed in mkfs.btrfs with an implementation of Locking Block Device Access now that https://github.com/kdave/btrfs-progs/commit/b2a1be83b85f183f582f07fac0279d40fc62a4db keeps the file descriptors open and only closes them at the end?

kdave commented 5 months ago

Calling flock on the file descriptor sounds like the right thing to do.

adam900710 commented 5 months ago

I have considered flock() before, but it doesn't work on partition block device, and always need to be called on the main block device.

Let me check how systemd handles the partition -> main block device search.

EDIT: The page already has the example to grab the main block device name.

I'll go implement it soon. Thanks for the doc!

adam900710 commented 5 months ago

@codefiles Would you mind to test my branch flock?

I introduced the proper flock handling for all involved devices (and extra duplication check to prevent double lock). At least in my local environment I can no longer reproduce the error.

I'm running the remaining selftests, but so far nothing seems broken.

codefiles commented 5 months ago

I can still reproduce the bug using that branch.

adam900710 commented 5 months ago

OK, it looks like the way to grab dev_t is incorrect. Let me double check the proper way to grab the device number.

adam900710 commented 5 months ago

Branch updated, two fixes:

Unfortunately since it's not reliably here to reproduce the problem, @codefiles mind to retest the same flock branch? At least this time I'm sure the flock part is properly working.

codefiles commented 5 months ago

That works, I can no longer reproduce the bug. Nice job. Thank you.

kdave commented 5 months ago

Thanks, the fix (v2) is now in devel.

gtmoth commented 5 months ago

I have been testing with patch 00c5eaa] and I found it not working in some cases. I am pasting snippets from journalctrl -f output and the strace below. I belive this issue exists in the later version also. At the time of last inotify, the flock() on /dev/sde was still held.

Feb 01 12:48:09 gmananth-20240125-1442 systemd-udevd[1109]: inotify event: 8 for /dev/sde2 Feb 01 12:48:09 gmananth-20240125-1442 systemd-udevd[1109]: device /dev/sde2 closed, synthesising 'change' Feb 01 12:48:09 gmananth-20240125-1442 systemd-udevd[1109]: seq 28364 queued, 'change' 'block' Feb 01 12:48:09 gmananth-20240125-1442 systemd-udevd[1109]: seq 28364 forked new worker [469927] Feb 01 12:48:09 gmananth-20240125-1442 systemd-udevd[469927]: seq 28364 running Feb 01 12:48:09 gmananth-20240125-1442 systemd-udevd[469927]: Unable to flock(/dev/sde), skipping event handling: Resource temporarily unavailable Feb 01 12:48:09 gmananth-20240125-1442 systemd-udevd[469927]: seq 28364 processed Feb 01 12:48:09 gmananth-20240125-1442 systemd-udevd[1109]: cleanup idle workers Feb 01 12:48:09 gmananth-20240125-1442 systemd-udevd[469927]: Unload module index Feb 01 12:48:09 gmananth-20240125-1442 systemd-udevd[469927]: Unloaded link configuration context. Feb 01 12:48:09 gmananth-20240125-1442 systemd-udevd[1109]: worker [469927] exited Feb 01 12:48:09 gmananth-20240125-1442 kernel: BTRFS: device fsid 5e381a72-f0be-476a-9c64-9b4d633c6447 devid 1 transid 6 /dev/sde2 scanned by mkfs.btrfs (469923)

From the strace it confirms this rite(1, " ID SIZE PATH\n", 24) = 24 write(1, " 1 39.00GiB /dev/sde2\n", 29) = 29 write(1, "\n", 1) = 1 close(6) = 0 openat(AT_FDCWD, "/dev/btrfs-control", O_RDWR) = 4 close(4) = 0 flock(3, LOCK_UN) = 0 +++ exited with 0 +++

So the inotify is being skipped.

adam900710 commented 5 months ago

Firstly, mind to explain more on "not working in some cases"? Like is udev still getting incorrect detection or something else?

Secondly, the output shows the udev is properly skipping the event handling correctly:

Unable to flock(/dev/sde), skipping event handling: Resource temporarily unavailable

As we're holding the flock of the device during mkfs.

Or did you mean that, we should only close the writable fs, after we have unlocked the parent block devices so that udev can get a proper inofity and process with the new fs?

adam900710 commented 5 months ago

@codefiles @gtmoth Would you please test another branch? https://github.com/adam900710/btrfs-progs/tree/stray_fd

This branch goes another solution, by properly pinning down a stray fd close. I believe this is the root cause, and even without the flock() calls, it should solve the problem. (And it's way smaller, and won't cause any extra wait).

codefiles commented 5 months ago

Sure. Checked and I can't reproduce the bug.

gtmoth commented 5 months ago

I was able to reproduce the bug.

adam900710 commented 5 months ago

Any strace and journal for the reproduced bug?

As the new branch doesn't go flock() anymore, it just expands the lifespan to make sure all writeable fds only get closed when the fs is fully committed.

In that case, the inotify should properly went to udev and the scan can get the correct UUID.

gtmoth commented 5 months ago

I am backporting your patches onto 5.15.1-0 and testing. Not sure if the srace and journalctrl is useful.

cat strace-sd1-stray-fd-latest.log openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3 close(3) = 0 openat(AT_FDCWD, "/lib64/libuuid.so.1", O_RDONLY|O_CLOEXEC) = 3 close(3) = 0 openat(AT_FDCWD, "/lib64/libblkid.so.1", O_RDONLY|O_CLOEXEC) = 3 close(3) = 0 openat(AT_FDCWD, "/lib64/libudev.so.1", O_RDONLY|O_CLOEXEC) = 3 close(3) = 0 openat(AT_FDCWD, "/lib64/libpthread.so.0", O_RDONLY|O_CLOEXEC) = 3 close(3) = 0 openat(AT_FDCWD, "/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3 close(3) = 0 openat(AT_FDCWD, "/lib64/libmount.so.1", O_RDONLY|O_CLOEXEC) = 3 close(3) = 0 openat(AT_FDCWD, "/lib64/libgcc_s.so.1", O_RDONLY|O_CLOEXEC) = 3 close(3) = 0 openat(AT_FDCWD, "/lib64/libselinux.so.1", O_RDONLY|O_CLOEXEC) = 3 close(3) = 0 openat(AT_FDCWD, "/lib64/librt.so.1", O_RDONLY|O_CLOEXEC) = 3 close(3) = 0 openat(AT_FDCWD, "/lib64/libpcre2-8.so.0", O_RDONLY|O_CLOEXEC) = 3 close(3) = 0 openat(AT_FDCWD, "/lib64/libdl.so.2", O_RDONLY|O_CLOEXEC) = 3 close(3) = 0 openat(AT_FDCWD, "/proc/swaps", O_RDONLY) = 3 close(3) = 0 openat(AT_FDCWD, "/dev/sdb1", O_RDONLY) = 3 openat(AT_FDCWD, "/proc/self/mounts", O_RDONLY|O_CLOEXEC) = 4 close(4) = 0 close(3) = 0 openat(AT_FDCWD, "/dev/sdb1", O_RDWR|O_EXCL) = 3 close(3) = 0 openat(AT_FDCWD, "/dev/sdb1", O_RDONLY|O_CLOEXEC) = 3 openat(AT_FDCWD, "/sys/dev/block/8:17", O_RDONLY|O_CLOEXEC) = 4 openat(4, "dm/uuid", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) close(4) = 0 openat(AT_FDCWD, "/sys/dev/block/8:17", O_RDONLY|O_CLOEXEC) = 4 openat(AT_FDCWD, "/sys/block/sdb/dev", O_RDONLY|O_CLOEXEC) = 5 close(5) = 0 close(4) = 0 openat(AT_FDCWD, "/sys/dev/block/8:17", O_RDONLY|O_CLOEXEC) = 4 close(4) = 0 close(3) = 0 openat(AT_FDCWD, "/sys/block/sdb/queue/rotational", O_RDONLY) = 3 close(3) = 0 openat(AT_FDCWD, "/dev/sdb1", O_RDONLY|O_CLOEXEC) = 3 openat(AT_FDCWD, "/sys/dev/block/8:17", O_RDONLY|O_CLOEXEC) = 4 openat(4, "dm/uuid", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) close(4) = 0 openat(AT_FDCWD, "/sys/dev/block/8:17", O_RDONLY|O_CLOEXEC) = 4 openat(AT_FDCWD, "/sys/block/sdb/dev", O_RDONLY|O_CLOEXEC) = 5 close(5) = 0 close(4) = 0 openat(AT_FDCWD, "/sys/dev/block/8:17", O_RDONLY|O_CLOEXEC) = 4 close(4) = 0 close(3) = 0 openat(AT_FDCWD, "/sys/block/sdb/queue/zoned", O_RDONLY) = 3 close(3) = 0 openat(AT_FDCWD, "/dev/sdb1", O_RDONLY) = 3 close(3) = 0 openat(AT_FDCWD, "/dev/sdb1", O_RDWR) = 3 openat(AT_FDCWD, "/dev/sdb1", O_RDONLY|O_CLOEXEC) = 4 openat(AT_FDCWD, "/sys/dev/block/8:17", O_RDONLY|O_CLOEXEC) = 5 openat(5, "dm/uuid", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) close(5) = 0 openat(AT_FDCWD, "/sys/dev/block/8:17", O_RDONLY|O_CLOEXEC) = 5 openat(AT_FDCWD, "/sys/block/sdb/dev", O_RDONLY|O_CLOEXEC) = 6 close(6) = 0 close(5) = 0 openat(AT_FDCWD, "/sys/dev/block/8:17", O_RDONLY|O_CLOEXEC) = 5 close(5) = 0 close(4) = 0 openat(AT_FDCWD, "/sys/block/sdb/queue/discard_granularity", O_RDONLY) = 4 close(4) = 0 openat(AT_FDCWD, "/sys/dev/block/8:17", O_RDONLY|O_CLOEXEC) = 4 openat(4, "dm/uuid", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) close(4) = 0 openat(AT_FDCWD, "/sys/dev/block/8:17", O_RDONLY|O_CLOEXEC) = 4 openat(AT_FDCWD, "/sys/block/sdb/dev", O_RDONLY|O_CLOEXEC) = 5 close(5) = 0 close(4) = 0 openat(AT_FDCWD, "/dev/sdb1", O_RDONLY|O_CLOEXEC) = 4 openat(AT_FDCWD, "/sys/dev/block/8:17", O_RDONLY|O_CLOEXEC) = 5 openat(5, "dm/uuid", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) close(5) = 0 openat(AT_FDCWD, "/sys/dev/block/8:17", O_RDONLY|O_CLOEXEC) = 5 openat(AT_FDCWD, "/sys/block/sdb/dev", O_RDONLY|O_CLOEXEC) = 6 close(6) = 0 close(5) = 0 openat(AT_FDCWD, "/sys/dev/block/8:17", O_RDONLY|O_CLOEXEC) = 5 close(5) = 0 close(4) = 0 openat(AT_FDCWD, "/sys/block/sdb/queue/zoned", O_RDONLY) = 4 close(4) = 0 openat(AT_FDCWD, "/dev/sdb1", O_RDWR) = 4 openat(AT_FDCWD, "/dev/sdb1", O_RDONLY|O_CLOEXEC) = 5 openat(AT_FDCWD, "/sys/dev/block/8:17", O_RDONLY|O_CLOEXEC) = 6 openat(6, "dm/uuid", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) close(6) = 0 openat(AT_FDCWD, "/sys/dev/block/8:17", O_RDONLY|O_CLOEXEC) = 6 openat(AT_FDCWD, "/sys/block/sdb/dev", O_RDONLY|O_CLOEXEC) = 7 close(7) = 0 close(6) = 0 openat(AT_FDCWD, "/sys/dev/block/8:17", O_RDONLY|O_CLOEXEC) = 6 close(6) = 0 close(5) = 0 openat(AT_FDCWD, "/sys/block/sdb/queue/zoned", O_RDONLY) = 5 close(5) = 0 openat(AT_FDCWD, "/dev/sdb1", O_RDWR) = 5 openat(AT_FDCWD, "/dev/sdb1", O_RDONLY|O_CLOEXEC) = 6 openat(AT_FDCWD, "/sys/dev/block/8:17", O_RDONLY|O_CLOEXEC) = 7 openat(7, "dm/uuid", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) close(7) = 0 openat(AT_FDCWD, "/sys/dev/block/8:17", O_RDONLY|O_CLOEXEC) = 7 openat(AT_FDCWD, "/sys/block/sdb/dev", O_RDONLY|O_CLOEXEC) = 8 close(8) = 0 close(7) = 0 openat(AT_FDCWD, "/sys/dev/block/8:17", O_RDONLY|O_CLOEXEC) = 7 close(7) = 0 close(6) = 0 openat(AT_FDCWD, "/sys/block/sdb/queue/zoned", O_RDONLY) = 6 close(6) = 0 close(3) = 0 close(5) = 0 close(4) = 0 openat(AT_FDCWD, "/dev/btrfs-control", O_RDWR) = 3 close(3) = 0 +++ exited with 0 +++

Feb 02 08:25:44 gmananth-20240125-1442 systemd-udevd[1106]: inotify event: 8 for /dev/sdb1 Feb 02 08:25:44 gmananth-20240125-1442 systemd-udevd[1106]: device /dev/sdb1 closed, synthesising 'change' Feb 02 08:25:44 gmananth-20240125-1442 systemd-udevd[1106]: seq 3293 queued, 'change' 'block' Feb 02 08:25:44 gmananth-20240125-1442 systemd-udevd[1106]: Validate module index Feb 02 08:25:44 gmananth-20240125-1442 systemd-udevd[1106]: Check if link configuration needs reloading. Feb 02 08:25:44 gmananth-20240125-1442 systemd-udevd[1106]: seq 3293 forked new worker [36349] Feb 02 08:25:44 gmananth-20240125-1442 systemd-udevd[36349]: seq 3293 running Feb 02 08:25:44 gmananth-20240125-1442 systemd-udevd[36349]: removing watch on '/dev/sdb1' Feb 02 08:25:44 gmananth-20240125-1442 systemd-udevd[36349]: LINK 'disk/by-id/scsi-360c098be9b3d46079481cfe39aae948e-part1' /usr/lib/udev/rules.d/60-persistent-storage.rules:53 Feb 02 08:25:44 gmananth-20240125-1442 systemd-udevd[36349]: LINK 'disk/by-path/ip-169.254.2.10:3260-iscsi-iqn.2015-12.com.oracleiaas:c4784f03-7828-4744-990c-3cf97a415221-lun-4-part1' /usr/lib/udev/rules.d/60-persistent-storage.rules:76 Feb 02 08:25:44 gmananth-20240125-1442 systemd-udevd[36349]: IMPORT builtin 'blkid' /usr/lib/udev/rules.d/60-persistent-storage.rules:90 Feb 02 08:25:44 gmananth-20240125-1442 systemd-udevd[36349]: probe /dev/sdb1 raid offset=0 Feb 02 08:25:49 gmananth-20240125-1442 systemd-udevd[36349]: LINK 'disk/by-id/wwn-0x60c098be9b3d46079481cfe39aae948e-part1' /usr/lib/udev/rules.d/60-persistent-storage.rules:101 Feb 02 08:25:49 gmananth-20240125-1442 systemd-udevd[36349]: LINK 'disk/by-partuuid/23ea5dc6-01' /usr/lib/udev/rules.d/60-persistent-storage.rules:104 Feb 02 08:25:49 gmananth-20240125-1442 systemd-udevd[36349]: LINK 'disk/by-id/scsi-360c098be9b3d46079481cfe39aae948e-part1' /usr/lib/udev/rules.d/63-scsi-sg3_symlink.rules:17 Feb 02 08:25:49 gmananth-20240125-1442 systemd-udevd[36349]: IMPORT 'probe-bcache -o udev /dev/sdb1' /usr/lib/udev/rules.d/69-bcache.rules:16 Feb 02 08:25:49 gmananth-20240125-1442 [36350]: starting 'probe-bcache -o udev /dev/sdb1' Feb 02 08:25:49 gmananth-20240125-1442 systemd-udevd[36349]: Process 'probe-bcache -o udev /dev/sdb1' succeeded. Feb 02 08:25:49 gmananth-20240125-1442 systemd-udevd[36349]: PROGRAM '/usr/lib/udev/updateocideviceparameters sdb1' /usr/lib/udev/rules.d/99-systemoci-device.rules:4 Feb 02 08:25:49 gmananth-20240125-1442 [36351]: starting '/usr/lib/udev/updateocideviceparameters sdb1' Feb 02 08:25:49 gmananth-20240125-1442 systemd-udevd[36349]: Process '/usr/lib/udev/updateocideviceparameters sdb1' succeeded. Feb 02 08:25:49 gmananth-20240125-1442 systemd-udevd[36349]: PROGRAM '/usr/lib/udev/ociudevpersistentnaming sdb1' /usr/lib/udev/rules.d/99-systemoci-persistent-names.rules:10 Feb 02 08:25:49 gmananth-20240125-1442 [36355]: starting '/usr/lib/udev/ociudevpersistentnaming sdb1' Feb 02 08:25:49 gmananth-20240125-1442 systemd-udevd[36349]: '/usr/lib/udev/ociudevpersistentnaming sdb1'(out) 'oracleoci/oraclevdd1' Feb 02 08:25:49 gmananth-20240125-1442 systemd-udevd[36349]: Process '/usr/lib/udev/ociudevpersistentnaming sdb1' succeeded. Feb 02 08:25:49 gmananth-20240125-1442 systemd-udevd[36349]: LINK 'oracleoci/oraclevdd1' /usr/lib/udev/rules.d/99-systemoci-persistent-names.rules:10 Feb 02 08:25:49 gmananth-20240125-1442 systemd-udevd[36349]: handling device node '/dev/sdb1', devnum=b8:17, mode=0600, uid=0, gid=0 Feb 02 08:25:49 gmananth-20240125-1442 systemd-udevd[36349]: preserve already existing symlink '/dev/block/8:17' to '../sdb1' Feb 02 08:25:49 gmananth-20240125-1442 systemd-udevd[36349]: found 'b8:17' claiming '/run/udev/links/\x2fdisk\x2fby-id\x2fscsi-360c098be9b3d46079481cfe39aae948e-part1' Feb 02 08:25:49 gmananth-20240125-1442 systemd-udevd[36349]: creating link '/dev/disk/by-id/scsi-360c098be9b3d46079481cfe39aae948e-part1' to '/dev/sdb1' Feb 02 08:25:49 gmananth-20240125-1442 systemd-udevd[36349]: preserve already existing symlink '/dev/disk/by-id/scsi-360c098be9b3d46079481cfe39aae948e-part1' to '../../sdb1' Feb 02 08:25:49 gmananth-20240125-1442 systemd-udevd[36349]: found 'b8:17' claiming '/run/udev/links/\x2fdisk\x2fby-id\x2fwwn-0x60c098be9b3d46079481cfe39aae948e-part1' Feb 02 08:25:49 gmananth-20240125-1442 systemd-udevd[36349]: creating link '/dev/disk/by-id/wwn-0x60c098be9b3d46079481cfe39aae948e-part1' to '/dev/sdb1' Feb 02 08:25:49 gmananth-20240125-1442 systemd-udevd[36349]: preserve already existing symlink '/dev/disk/by-id/wwn-0x60c098be9b3d46079481cfe39aae948e-part1' to '../../sdb1' Feb 02 08:25:49 gmananth-20240125-1442 systemd-udevd[36349]: found 'b8:17' claiming '/run/udev/links/\x2fdisk\x2fby-partuuid\x2f23ea5dc6-01' Feb 02 08:25:49 gmananth-20240125-1442 systemd-udevd[36349]: creating link '/dev/disk/by-partuuid/23ea5dc6-01' to '/dev/sdb1' Feb 02 08:25:49 gmananth-20240125-1442 systemd-udevd[36349]: preserve already existing symlink '/dev/disk/by-partuuid/23ea5dc6-01' to '../../sdb1' Feb 02 08:25:49 gmananth-20240125-1442 systemd-udevd[36349]: found 'b8:17' claiming '/run/udev/links/\x2fdisk\x2fby-path\x2fip-169.254.2.10:3260-iscsi-iqn.2015-12.com.oracleiaas:c4784f03-7828-4744-990c-3cf97a415221-lun-4-part1' Feb 02 08:25:49 gmananth-20240125-1442 systemd-udevd[36349]: creating link '/dev/disk/by-path/ip-169.254.2.10:3260-iscsi-iqn.2015-12.com.oracleiaas:c4784f03-7828-4744-990c-3cf97a415221-lun-4-part1' to '/dev/sdb1' Feb 02 08:25:49 gmananth-20240125-1442 systemd-udevd[36349]: preserve already existing symlink '/dev/disk/by-path/ip-169.254.2.10:3260-iscsi-iqn.2015-12.com.oracleiaas:c4784f03-7828-4744-990c-3cf97a415221-lun-4-part1' to '../../sdb1' Feb 02 08:25:49 gmananth-20240125-1442 systemd-udevd[36349]: found 'b8:17' claiming '/run/udev/links/\x2foracleoci\x2foraclevdd1' Feb 02 08:25:49 gmananth-20240125-1442 systemd-udevd[36349]: creating link '/dev/oracleoci/oraclevdd1' to '/dev/sdb1' Feb 02 08:25:49 gmananth-20240125-1442 systemd-udevd[36349]: preserve already existing symlink '/dev/oracleoci/oraclevdd1' to '../sdb1' Feb 02 08:25:49 gmananth-20240125-1442 systemd-udevd[36349]: adding watch on '/dev/sdb1' Feb 02 08:25:49 gmananth-20240125-1442 systemd-udevd[36349]: seq 3293 processed Feb 02 08:25:49 gmananth-20240125-1442 systemd-udevd[1106]: cleanup idle workers Feb 02 08:25:49 gmananth-20240125-1442 systemd-udevd[36349]: Unload module index Feb 02 08:25:49 gmananth-20240125-1442 systemd-udevd[36349]: Unloaded link configuration context. Feb 02 08:25:49 gmananth-20240125-1442 systemd-udevd[1106]: worker [36349] exited Feb 02 08:25:55 gmananth-20240125-1442 systemd-udevd[1106]: inotify event: 8 for /dev/sdb1 Feb 02 08:25:55 gmananth-20240125-1442 systemd-udevd[1106]: device /dev/sdb1 closed, synthesising 'change' Feb 02 08:25:55 gmananth-20240125-1442 systemd-udevd[1106]: seq 3294 queued, 'change' 'block' Feb 02 08:25:55 gmananth-20240125-1442 systemd-udevd[1106]: Validate module index Feb 02 08:25:55 gmananth-20240125-1442 systemd-udevd[1106]: Check if link configuration needs reloading. Feb 02 08:25:55 gmananth-20240125-1442 systemd-udevd[1106]: seq 3294 forked new worker [36365] Feb 02 08:25:55 gmananth-20240125-1442 systemd-udevd[36365]: seq 3294 running Feb 02 08:25:55 gmananth-20240125-1442 systemd-udevd[36365]: removing watch on '/dev/sdb1' Feb 02 08:25:55 gmananth-20240125-1442 systemd-udevd[1106]: inotify event: 8000 for /dev/sdb1 Feb 02 08:25:55 gmananth-20240125-1442 systemd-udevd[1106]: removing watch on '/dev/sdb1' Feb 02 08:25:55 gmananth-20240125-1442 systemd-udevd[36365]: LINK 'disk/by-id/scsi-360c098be9b3d46079481cfe39aae948e-part1' /usr/lib/udev/rules.d/60-persistent-storage.rules:53 Feb 02 08:25:55 gmananth-20240125-1442 systemd-udevd[36365]: LINK 'disk/by-path/ip-169.254.2.10:3260-iscsi-iqn.2015-12.com.oracleiaas:c4784f03-7828-4744-990c-3cf97a415221-lun-4-part1' /usr/lib/udev/rules.d/60-persistent-storage.rules:76 Feb 02 08:25:55 gmananth-20240125-1442 systemd-udevd[36365]: IMPORT builtin 'blkid' /usr/lib/udev/rules.d/60-persistent-storage.rules:90 Feb 02 08:25:55 gmananth-20240125-1442 systemd-udevd[36365]: probe /dev/sdb1 raid offset=0 Feb 02 08:25:55 gmananth-20240125-1442 systemd-udevd[36365]: LINK 'disk/by-id/wwn-0x60c098be9b3d46079481cfe39aae948e-part1' /usr/lib/udev/rules.d/60-persistent-storage.rules:101 Feb 02 08:25:55 gmananth-20240125-1442 systemd-udevd[36365]: LINK 'disk/by-partuuid/23ea5dc6-01' /usr/lib/udev/rules.d/60-persistent-storage.rules:104 Feb 02 08:25:55 gmananth-20240125-1442 systemd-udevd[36365]: LINK 'disk/by-id/scsi-360c098be9b3d46079481cfe39aae948e-part1' /usr/lib/udev/rules.d/63-scsi-sg3_symlink.rules:17 Feb 02 08:25:55 gmananth-20240125-1442 systemd-udevd[36365]: IMPORT 'probe-bcache -o udev /dev/sdb1' /usr/lib/udev/rules.d/69-bcache.rules:16 Feb 02 08:25:55 gmananth-20240125-1442 [36366]: starting 'probe-bcache -o udev /dev/sdb1' Feb 02 08:25:55 gmananth-20240125-1442 systemd-udevd[36365]: Process 'probe-bcache -o udev /dev/sdb1' succeeded. Feb 02 08:25:55 gmananth-20240125-1442 systemd-udevd[36365]: PROGRAM '/usr/lib/udev/updateocideviceparameters sdb1' /usr/lib/udev/rules.d/99-systemoci-device.rules:4 Feb 02 08:25:55 gmananth-20240125-1442 [36367]: starting '/usr/lib/udev/updateocideviceparameters sdb1' Feb 02 08:25:55 gmananth-20240125-1442 systemd-udevd[36365]: Process '/usr/lib/udev/updateocideviceparameters sdb1' succeeded. Feb 02 08:25:55 gmananth-20240125-1442 systemd-udevd[36365]: PROGRAM '/usr/lib/udev/ociudevpersistentnaming sdb1' /usr/lib/udev/rules.d/99-systemoci-persistent-names.rules:10 Feb 02 08:25:55 gmananth-20240125-1442 [36371]: starting '/usr/lib/udev/ociudevpersistentnaming sdb1' Feb 02 08:25:55 gmananth-20240125-1442 kernel: BTRFS: device fsid 142949e8-4fbf-40c9-82af-9e19a15566b9 devid 1 transid 6 /dev/sdb1 scanned by mkfs.btrfs (36348) Feb 02 08:25:55 gmananth-20240125-1442 systemd-udevd[36365]: '/usr/lib/udev/ociudevpersistentnaming sdb1'(out) 'oracleoci/oraclevdd1' Feb 02 08:25:55 gmananth-20240125-1442 systemd-udevd[36365]: Process '/usr/lib/udev/ociudevpersistentnaming sdb1' succeeded. Feb 02 08:25:55 gmananth-20240125-1442 systemd-udevd[36365]: LINK 'oracleoci/oraclevdd1' /usr/lib/udev/rules.d/99-systemoci-persistent-names.rules:10 Feb 02 08:25:55 gmananth-20240125-1442 systemd-udevd[36365]: handling device node '/dev/sdb1', devnum=b8:17, mode=0600, uid=0, gid=0 Feb 02 08:25:55 gmananth-20240125-1442 systemd-udevd[36365]: preserve already existing symlink '/dev/block/8:17' to '../sdb1' Feb 02 08:25:55 gmananth-20240125-1442 systemd-udevd[36365]: found 'b8:17' claiming '/run/udev/links/\x2fdisk\x2fby-id\x2fscsi-360c098be9b3d46079481cfe39aae948e-part1' Feb 02 08:25:55 gmananth-20240125-1442 systemd-udevd[36365]: creating link '/dev/disk/by-id/scsi-360c098be9b3d46079481cfe39aae948e-part1' to '/dev/sdb1' Feb 02 08:25:55 gmananth-20240125-1442 systemd-udevd[36365]: preserve already existing symlink '/dev/disk/by-id/scsi-360c098be9b3d46079481cfe39aae948e-part1' to '../../sdb1' Feb 02 08:25:55 gmananth-20240125-1442 systemd-udevd[36365]: found 'b8:17' claiming '/run/udev/links/\x2fdisk\x2fby-id\x2fwwn-0x60c098be9b3d46079481cfe39aae948e-part1' Feb 02 08:25:55 gmananth-20240125-1442 systemd-udevd[36365]: creating link '/dev/disk/by-id/wwn-0x60c098be9b3d46079481cfe39aae948e-part1' to '/dev/sdb1' Feb 02 08:25:55 gmananth-20240125-1442 systemd-udevd[36365]: preserve already existing symlink '/dev/disk/by-id/wwn-0x60c098be9b3d46079481cfe39aae948e-part1' to '../../sdb1' Feb 02 08:25:55 gmananth-20240125-1442 systemd-udevd[36365]: found 'b8:17' claiming '/run/udev/links/\x2fdisk\x2fby-partuuid\x2f23ea5dc6-01' Feb 02 08:25:55 gmananth-20240125-1442 systemd-udevd[36365]: creating link '/dev/disk/by-partuuid/23ea5dc6-01' to '/dev/sdb1' Feb 02 08:25:55 gmananth-20240125-1442 systemd-udevd[36365]: preserve already existing symlink '/dev/disk/by-partuuid/23ea5dc6-01' to '../../sdb1' Feb 02 08:25:55 gmananth-20240125-1442 systemd-udevd[36365]: found 'b8:17' claiming '/run/udev/links/\x2fdisk\x2fby-path\x2fip-169.254.2.10:3260-iscsi-iqn.2015-12.com.oracleiaas:c4784f03-7828-4744-990c-3cf97a415221-lun-4-part1' Feb 02 08:25:55 gmananth-20240125-1442 systemd-udevd[36365]: creating link '/dev/disk/by-path/ip-169.254.2.10:3260-iscsi-iqn.2015-12.com.oracleiaas:c4784f03-7828-4744-990c-3cf97a415221-lun-4-part1' to '/dev/sdb1' Feb 02 08:25:55 gmananth-20240125-1442 systemd-udevd[36365]: preserve already existing symlink '/dev/disk/by-path/ip-169.254.2.10:3260-iscsi-iqn.2015-12.com.oracleiaas:c4784f03-7828-4744-990c-3cf97a415221-lun-4-part1' to '../../sdb1' Feb 02 08:25:55 gmananth-20240125-1442 systemd-udevd[36365]: found 'b8:17' claiming '/run/udev/links/\x2foracleoci\x2foraclevdd1' Feb 02 08:25:55 gmananth-20240125-1442 systemd-udevd[36365]: creating link '/dev/oracleoci/oraclevdd1' to '/dev/sdb1' Feb 02 08:25:55 gmananth-20240125-1442 systemd-udevd[36365]: preserve already existing symlink '/dev/oracleoci/oraclevdd1' to '../sdb1' Feb 02 08:25:55 gmananth-20240125-1442 systemd-udevd[36365]: adding watch on '/dev/sdb1' Feb 02 08:25:55 gmananth-20240125-1442 systemd-udevd[36365]: seq 3294 processed Feb 02 08:25:55 gmananth-20240125-1442 systemd-udevd[1106]: cleanup idle workers Feb 02 08:25:55 gmananth-20240125-1442 systemd-udevd[36365]: Unload module index Feb 02 08:25:55 gmananth-20240125-1442 systemd-udevd[36365]: Unloaded link configuration context. Feb 02 08:25:55 gmananth-20240125-1442 systemd-udevd[1106]: worker [36365] exited

From: Qu Wenruo @.> Sent: Friday, February 2, 2024 2:30 PM To: kdave/btrfs-progs @.> Cc: Gautham Ananthakrishna @.>; Mention @.> Subject: Re: [kdave/btrfs-progs] UUID not consistently available in output of lsblk after formatting (Issue #734)

Any strace and journal for the reproduced bug?

As the new branch doesn't go flock() anymore, it just expands the lifespan to make sure all writeable fds only get closed when the fs is fully committed.

In that case, the inotify should properly went to udev and the scan can get the correct UUID.

— Reply to this email directly, view it on GitHubhttps://urldefense.com/v3/__https:/github.com/kdave/btrfs-progs/issues/734*issuecomment-1923362293__;Iw!!ACWV5N9M2RV99hQ!KL25UpHw2jYdpAWLw2A8VyLn5c3n3b8mMhuRyIeLWMVbgaPXIAyXIqfC23U2f4YBBjMxJsgg9HCLOtJZUOENwZJoCIlULFixyQ$, or unsubscribehttps://urldefense.com/v3/__https:/github.com/notifications/unsubscribe-auth/BDUKY3EPM2OWMJ2SUCEFTWTYRSTHLAVCNFSM6AAAAABCERH4M2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSMRTGM3DEMRZGM__;!!ACWV5N9M2RV99hQ!KL25UpHw2jYdpAWLw2A8VyLn5c3n3b8mMhuRyIeLWMVbgaPXIAyXIqfC23U2f4YBBjMxJsgg9HCLOtJZUOENwZJoCInk4RPBHg$. You are receiving this because you were mentioned.Message ID: @.***>

adam900710 commented 5 months ago

@gtmoth Backporting to v5.15, no wonder it won't work.

The problem of v5.15 is, it lacks the commit b2a1be83b85f183f582f07fac0279d40fc62a4db, which is the main part to properly expand the fd's lifespan. (although it missed the one I'm fixing).

This can be proven by the strace output:

openat(AT_FDCWD, "/dev/sdb1", O_RDWR|O_EXCL) = 3 close(3) = 0

This shows a very early device writeable open and close, which is already triggering the udev scan pre-maturely.

Thus it's strongly recommended to test the branch as is.

For the backport effort, at least that commit b2a1be83b85f183f582f07fac0279d40fc62a4db would be also required, then the new fix in the stray_fd branches. Even with those fixes, I still doubt there may be some missing parts.

BTW, if you take a full strace (including all the pwrites), you can verify the behavior. The correct behavior would be, all the writeable device fds are closed after writes() are done.

gtmoth commented 4 months ago

@adm900710. I backported 3 required patches before this one onto 5.15.1. btrfs-progs: rename struct open_ctree_flags to open_ctree_args btrfs-progs: mkfs: keep file descriptors open during whole time btrfs-progs: mkfs: run device preparation in parallel With this, I was not able to reproduce. Thanks.

gtmoth commented 4 months ago

Key observations in my testing:

  1. The strace output showed two file descriptors with O_RDWR flag.
  2. Both these fd's were closed only after all pwrite64 and fsync calls.
  3. There was only one inotify 8 event in journalctrl, no inotify 8000 was seen.

I would like to know when this would be merged ?

adam900710 commented 4 months ago

Great you can verify the new fix is correct, and backport it to v5.15.

For the merging, I hope the next release would include that.

gtmoth commented 4 months ago

Just to clear doubts, I have used the latest commit 100fd6903 "btrfs-progs: fix the stray fd close in open_ctree_fs_info()" from https://github.com/adam900710/btrfs-progs/tree/stray_fd and backported it along with three other dependent patches.

btrfs-progs: rename struct open_ctree_flags to open_ctree_args btrfs-progs: mkfs: keep file descriptors open during whole time btrfs-progs: mkfs: run device preparation in parallel

I have backported all these 4 patches onto 5.15.1-0. The test results/observations I posted earlier are all from this effort only.

kdave commented 4 months ago

Thank you all, the fixes are now in devel and will be released in 6.7.1 soon-ish.