kdave / btrfs-progs

Development of userspace BTRFS tools
GNU General Public License v2.0

lazy unmount issues #876

Open axet opened 3 months ago

axet commented 3 months ago

Hello!

A lazy unmount happens automatically (triggered by udev or systemd) when a USB stick with btrfs on it is disconnected.

As a result, if any process is still holding anything (cwd, root, fd) on the lazily unmounted fs, the filesystem stays locked, and the only way out is to reboot the system.

With ext4 you can mount the filesystem a second time without problems while the previous lazy unmount is still pending. btrfs does not allow that.

A simple solution would be to look for open files using 'lsof', but after a lazy unmount all paths are cut off, which makes it impossible to find which process is holding the btrfs filesystem and the underlying (md-luks) device.

This is a very old issue and well known to users. You can google it.

It would be possible to find the processes holding a lazily unmounted fs using lsof, but you have to know the device major and minor numbers. All btrfs mounts use virtual devices with major == 0. While the filesystem is mounted you can run 'stat /media/root/LABEL' to see the device minor number, which can be used to identify the processes working with that fs. But once the device is disconnected there is no way to find out the device minor number.
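A minimal sketch of capturing that number while the filesystem is still mounted, so it can be saved for later. It relies only on `stat -c %d` (decimal st_dev) and decodes it with the bit layout glibc uses for device numbers; the function names `dev_to_major_minor` and `path_major_minor` are made up for illustration.

```shell
# Split a decimal st_dev value (as printed by `stat -c %d`) into "MAJOR,MINOR",
# the same notation lsof uses for devices. Bit layout as in glibc:
#   major: dev bits 8-19 (low 12 bits) and 44-63 (high 20 bits)
#   minor: dev bits 0-7  (low 8 bits)  and 20-43 (high 24 bits)
dev_to_major_minor() {
    dev=$1
    major=$(( ((dev >> 8) & 0xfff) | ((dev >> 32) & 0xfffff000) ))
    minor=$(( (dev & 0xff) | ((dev >> 12) & 0xffffff00) ))
    printf '%s,%s\n' "$major" "$minor"
}

# Print "MAJOR,MINOR" for the filesystem object at the given path.
path_major_minor() {
    dev_to_major_minor "$(stat -c %d -- "$1")"
}
```

For example, `path_major_minor /media/root/LABEL` could be run right after mounting and its output stored somewhere outside the USB stick.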

Right now there is no way to fix that; the only option is to reboot.

It could be fixed if a /sys/fs/btrfs/*/device_minor property showed the device minor number used to mount the filesystem.

In that case 'lsof | grep 0,MINOR' would show all processes holding the btrfs filesystem.
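One caveat with a plain grep is that a pattern like 0,45 also matches 0,451. Below is a small sketch of an exact match instead, assuming lsof prints the device as MAJOR,MINOR in a whitespace-separated column; the function name `lsof_by_dev` is hypothetical.

```shell
# Keep only the lines of lsof output (read from stdin) in which some
# whitespace-separated field is exactly one of the given MAJOR,MINOR values.
# Fields are compared as whole strings, so "0,45" does not match "0,451".
lsof_by_dev() {
    awk -v devs="$*" '
        BEGIN { n = split(devs, d, " "); for (i = 1; i <= n; i++) want[d[i]] = 1 }
        { for (i = 1; i <= NF; i++) if ($i in want) { print; next } }
    '
}
```

Usage would look like `lsof | lsof_by_dev 0,45 0,46`, passing more than one value when several minor numbers are of interest.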

Zygo commented 3 months ago

Each btrfs subvol has its own minor number. As a result, the proposed device_minor property would necessarily be a list of minor numbers for each filesystem, assuming that it's tracking only open files (the complete set of minor numbers for all subvols in the filesystem is not available after disconnection, but it's also not necessary for completing a lazy umount).

Other possible solutions for the original problem:

  1. disable the lazy umount for btrfs filesystems, so that lsof continues to work after devices are disconnected (that would be a systemd or udev rule change, depending on what is triggering the lazy umount)
  2. enable btrfs to forcibly disconnect all open FDs referencing the filesystem, leaving all existing FDs in EBADFD state where the only valid operation is close(). This decrements the reference count to zero, at which point the filesystem can be fully umounted (something like echo 1 > /sys/fs/btrfs/$uuid/force_umount to do it within btrfs, if there's no VFS mount flag for this case)
  3. enable btrfs to change UUID on a mounted filesystem to a new random value (this also supports several other ext4 use cases where distinct block devices with the same filesystem UUID need to be mountable simultaneously)
  4. enable btrfs to forcibly drop devices (this also supports several other mdadm use cases where btrfs needs to immediately stop using a device and release it)

IMHO options 3 and 4 are the easiest to implement within btrfs, and support the greatest number of immediately useful use cases. Option 1 requires no changes to btrfs at all. Option 2 is presented for completeness as it's the solution other operating systems use, but there would be challenges implementing it on a Linux kernel.

Forza-tng commented 3 months ago

Option 1 seems reasonable as it doesn't hide the issue. Lazy unmounts have caused headaches for me in the past. This leads me to option 2, which I think is the best solution. It is what users expect to be possible, as with USB devices or other portable media.

Option 3 seems dangerous. What happens if the open fd starts writing data? How would existing tools and workflow handle a suddenly new UUID?

Not sure about option 4. What device should be dropped if a process is holding the filesystem mounted?

Zygo commented 3 months ago

> Option 3 seems dangerous. What happens if the open fd starts writing data?

Option 4 is there to address some of the risks of option 3 (none of the options are mutually exclusive).

If the underlying device is disconnected, a FD will get IO errors whenever it touches the disconnected device, and btrfs will force read-only at the next transaction commit. It is up to each process to close all file descriptors on the filesystem, and the last one to do so completes the umount. If a process with an open FD never exits and never closes the FD, the filesystem is never umounted.
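To find which processes still hold such FDs, one possible approach is to compare device numbers through /proc rather than paths. This is only a sketch: it assumes the decimal st_dev value of the filesystem (as printed by `stat -c %d`) was noted while it was mounted, and that `stat -L` on the /proc/PID/cwd, root and fd/N links still reaches the open object after a lazy umount, which is not verified here. The function name `pids_on_dev` is hypothetical.

```shell
# Print the PIDs whose cwd, root, or any open file descriptor refers to an
# object with the given decimal st_dev value (as printed by `stat -c %d`).
# The optional second argument replaces /proc and exists mainly for testing.
pids_on_dev() {
    want=$1
    proc=${2:-/proc}
    for dir in "$proc"/[0-9]*; do
        [ -d "$dir" ] || continue
        for link in "$dir"/cwd "$dir"/root "$dir"/fd/*; do
            # -L follows the link to the object itself; failures (vanished
            # processes, missing permissions, unmatched globs) are skipped.
            dev=$(stat -L -c %d -- "$link" 2>/dev/null) || continue
            if [ "$dev" = "$want" ]; then
                printf '%s\n' "${dir##*/}"
                break
            fi
        done
    done
}
```

Running it as root is likely needed to inspect other users' processes, and since each subvolume has its own minor number, it may have to be called once per recorded value.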

If the underlying device is not disconnected, then the open FD on the lazy-umounted filesystem behaves like a FD on a normally-mounted filesystem. In that case, we want things to continue as they do now, with the btrfs UUID locked against allowing any other device to mount with the same fs UUID.

> How would existing tools and workflow handle a suddenly new UUID?

The same way they currently handle running mkfs.ext4 on a block device, presumably. If the device is disconnected, most existing tools are certainly not going to work since nearly any operation on the lazy-umounted filesystem will fail. They will have to reacquire new block devices and a new mount point when the filesystem's block devices are reconnected (also identical to the way this happens in ext4).

> Not sure about option 4. What device should be dropped if a process is holding the filesystem mounted?

Whichever device is specified through the sysfs interface, e.g. echo 1 > /sys/fs/btrfs/$uuid/devinfo/4/drop.

This could be used as a proactive measure to ensure a filesystem that was lazy-umounted and UUID-changed does not have any lingering connections to physical devices. In the normal USB disconnect case this isn't a problem, but in other cases, such as when a dm-crypt layer is inserted in between, or when the user mistakenly believes the filesystem is disconnected when it is not, the deletion events from the device layer might not reach the filesystem. So you'd do something like

echo 1 | tee /sys/fs/btrfs/$uuid/devinfo/*/drop
echo random | tee /sys/fs/btrfs/$uuid/reset_uuid

that would disconnect all the devices and change the uuid of the filesystem. This would allow the same filesystem to be mounted again when the devices are reconnected, while not changing anything else about lazy umounts (neither the kernel nor systemd/udevd).

Note that after applying options 3 and 4, the original filesystem remains mounted, but with no accessible mount points, and with no readable or writable devices. It's going to dump a ton of noise into the kernel log until the last FDs on the filesystem are closed. lsof and similar tools will not be able to resolve the paths to these files, but it will be possible to mount the filesystem if its block devices are reconnected, similar to what happens to ext4 in this situation.