storaged-project / udisks

The UDisks project provides a daemon, tools and libraries to access and manipulate disks, storage devices and technologies.
https://storaged.org/doc/udisks2-api/latest/

how to handle Btrfs multiple devices on the desktop #802

Open cmurf opened 4 years ago

cmurf commented 4 years ago

Background on this issue: https://gitlab.gnome.org/GNOME/gvfs/-/issues/519 https://bugs.kde.org/show_bug.cgi?id=427092

Nautilus and Dolphin show a disk icon for each Btrfs member device, which confuses both users and udisks. Desktop environment consumers may not need physical device information at all, and may be better off not being aware of it. When the user clicks on the various devices multiple times, multiple mount points are created, which is both unintended and confusing.

udisksdump.txt

Instead, they need a way to handle subvolumes, perhaps as virtual device 'children'. (This may not be entirely different from LVM thin pool or Stratis pool as the parent, and its filesystems as children - if this metaphor holds - except in the level of detail.)

Filing this bug to facilitate awareness of the competing issues.

Related

768

88

libblockdev#244

vojtechtrefny commented 4 years ago

> Instead, they need a way to handle subvolumes

We have a separate btrfs plugin with "advanced" btrfs functionality: http://storaged.org/doc/udisks2-api/latest/gdbus-org.freedesktop.UDisks2.Manager.BTRFS.html http://storaged.org/doc/udisks2-api/latest/gdbus-org.freedesktop.UDisks2.Filesystem.BTRFS.html

cmurf commented 4 years ago

> We have a separate btrfs plugin with "advanced" btrfs functionality:

OK cool!

tbzatek commented 4 years ago

@cmurf, can you please attach the output of udevadm info --export-db too? I'm wondering whether there are any udev properties specific to btrfs multidisk volumes.

tbzatek commented 4 years ago

> (This may not be entirely different from LVM thin pool or Stratis pool as the parent, and its filesystems as children - if this metaphor holds - except in the level of detail.)

For the record, the root cause of these issues is that such btrfs multidisk volume members are detected as IdUsage: filesystem and are thus displayed in the GUI and offered for mounting. This is btrfs-specific and creates confusion not only for the upper local storage management layers, but possibly also for sysadmins who work with CLI tools and are not fully aware of these specifics.

vojtechtrefny commented 4 years ago

UDev info for "multidisk" and "singledisk" volumes is the same. AFAICT the only way we can tell that two btrfs filesystems are part of the same volume is that they have the same UUID.

$ udevadm info /dev/sde1                
P: /devices/pci0000:00/0000:00:07.0/host9/target9:0:1/9:0:1:0/block/sde/sde1
N: sde1
L: 0
S: disk/by-path/pci-0000:00:07.0-scsi-0:0:1:0-part1
S: disk/by-uuid/d986fd44-ec55-4744-b0c0-4306dcc97cb0
S: disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi1-0-1-part1
S: disk/by-partuuid/0ef75796-01
E: DEVPATH=/devices/pci0000:00/0000:00:07.0/host9/target9:0:1/9:0:1:0/block/sde/sde1
E: DEVNAME=/dev/sde1
E: DEVTYPE=partition
E: PARTN=1
E: MAJOR=8
E: MINOR=65
E: SUBSYSTEM=block
E: USEC_INITIALIZED=9525266
E: ID_SCSI=1
E: ID_VENDOR=QEMU
E: ID_VENDOR_ENC=QEMU\x20\x20\x20\x20
E: ID_MODEL=QEMU_HARDDISK
E: ID_MODEL_ENC=QEMU\x20HARDDISK\x20\x20\x20
E: ID_REVISION=2.5+
E: ID_TYPE=disk
E: ID_SERIAL=0QEMU_QEMU_HARDDISK_drive-scsi1-0-1
E: ID_SERIAL_SHORT=drive-scsi1-0-1
E: ID_BUS=scsi
E: ID_PATH=pci-0000:00:07.0-scsi-0:0:1:0
E: ID_PATH_TAG=pci-0000_00_07_0-scsi-0_0_1_0
E: ID_PART_TABLE_UUID=0ef75796
E: ID_PART_TABLE_TYPE=dos
E: ID_FS_UUID=d986fd44-ec55-4744-b0c0-4306dcc97cb0
E: ID_FS_UUID_ENC=d986fd44-ec55-4744-b0c0-4306dcc97cb0
E: ID_FS_UUID_SUB=9d86ffc7-a2d4-4b6a-8763-545efb08b295
E: ID_FS_UUID_SUB_ENC=9d86ffc7-a2d4-4b6a-8763-545efb08b295
E: ID_FS_TYPE=btrfs
E: ID_FS_USAGE=filesystem
E: ID_PART_ENTRY_SCHEME=dos
E: ID_PART_ENTRY_UUID=0ef75796-01
E: ID_PART_ENTRY_TYPE=0x83
E: ID_PART_ENTRY_NUMBER=1
E: ID_PART_ENTRY_OFFSET=2048
E: ID_PART_ENTRY_SIZE=2095104
E: ID_PART_ENTRY_DISK=8:64
E: DM_MULTIPATH_DEVICE_PATH=0
E: ID_BTRFS_READY=1
E: DEVLINKS=/dev/disk/by-path/pci-0000:00:07.0-scsi-0:0:1:0-part1 /dev/disk/by-uuid/d986fd44-ec55-4744-b0c0-4306dcc97cb0 /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi1-0-1-part1 /dev/disk/by-partuuid/0ef75796-01
E: TAGS=:systemd:
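
Since the udev properties look the same for single- and multi-device volumes, the only grouping key available is ID_FS_UUID, with ID_FS_UUID_SUB telling the members apart. A minimal Python sketch of that grouping follows. The helper names are hypothetical and not part of UDisks, and the second record (/dev/sdf1 and its UUID_SUB) is made up for illustration.

```python
from collections import defaultdict


def parse_udev_properties(text):
    """Collect the 'E: KEY=VALUE' lines of one udevadm record into a dict."""
    props = {}
    for line in text.splitlines():
        if line.startswith("E: ") and "=" in line:
            key, value = line[3:].split("=", 1)
            props[key] = value
    return props


def group_btrfs_members(records):
    """Map each btrfs ID_FS_UUID to the sorted device nodes that carry it."""
    volumes = defaultdict(list)
    for props in records:
        if props.get("ID_FS_TYPE") == "btrfs" and "ID_FS_UUID" in props:
            volumes[props["ID_FS_UUID"]].append(props.get("DEVNAME", "?"))
    return {uuid: sorted(devs) for uuid, devs in volumes.items()}


# Subset of the record shown above.
SDE1 = """\
E: DEVNAME=/dev/sde1
E: ID_FS_UUID=d986fd44-ec55-4744-b0c0-4306dcc97cb0
E: ID_FS_UUID_SUB=9d86ffc7-a2d4-4b6a-8763-545efb08b295
E: ID_FS_TYPE=btrfs
E: ID_FS_USAGE=filesystem
"""

# Hypothetical second member: same ID_FS_UUID, different ID_FS_UUID_SUB.
SDF1 = """\
E: DEVNAME=/dev/sdf1
E: ID_FS_UUID=d986fd44-ec55-4744-b0c0-4306dcc97cb0
E: ID_FS_UUID_SUB=00000000-0000-0000-0000-000000000001
E: ID_FS_TYPE=btrfs
E: ID_FS_USAGE=filesystem
"""

records = [parse_udev_properties(SDE1), parse_udev_properties(SDF1)]
volumes = group_btrfs_members(records)
```

A UUID that maps to more than one device node is then a candidate multi-device volume; nothing else in the record distinguishes it from a single-device filesystem.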

vojtechtrefny commented 4 years ago

We can add some additional functions and/or properties to the btrfs plugin, but I don't see how we could add something helpful to the "core" UDisks API.

vojtechtrefny commented 4 years ago

It would be really helpful to have more information in the UDev database. btrfs-progs already ships a very simple UDev rule, so adding a btrfs filesystem show call to it and setting some btrfs-specific properties could be an option? @cmurf

tbzatek commented 4 years ago

Opened https://github.com/kdave/btrfs-progs/issues/302 requesting that at least some information be published in the udev db. I believe this kind of information should be provided at the right place first, since local storage management is a layered model. Only then can an upper layer like UDisks make use of it, to the benefit of all the layers built on top of it.

cmurf commented 4 years ago

udevadminfo.txt

cmurf commented 4 years ago

I mentioned this in gvfs#519 but forgot to mention it here: it seems that udisksd is being asked to mount by /dev node rather than by fs UUID. At least from the man page, I don't see a way to reference the fs UUID with udisksctl. The mount command can do it by label or UUID for any file system. I wonder if the most generic approach for mounting is to just always use the label or UUID, no matter the file system.

Most interactions with btrfs file systems are mounting and post-mount operations. The only thing that really needs to understand the details of all the devices is udisksd itself, on behalf of a handful of sophisticated programs like partitioning agents. Maybe it'd be better if the majority of user agents were kept oblivious of the details most of the time, and just interacted with the UUID/label and the mount point?

tbzatek commented 4 years ago

It's more complicated than that. The kernel and udev operate on major:minor block device nodes, and the /dev/disk/ symlinks are just different representations of the same object. Similarly, any reference to a filesystem via LABEL= or UUID= resolves to a device node. The new kernel mount API could possibly take a slightly different approach; however, this would need to be reflected in the libmount public API.

That said, having duplicate filesystem identifiers present on different block devices is just wrong. Even for multipath a single device is created (btrfs over multipath, anyone?). The "universally unique identifier (UUID)" is immediately no longer unique, causing udev to randomly overwrite the symlinks in /dev/disk/ that some libraries and tools use. When matching against the udev db, either the first or a random occurrence gets used, certainly not in a persistent order. That is why exposing more insight into the filesystem structure in the udev db is crucial to solve first.
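
To illustrate the symlink point: a by-uuid link is a single name, so it can only ever resolve to one device node, no matter how many devices carry that UUID. The sketch below simulates this in a temporary directory rather than touching /dev. The helper names are hypothetical, /dev/sdf1 is a made-up second member, and the "last writer wins" behaviour is a simplification, not a model of how udev actually picks the winner.

```python
import os
import tempfile


def resolve_by_uuid(uuid, by_uuid_dir="/dev/disk/by-uuid"):
    """Resolve a filesystem UUID to the device node its symlink points at."""
    return os.path.realpath(os.path.join(by_uuid_dir, uuid))


def publish_symlink(by_uuid_dir, uuid, device):
    """Create or replace the by-uuid link (simplified stand-in for udev)."""
    link = os.path.join(by_uuid_dir, uuid)
    if os.path.islink(link):
        os.unlink(link)
    os.symlink(device, link)


with tempfile.TemporaryDirectory() as by_uuid_dir:
    uuid = "d986fd44-ec55-4744-b0c0-4306dcc97cb0"
    publish_symlink(by_uuid_dir, uuid, "/dev/sde1")
    first = resolve_by_uuid(uuid, by_uuid_dir)
    # A second member with the same UUID replaces the link.
    publish_symlink(by_uuid_dir, uuid, "/dev/sdf1")
    second = resolve_by_uuid(uuid, by_uuid_dir)
```

Any tool that resolves UUID= through this link sees whichever member was linked most recently, which is why the result is not persistent.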

As a first step, UDisks will need to be made aware of duplicate filesystem identifiers and handle them gracefully, e.g. to prevent multiple mounts, mount point cleanup conflicts, etc. Perhaps by just taking the first occurrence from a sorted list, which is reasonably stable within the daemon's lifespan, as described in https://gitlab.gnome.org/GNOME/gvfs/-/issues/519#note_921832. That will not fix the multiple object representation for the moment.
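
The "first occurrence from a sorted list" idea could look roughly like the following Python sketch. This is not the UDisks implementation; the function names, the device names, and the mount point are all assumptions made for illustration.

```python
def canonical_device(members):
    """Pick one representative device: the first of the sorted member list."""
    if not members:
        raise ValueError("no member devices")
    return sorted(members)[0]


def existing_mount_point(members, mounted):
    """Return the mount point of any already-mounted member, or None.

    `mounted` maps device node -> mount point, e.g. built from /proc/self/mounts.
    """
    for dev in sorted(members):
        if dev in mounted:
            return mounted[dev]
    return None


def mount_request(clicked, members, mounted):
    """Decide what to do when the user activates one member device."""
    if clicked not in members:
        raise ValueError(f"{clicked} is not a member of this volume")
    target = existing_mount_point(members, mounted)
    if target is not None:
        return ("reuse", target)  # the volume is mounted; do not mount again
    return ("mount", canonical_device(members))


members = ["/dev/sdf1", "/dev/sde1", "/dev/sdg1"]
fresh = mount_request("/dev/sdg1", members, {})
again = mount_request("/dev/sdg1", members, {"/dev/sde1": "/run/media/user/data"})
```

Whichever icon the user clicks, the request is redirected to the same representative device, and a volume that is already mounted through any member is reused instead of being mounted a second time.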

cmurf commented 4 years ago

> The new kernel mount API could possibly take slightly different approach, however this needs to be reflected in libmount public API.

I was thinking of the clients, e.g. gvfs, file managers, open/save dialogs, udisksctl. Even GNOME Disks doesn't need to interact with literal block devices most of the time, such as when mounting the file system.

> That said having duplicate filesystem identifiers present on different block devices is just wrong.

Why? It's the same for mdadm multiple devices:

/dev/vda3: UUID="05c30b48-4f9f-e3da-9489-5a6703287405" UUID_SUB="18ebc747-9949-489c-f896-a47a9cdced7c" LABEL="localhost-live:root" TYPE="linux_raid_member" PARTUUID="5ce570aa-cb25-4ee6-9f5c-3fc22d54b7af"
/dev/vdb1: UUID="05c30b48-4f9f-e3da-9489-5a6703287405" UUID_SUB="a372c360-1157-e88c-a1ca-3c0be19f4ddf" LABEL="localhost-live:root" TYPE="linux_raid_member" PARTUUID="cee7279a-7c63-4c6c-8c32-1194bd16e926"

RFC 4122 doesn't require that a UUID exist only once, only that it be unique at the time of creation. A collision only occurs if the same UUID is used for different referents; in both the mdadm and Btrfs cases, there's one referent. The same UUID with a different UUID_SUB seems to clearly indicate each individual constituent part of a whole.
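
The "one whole, distinct parts" reading can be checked mechanically against the two blkid lines above. A minimal Python sketch, with hypothetical helper names:

```python
import shlex


def parse_blkid_line(line):
    """Split '/dev/x: KEY="value" ...' into (device, {KEY: value})."""
    device, _, rest = line.partition(": ")
    fields = dict(token.split("=", 1) for token in shlex.split(rest))
    return device, fields


def same_whole_distinct_parts(entries):
    """True if all entries share one UUID and every UUID_SUB is different."""
    uuids = {fields["UUID"] for _, fields in entries}
    subs = [fields["UUID_SUB"] for _, fields in entries]
    return len(uuids) == 1 and len(set(subs)) == len(subs)


BLKID_OUTPUT = '''\
/dev/vda3: UUID="05c30b48-4f9f-e3da-9489-5a6703287405" UUID_SUB="18ebc747-9949-489c-f896-a47a9cdced7c" LABEL="localhost-live:root" TYPE="linux_raid_member" PARTUUID="5ce570aa-cb25-4ee6-9f5c-3fc22d54b7af"
/dev/vdb1: UUID="05c30b48-4f9f-e3da-9489-5a6703287405" UUID_SUB="a372c360-1157-e88c-a1ca-3c0be19f4ddf" LABEL="localhost-live:root" TYPE="linux_raid_member" PARTUUID="cee7279a-7c63-4c6c-8c32-1194bd16e926"
'''

entries = [parse_blkid_line(line) for line in BLKID_OUTPUT.splitlines()]
is_one_referent = same_whole_distinct_parts(entries)
```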

In the mdadm case, udev seems to export udisks specific info.

E: UDISKS_MD_MEMBER_LEVEL=raid0
E: UDISKS_MD_MEMBER_DEVICES=2

Btrfs does store the number of devices in each device's superblock. That's easy for udev to get and expose to udisks, if that's what's needed. Member devices aren't raid per se; that isn't how it works on Btrfs. Instead, the 'raid level' is referred to as a 'profile', the profile applies per block group, and profiles can differ between block groups. This information isn't part of the superblock, but is stored in a btree.
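
As a rough illustration of how cheap the device count is to read, here is a Python sketch that pulls num_devices out of the primary superblock. The offsets (superblock at 64 KiB, magic at 0x40, num_devices at 0x88, little-endian) reflect my reading of the Btrfs on-disk format and should be verified against the btrfs headers before relying on them. The sketch does not validate the checksum, and the demo runs on a synthetic image, not a real filesystem.

```python
import os
import struct
import tempfile

# Assumed layout; verify against the btrfs on-disk format documentation.
BTRFS_SUPER_OFFSET = 0x10000  # primary superblock at 64 KiB
BTRFS_MAGIC = b"_BHRfS_M"
MAGIC_OFFSET = 0x40
NUM_DEVICES_OFFSET = 0x88
HEADER_LEN = 0x90


def read_num_devices(path):
    """Return the num_devices field of the primary btrfs superblock."""
    with open(path, "rb") as f:
        f.seek(BTRFS_SUPER_OFFSET)
        sb = f.read(HEADER_LEN)
    if len(sb) < HEADER_LEN or sb[MAGIC_OFFSET:MAGIC_OFFSET + 8] != BTRFS_MAGIC:
        raise ValueError("no btrfs superblock found")
    (num_devices,) = struct.unpack_from("<Q", sb, NUM_DEVICES_OFFSET)
    return num_devices


def make_fake_image(num_devices):
    """Build a synthetic image holding only the two fields read above."""
    sb = bytearray(HEADER_LEN)
    sb[MAGIC_OFFSET:MAGIC_OFFSET + 8] = BTRFS_MAGIC
    struct.pack_into("<Q", sb, NUM_DEVICES_OFFSET, num_devices)
    return bytes(BTRFS_SUPER_OFFSET) + bytes(sb)


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(make_fake_image(3))
try:
    detected = read_num_devices(tmp.name)
finally:
    os.unlink(tmp.name)
```

The per-block-group profile is not reachable this way, since it lives in a btree rather than in the superblock.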

> As a first step on UDisks side it will need to be made aware of duplicate filesystem identifiers and handle them gracefully to e.g. prevent multiple mounts, mount point cleanup conflicts, etc.

Allowing multiple mounts of the file system is needed to support explicitly mounting subvolumes. Such a layout has been used by Fedora for ~10 years, and is used by default starting with Fedora 33, where subvol=home is mounted at /home, and subvol=root is mounted at /. It's effectively a bind mount, except that it's possible to do path resolution without it first being visible.

The thing to probably avoid is mounting the same subvolume multiple times, but this is something of an artifact or side effect of multiple /dev nodes being exposed in the GUI rather than one filesystem volume icon. Each icon is currently a /dev node, and we get a mount every time the user clicks on one of the seemingly unmounted ones, even though the filesystem is already mounted. A related problem happens in GNOME Disks, where it shows 1 of 3 Btrfs devices as mounted and the other two as not mounted, even though they are all part of the same filesystem, which is mounted.
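
One way to express "several mounts of the filesystem are fine, repeated mounts of the same subvolume are not" is to key mounts on (filesystem UUID, subvolume) instead of on the /dev node. The class below is a hypothetical Python sketch of that bookkeeping only. It performs no real mounts, and the UUID and mount points are example values.

```python
class MountRegistry:
    """Track mounts by (filesystem UUID, subvolume) rather than by /dev node."""

    def __init__(self):
        self._mounts = {}

    def mount(self, fs_uuid, subvol, mount_point):
        """Return (mount_point, created); reuse an existing mount of the same subvolume."""
        key = (fs_uuid, subvol)
        if key in self._mounts:
            return self._mounts[key], False
        self._mounts[key] = mount_point
        return mount_point, True

    def is_mounted(self, fs_uuid):
        """True if any subvolume of this filesystem is mounted."""
        return any(uuid == fs_uuid for uuid, _ in self._mounts)


FS_UUID = "d986fd44-ec55-4744-b0c0-4306dcc97cb0"
registry = MountRegistry()
root = registry.mount(FS_UUID, "root", "/")
home = registry.mount(FS_UUID, "home", "/home")
# Clicking another member device must not create a second mount of subvol=home.
home_again = registry.mount(FS_UUID, "home", "/run/media/user/home")
mounted = registry.is_mounted(FS_UUID)
```

With this keying, the mounted state belongs to the filesystem as a whole, so all member devices would report the same status instead of 1 of 3 appearing mounted.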

tbzatek commented 3 years ago

> Why? It's the same for mdadm multiple devices:
>
> /dev/vda3: UUID="05c30b48-4f9f-e3da-9489-5a6703287405" UUID_SUB="18ebc747-9949-489c-f896-a47a9cdced7c" LABEL="localhost-live:root" TYPE="linux_raid_member" PARTUUID="5ce570aa-cb25-4ee6-9f5c-3fc22d54b7af"
> /dev/vdb1: UUID="05c30b48-4f9f-e3da-9489-5a6703287405" UUID_SUB="a372c360-1157-e88c-a1ca-3c0be19f4ddf" LABEL="localhost-live:root" TYPE="linux_raid_member" PARTUUID="cee7279a-7c63-4c6c-8c32-1194bd16e926"
>
> RFC 4122 doesn't require a UUID exist only once, but that at the time of creation it must be unique. A collision only occurs if the same UUID is used for different referents, in both mdadm and Btrfs cases, there's one referent. The same UUID with different UUID_SUB seems to clearly indicate each unique individual constituent part of a whole.

Yes, however the mdraid components carry the ID_FS_USAGE=raid udev attribute (even for legacy mdraid superblock versions) in contrast to btrfs multidisk volumes that carry ID_FS_USAGE=filesystem. It's the combination of the filesystem usability flag and duplicate UUID that causes the problem.
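
The difference can be shown with a small Python sketch: only devices flagged ID_FS_USAGE=filesystem are offered for mounting, so the mdraid members drop out, while both btrfs members stay in and share a UUID. The function names are hypothetical, the records are trimmed to the relevant properties, and /dev/sdf1 is a made-up second btrfs member.

```python
from collections import Counter


def offered_for_mounting(records):
    """Devices a GUI would offer: those flagged as usable filesystems."""
    return [r for r in records if r.get("ID_FS_USAGE") == "filesystem"]


def duplicate_uuid_devices(records):
    """Offered devices whose ID_FS_UUID is shared with another offered device."""
    offered = offered_for_mounting(records)
    counts = Counter(r.get("ID_FS_UUID") for r in offered)
    return sorted(r["DEVNAME"] for r in offered if counts[r.get("ID_FS_UUID")] > 1)


MD_UUID = "05c30b48-4f9f-e3da-9489-5a6703287405"
BTRFS_UUID = "d986fd44-ec55-4744-b0c0-4306dcc97cb0"

records = [
    {"DEVNAME": "/dev/vda3", "ID_FS_UUID": MD_UUID, "ID_FS_USAGE": "raid"},
    {"DEVNAME": "/dev/vdb1", "ID_FS_UUID": MD_UUID, "ID_FS_USAGE": "raid"},
    {"DEVNAME": "/dev/sde1", "ID_FS_UUID": BTRFS_UUID, "ID_FS_USAGE": "filesystem"},
    {"DEVNAME": "/dev/sdf1", "ID_FS_UUID": BTRFS_UUID, "ID_FS_USAGE": "filesystem"},
]

problematic = duplicate_uuid_devices(records)
```

Both pairs have duplicate UUIDs, but only the btrfs pair ends up in the result, which matches the point that it is the combination of the two properties that matters.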

> In the mdadm case, udev seems to export udisks specific info.
>
> E: UDISKS_MD_MEMBER_LEVEL=raid0
> E: UDISKS_MD_MEMBER_DEVICES=2

These are our own rules that we ship. The right place for them would be the respective upstream projects, and that's what kdave/btrfs-progs#302 should be about for btrfs (I still need to follow up on that).

tbzatek commented 3 years ago

Anyway, basic support for multiple devices, to avoid creating duplicate mounts, is in PR #838.

Let's deal with btrfs subvolumes in #768.

cmurf commented 3 years ago

> It's the combination of the filesystem usability flag and duplicate UUID that causes the problem.

Would it help to have ID_FS_USAGE=btrfs? Or does that just make things more complicated? Never mind, this was answered in btrfs-progs#302.