nix-community / disko

Declarative disk partitioning and formatting using nix [maintainers=@Lassulus @Enzime @iFreilicht]

confusing disk label applied when using ZFS #621

Open nipsy opened 6 months ago

nipsy commented 6 months ago

I noticed this recently when using disko for the first time with a zpool mirror configuration. I haven't seen it cause a problem yet, but it seems like it could be confusing and possibly problematic down the road. After a brand new installation onto completely blkdiscard'ed disks, and before rebooting, I see the following:

[nixos@nixos:~]$ ls -alF /dev/disk/by-label
total 0
drwxr-xr-x  2 root root 180 May  4 18:37 ./
drwxr-xr-x 11 root root 220 May  4 18:36 ../
lrwxrwxrwx  1 root root  10 May  4 18:30 EFIBOOT -> ../../sda2
lrwxrwxrwx  1 root root  15 May  4 18:37 ESP1 -> ../../nvme1n1p1
lrwxrwxrwx  1 root root  15 May  4 18:37 ESP2 -> ../../nvme0n1p1
lrwxrwxrwx  1 root root  10 May  4 18:37 nixos-minimal-24.05-x86_64 -> ../../sda1
lrwxrwxrwx  1 root root  15 May  4 18:37 rpool -> ../../nvme0n1p3
lrwxrwxrwx  1 root root  15 May  4 18:37 swap1 -> ../../nvme1n1p2
lrwxrwxrwx  1 root root  15 May  4 18:37 swap2 -> ../../nvme0n1p2

All of those look right except for rpool. I'm guessing that as part of the work in #29 and #30, ZFS ends up with a label regardless of whether that makes any sense. Having said that, there seems to be some logic in place for ZFS, since rpool only appears once despite nvme1n1p3 being the other half of the mirror. Or maybe that's just the result of having two separate partitions, both with type = "zfs" and pool = "rpool", where whichever is declared first (or last) wins.

My disko configuration follows:

{
  disko.devices = {
    disk = {
      nvme0n1 = {
        type = "disk";
        device = "/dev/disk/by-id/nvme-WDC_WDS100T2B0C-00PXH0_203822801246";
        content = {
          type = "gpt";
          partitions = {
            ESP = {
              size = "1G";
              type = "EF00";
              content = {
                type = "filesystem";
                format = "vfat";
                mountpoint = "/efiboot/efi1";
                mountOptions = [ "defaults" ];
                extraArgs = [ "-n ESP1" ];
              };
            };
            swap = {
              size = "16G";
              type = "8200";
              content = {
                type = "swap";
                extraArgs = [ "-L swap1" ];
              };
            };
            zfs = {
              size = "100%";
              content = {
                type = "zfs";
                pool = "rpool";
              };
            };
          };
        };
      };
      nvme1n1 = {
        type = "disk";
        device = "/dev/disk/by-id/nvme-Corsair_MP600_MICRO_A828B35000EQBA";
        content = {
          type = "gpt";
          partitions = {
            ESP = {
              size = "1G";
              type = "EF00";
              content = {
                type = "filesystem";
                format = "vfat";
                mountpoint = "/efiboot/efi2";
                mountOptions = [ "defaults" ];
                extraArgs = [ "-n ESP2" ];
              };
            };
            swap = {
              size = "16G";
              type = "8200";
              content = {
                type = "swap";
                extraArgs = [ "-L swap2" ];
              };
            };
            zfs = {
              size = "100%";
              content = {
                type = "zfs";
                pool = "rpool";
              };
            };
          };
        };
      };
    };
    zpool = {
      rpool = {
        mode = "mirror";
        type = "zpool";
        rootFsOptions = {
          acltype = "posixacl";
          canmount = "off";
          compression = "on";
          dnodesize = "auto";
          relatime = "on";
          xattr = "sa";
        };
        options = {
          ashift = "12";
          autotrim = "on";
        };
        datasets = {
          "local" = {
            type = "zfs_fs";
            options.mountpoint = "none";
          };
          "local/root" = {
            type = "zfs_fs";
            options.mountpoint = "legacy";
            mountpoint = "/";
          };
          "local/nix" = {
            type = "zfs_fs";
            options = {
              atime = "off";
              mountpoint = "legacy";
            };
            mountpoint = "/nix";
          };
          "user" = {
            type = "zfs_fs";
            options.mountpoint = "none";
          };
          "user/home" = {
            type = "zfs_fs";
            options.mountpoint = "legacy";
            mountpoint = "/home";
          };
          "user/home/root" = {
            type = "zfs_fs";
            options.mountpoint = "legacy";
            mountpoint = "/root";
          };
          "user/home/nipsy" = {
            type = "zfs_fs";
            options.mountpoint = "legacy";
            mountpoint = "/home/nipsy";
          };
        };
      };
    };
  };
}
misuzu commented 6 months ago

Check ls -alF /dev/disk/by-partlabel

nipsy commented 6 months ago

Sure, something more sane:

# ls -alF /dev/disk/by-partlabel
total 0
drwxr-xr-x 2 root root 160 May  7 19:21 ./
drwxr-xr-x 9 root root 180 May  7 19:21 ../
lrwxrwxrwx 1 root root  15 May  7 19:21 disk-nvme0n1-ESP -> ../../nvme1n1p1
lrwxrwxrwx 1 root root  15 May  7 19:21 disk-nvme0n1-swap -> ../../nvme1n1p2
lrwxrwxrwx 1 root root  15 May  7 19:21 disk-nvme0n1-zfs -> ../../nvme1n1p3
lrwxrwxrwx 1 root root  15 May  7 19:21 disk-nvme1n1-ESP -> ../../nvme0n1p1
lrwxrwxrwx 1 root root  15 May  7 19:21 disk-nvme1n1-swap -> ../../nvme0n1p2
lrwxrwxrwx 1 root root  15 May  7 19:21 disk-nvme1n1-zfs -> ../../nvme0n1p3

Having said that, I'm not using anything other than the default of /dev/disk/by-id for boot.zfs.devNodes, so I have this:

# zpool status -v
  pool: rpool
 state: ONLINE
  scan: scrub repaired 0B in 00:00:08 with 0 errors on Tue May  7 16:34:56 2024
config:

        NAME                                                 STATE     READ WRITE CKSUM
        rpool                                                ONLINE       0     0     0
          mirror-0                                           ONLINE       0     0     0
            nvme-eui.e8238fa6bf530001001b448b49a9cba9-part3  ONLINE       0     0     0
            nvme-eui.6479a7888ac0032f-part3                  ONLINE       0     0     0

errors: No known data errors

So all of that is as expected I think.

But as stated in the original report, I'm really just worried about /dev/disk/by-label/rpool pointing at a single partition of a mirror, which is erroneous or at least misleading. Not a huge deal necessarily, but it struck me as odd: this doesn't seem right.

I actually like the disko-named by-partlabel paths; I might switch boot.zfs.devNodes to that directory instead, just for the nicer looking names. Having said that, it would be really nice if the by-label name could somehow be something like rpool-[[:digit:]]+ per defined vdev in the zpool. That would at least make it consistent with the partlabel scheme to some extent.
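
For what it's worth, that switch would be a one-line NixOS option. A minimal sketch, pointing at the disko-generated partlabel directory shown above (boot.zfs.devNodes defaults to /dev/disk/by-id):

{
  # Import the pool via the disko-generated partition labels instead of by-id
  boot.zfs.devNodes = "/dev/disk/by-partlabel";
}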

iFreilicht commented 1 month ago

I can confirm this issue. I have a similar setup:

      # ZFS storage pool, disk 1
      tank_1 = {
        device = "/dev/disk/by-id/ata-ST4000VN006-3CW104_WW61E4ZD";
        type = "disk";
        content = {
          type = "gpt";
          partitions = {
            tank_1 = {
              size = "100%";
              content = {
                type = "zfs";
                pool = "tank";
              };
            };
          };
        };
      };
      # ZFS storage pool, disk 2
      tank_2 = {
        device = "/dev/disk/by-id/ata-ST4000VN006-3CW104_WW61CNDR";
        type = "disk";
        content = {
          type = "gpt";
          partitions = {
            tank_2 = {
              size = "100%";
              content = {
                type = "zfs";
                pool = "tank";
              };
            };
          };
        };
      };

And I can see the same result in /dev/disk/by-label:

$ ll /dev/disk/by-label
total 0
lrwxrwxrwx 1 root root  10 Sep 28 22:03 BOOT -> ../../sdf1
...
lrwxrwxrwx 1 root root  10 Sep 28 22:03 tank -> ../../sda1

The issue becomes clearer when looking at all the filesystem labels:

$ lsblk --output NAME,LABEL
NAME                                                LABEL
sda                                                 
└─sda1                                              tank
sdb                                                 
└─sdb1                                              scratch
sdc                                                 
└─sdc1                                              tank
sdf                                                 
├─sdf1                                              BOOT
└─sdf2                                              nixos

Both partitions have the same label, so when udev creates the links, whichever is processed second overwrites the first. On my NixOS install, this rule is probably the culprit:

$ rg "disk/by-label" /etc/udev/rules.d 
/etc/udev/rules.d/13-dm-disk.rules
41:ENV{ID_FS_USAGE}=="filesystem|other|crypto", ENV{ID_FS_LABEL_ENC}=="?*", SYMLINK+="disk/by-label/$env{ID_FS_LABEL_ENC}"

However, we never specify this label; OpenZFS just adds it when we run zpool create. So instead of interfering with that, we could try to create our own udev rule for every zpool that removes the symlink from /dev/disk/by-label again.

Something like this, perhaps:

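# "LABEL=" on its own only names a GOTO target; the actual work happens in the next line,
# which removes the by-label symlink again after udev has (re)created it for this device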
LABEL="remove_tank_symlink"
ENV{ID_FS_LABEL}=="tank", RUN+="/usr/bin/env rm -f /dev/disk/by-label/tank"

This does seem like a pretty hacky solution, but I don't think there's another way.
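
In NixOS terms this could be wired up through services.udev.extraRules, generating one such rule per pool. A rough sketch, with the pool name list as an assumption (it would have to mirror whatever is defined under disko.devices.zpool), and rm pinned to the coreutils store path so udev doesn't depend on PATH:

{ pkgs, lib, ... }:
{
  # Sketch: for each zpool named below, remove the misleading
  # /dev/disk/by-label/<pool> symlink again after udev (re)creates it.
  services.udev.extraRules = lib.concatMapStrings (pool: ''
    ENV{ID_FS_TYPE}=="zfs_member", ENV{ID_FS_LABEL}=="${pool}", RUN+="${pkgs.coreutils}/bin/rm -f /dev/disk/by-label/${pool}"
  '') [ "tank" ];
}

This just templates the rm approach from the rule above per pool, so it shares the same hackiness.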