nix-community / disko

Declarative disk partitioning and formatting using nix [maintainers=@Lassulus @Enzime]
MIT License

random failures during script "cannot open /dev/disk/by-partlabel/disk-sda-boot" #739

Open ghostbuster91 opened 1 month ago

ghostbuster91 commented 1 month ago

The following disko script:

{ disks ? [ "/dev/sda" ], ... }: {
  disko.devices = {
    disk = {
      sda = {
        type = "disk";
        device = builtins.elemAt disks 0;
        content = {
          type = "gpt";
          partitions = {
            grub = {
              size = "1M";
              type = "EF02";
              priority = 1;
            };
            boot = {
              size = "512M";
              content = {
                type = "filesystem";
                format = "vfat";
                mountpoint = "/boot";
              };
              priority = 2;
              hybrid.mbrBootableFlag = true;
            };
            root = {
              size = "128G";
              content = {
                type = "zfs";
                pool = "rpool1";
              };
              priority = 4;
            };
          };
        };
      };
    };
    zpool = {
      rpool1 =
        let
          unmountable = { type = "zfs_fs"; };
          filesystem = mountpoint: {
            type = "zfs_fs";
            options = {
              canmount = "noauto";
              inherit mountpoint;
            };
            inherit mountpoint;
          };
        in
        {
          type = "zpool";

          rootFsOptions = {
            compression = "lz4";
            "com.sun:auto-snapshot" = "false";
            canmount = "off";
            xattr = "sa";
            atime = "off";
          };
          options = {
            ashift = "12";
            autotrim = "on";
            compatibility = "grub2";
          };
          datasets = {
            "local" = unmountable;
            "local/root" = filesystem "/" // {
              postCreateHook = "zfs snapshot rpool1/local/root@blank";
            };
            "local/nix" = filesystem "/nix";
            "local/state" = filesystem "/state";

            "safe" = unmountable;
            "safe/persist" = filesystem "/persist";
          };
        };
    };
  };
}

sometimes (in fact, very often) fails with mkfs.vfat cannot open /dev/disk/by-partlabel/disk-sda-boot (or a similar error). This happens whenever the disko format script refers to a device through a by-partlabel lookup (so also when calling zpool create, mkswap, etc.). What is weird is that these entries do exist (verified by manually inspecting the directories with ls).

Manually running the failed command and then restarting the disko script eventually succeeds.

This only happens on a real machine, never on a virtual one. Originally reported in #735.

Lassulus commented 1 month ago

Maybe some timing issues. Can you post the script's output from such a failed attempt? Maybe sleeping a bit before the mkfs.vfat could help. You can try adding a preCreateHook = "sleep 3"; next to the type = "filesystem";
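
For reference, a minimal sketch of where that hook would go, using the boot partition from the config above. This is only the suggested workaround, not a confirmed fix, and the 3 second delay is an arbitrary guess at how long the by-partlabel symlink needs to become usable:

```nix
# Fragment of disko.devices.disk.sda.content.partitions from the original config.
boot = {
  size = "512M";
  content = {
    type = "filesystem";
    # Suggested workaround (untested assumption): pause before mkfs.vfat runs
    # so that /dev/disk/by-partlabel/disk-sda-boot has time to settle.
    preCreateHook = "sleep 3";
    format = "vfat";
    mountpoint = "/boot";
  };
  priority = 2;
  hybrid.mbrBootableFlag = true;
};
```

If the same error shows up for the zfs partition, the same hook could presumably be added next to type = "zfs"; as well.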

ghostbuster91 commented 1 month ago

maybe some timing issues

I think so too.

can try adding a preCreateHook = "sleep 3"; next to the type = "filesystem";

This looks very promising, thanks :)

can you post the script's output on such a failed attempt?

I will reproduce the issue as I need to test the preCreateHook suggestion anyway and then I will post the whole output.