serokell / deploy-rs

A simple multi-profile Nix-flake deploy tool.

systemd-initrd booted NixOS does not activate #185

Open notgne2 opened 1 year ago

notgne2 commented 1 year ago

It's the revenge of https://github.com/serokell/deploy-rs/issues/31

There is this related code in nixpkgs: https://github.com/NixOS/nixpkgs/blob/3928cfa27d9925f9fbd1d211cf2549f723546a81/nixos/modules/system/boot/systemd/initrd.nix#L483-L490

The -x check will fail because in a real NixOS closure prepare-root is a regular file, but in ours it is a symlink to that file:

(-r-xr-xr-x 1 root root 4.5K Dec 31 1969 prepare-root vs lrwxrwxrwx 1 root root 103 Dec 31 1969 prepare-root -> /nix/store/8aicg4wzivp50jvshd8ix262vy00pnyf-nixos-system-peppaframe-23.05.20221222.652e92b/prepare-root)

This results in seeing this message, and a system that will not function correctly (booting will complete, services will run, but /run/current-system etc will not get populated).

Dec 26 16:09:08 localhost initrd-nixos-activation-start[522]: /nix/store/lisipr5yinsb3xfbsid8fdrhgap8c3r0-activatable-nixos-system-peppaframe-23.05.20221222.652e92b does not look like a NixOS installation - not activating

There is no current workaround, besides not using boot.initrd.systemd (but that has other costs, such as no plymouth luks prompt). While this is currently avoidable, it sounds like this may become the default in the future, rendering deploy-rs completely unusable if this is not resolved.

notgne2 commented 1 year ago

Actually, I have no idea what's going on here... -x works fine it seems? It actually even goes as far as checking the symlink target to see if it's executable. That check should be working fine in theory, but something is failing in practice, or is it...

I actually did a quick test, the log message example above is real, and the path it mentions still exists, so I even gave it a quick test directly, and yup, that check won't fail and it should boot fine, although, err, well it definitely didn't.
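For reference, a minimal sketch of that quick test in Python, assuming `os.access(path, os.X_OK)` is a fair stand-in for the shell's `-x` (both follow symlinks by default). The file names are made up for illustration and are not the real store paths.

```python
import os
import tempfile


def is_executable(path: str) -> bool:
    """Rough analogue of `[ -x path ]`; symlinks are followed."""
    return os.access(path, os.X_OK)


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for the real prepare-root script.
    target = os.path.join(tmp, "prepare-root")
    with open(target, "w") as f:
        f.write("#!/bin/sh\n")
    os.chmod(target, 0o555)

    # Hypothetical stand-in for the symlink found in the deploy-rs closure.
    link = os.path.join(tmp, "prepare-root-link")
    os.symlink(target, link)

    link_is_symlink = os.path.islink(link)
    link_is_executable = is_executable(link)
```

As long as the target is reachable from where the check runs, the symlink passes the check just like the file would.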

notgne2 commented 1 year ago

Wondering now if it has something to do with conservative copying? I don't know how the initrd stuff works, although there exists a property storePaths listing store paths to copy, and I imagine there is a chance that the real system profile isn't in it, and therefore the target of the symlink is genuinely missing at the point that the log-line occurred.

ElvishJerricco commented 1 year ago

@notgne2 The system profile is not copied into the initrd at all. It lives on the real root fs. systemd-initrd just contains the software necessary for mounting the file systems that are fsNeededForBoot, runs the activation script in a chroot, and executes the stage 2 systemd manager in the new root.

ElvishJerricco commented 1 year ago

Yea the problem is very likely just that prepare-root being a symlink causes -x to check the symlink target, but since it isn't in the chroot, it doesn't actually see the target and it doesn't work. We could fix this by using chroot /sysroot $wherever/bin/realpath on the path before using -x on it.
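A rough simulation of that failure mode, as a sketch only: a temporary directory plays the role of /sysroot, and a hypothetical helper `resolve_in_root` stands in for running realpath inside the chroot. It is simplified (it only follows symlinks on the final path component) and the store path names are invented.

```python
import os
import tempfile
import uuid


def resolve_in_root(sysroot: str, path: str, max_links: int = 40) -> str:
    """Follow symlinks on the last component of `path`, treating absolute
    link targets as relative to `sysroot` (hypothetical helper)."""
    current = path
    for _ in range(max_links):
        full = os.path.join(sysroot, current.lstrip("/"))
        if not os.path.islink(full):
            return full
        target = os.readlink(full)
        if os.path.isabs(target):
            current = target
        else:
            current = os.path.join(os.path.dirname(current), target)
    raise OSError("too many levels of symbolic links")


with tempfile.TemporaryDirectory() as sysroot:
    tag = uuid.uuid4().hex  # keeps the fake store paths from existing on the host
    real = f"/nix/store/{tag}-nixos-system/prepare-root"
    wrapped = f"/nix/store/{tag}-activatable-nixos-system/prepare-root"
    real_on_disk = os.path.join(sysroot, real.lstrip("/"))
    wrapped_on_disk = os.path.join(sysroot, wrapped.lstrip("/"))

    os.makedirs(os.path.dirname(real_on_disk))
    os.makedirs(os.path.dirname(wrapped_on_disk))
    with open(real_on_disk, "w") as f:
        f.write("#!/bin/sh\n")
    os.chmod(real_on_disk, 0o555)

    # Absolute target: only valid once the process is inside the new root.
    os.symlink(real, wrapped_on_disk)

    # Checked from outside the root, the target is looked up on the host and is missing.
    naive = os.access(wrapped_on_disk, os.X_OK)

    # Resolving the link against the root first finds the real file.
    resolved = resolve_in_root(sysroot, wrapped)
    fixed = os.access(resolved, os.X_OK)
```

Here `naive` comes out false and `fixed` true, which matches the idea of resolving the path inside /sysroot before applying -x.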

notgne2 commented 1 year ago

@ElvishJerricco I have an alternative patch that I think might work: we can instead make sure that the initrd closure is always the real NixOS one, so it will still be a real file. With systemd-boot this seems pretty simple, though I'm not sure about others.

I don't really mind which way it gets resolved. In fact, I think doing both might be a good idea, since avoiding the assumption that it is a direct file seems sensible, and setting initrd to be a symlink-resolved initrd also seems good.

notgne2 commented 1 year ago
diff --git a/nixos/modules/system/boot/loader/systemd-boot/systemd-boot-builder.py b/nixos/modules/system/boot/loader/systemd-boot/systemd-boot-builder.py
index ad7e2184d2a..2d71098c21b 100755
--- a/nixos/modules/system/boot/loader/systemd-boot/systemd-boot-builder.py
+++ b/nixos/modules/system/boot/loader/systemd-boot/systemd-boot-builder.py
@@ -85,18 +85,18 @@ def copy_from_profile(profile: Optional[str], generation: int, specialisation: O
     return efi_file_path

-def describe_generation(generation_dir: str) -> str:
+def describe_generation(profile: Optional[str], generation: int, specialisation: Optional[str]) -> str:
     try:
-        with open("%s/nixos-version" % generation_dir) as f:
+        with open(profile_path(profile, generation, specialisation, "nixos-version")) as f:
             nixos_version = f.read()
     except IOError:
         nixos_version = "Unknown"

-    kernel_dir = os.path.dirname(os.path.realpath("%s/kernel" % generation_dir))
+    kernel_dir = os.path.dirname(profile_path(profile, generation, specialisation, "kernel"))
     module_dir = glob.glob("%s/lib/modules/*" % kernel_dir)[0]
     kernel_version = os.path.basename(module_dir)

-    build_time = int(os.path.getctime(generation_dir))
+    build_time = int(os.path.getctime(system_dir(profile, generation, specialisation)))
     build_date = datetime.datetime.fromtimestamp(build_time).strftime('%F')

     description = "NixOS {}, Linux Kernel {}, Built on {}".format(
@@ -116,11 +116,10 @@ def write_entry(profile: Optional[str], generation: int, specialisation: Optiona
         pass
     entry_file = "@efiSysMountPoint@/loader/entries/%s" % (
         generation_conf_filename(profile, generation, specialisation))
-    generation_dir = os.readlink(system_dir(profile, generation, specialisation))
     tmp_path = "%s.tmp" % (entry_file)
-    kernel_params = "init=%s/init " % generation_dir
+    kernel_params = "init=%s " % profile_path(profile, generation, specialisation, "init")

-    with open("%s/kernel-params" % (generation_dir)) as params_file:
+    with open(profile_path(profile, generation, specialisation, "kernel-params")) as params_file:
         kernel_params = kernel_params + params_file.read()
     with open(tmp_path, 'w') as f:
         f.write(BOOT_ENTRY.format(profile=" [" + profile + "]" if profile else "",
@@ -129,7 +128,7 @@ def write_entry(profile: Optional[str], generation: int, specialisation: Optiona
                     kernel=kernel,
                     initrd=initrd,
                     kernel_params=kernel_params,
-                    description=describe_generation(generation_dir)))
+                    description=describe_generation(profile, generation, specialisation)))
         if machine_id is not None:
             f.write("machine-id %s\n" % machine_id)
     os.rename(tmp_path, entry_file)
@@ -284,7 +283,7 @@ def main() -> None:
             write_entry(*gen, machine_id)
             for specialisation in get_specialisations(*gen):
                 write_entry(*specialisation, machine_id)
-            if os.readlink(system_dir(*gen)) == args.default_config:
+            if os.path.dirname(profile_path(*gen, "init")) == args.default_config:
                 write_loader_conf(*gen)
         except OSError as e:
             profile = f"profile '{gen.profile}'" if gen.profile else "default profile"

Partially tested (init= gets properly set to use init from the original NixOS closure, and I see no reason to think it wouldn't work), but I think this is more or less correct. We use profile_path a bit already; this just expands it to replace all usages of getting the system closure.
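To illustrate why going through individual files matters, here is a small sketch. It assumes a helper that resolves a file inside the generation with `os.path.realpath`, which is what the patch relies on profile_path doing; `resolve_in_generation` and the directory names are hypothetical, not the nixpkgs code.

```python
import os
import tempfile


def resolve_in_generation(generation_link: str, name: str) -> str:
    """Hypothetical stand-in for profile_path: fully resolve one file
    inside a generation, following every symlink along the way."""
    return os.path.realpath(os.path.join(generation_link, name))


with tempfile.TemporaryDirectory() as tmp:
    tmp = os.path.realpath(tmp)  # the temp dir itself may sit behind a symlink

    # Fake "real" NixOS closure containing an actual init file.
    real_closure = os.path.join(tmp, "nixos-system")
    os.mkdir(real_closure)
    with open(os.path.join(real_closure, "init"), "w") as f:
        f.write("#!/bin/sh\n")

    # Fake wrapper closure whose init is only a symlink into the real one.
    wrapper_closure = os.path.join(tmp, "activatable-nixos-system")
    os.mkdir(wrapper_closure)
    os.symlink(os.path.join(real_closure, "init"),
               os.path.join(wrapper_closure, "init"))

    # Fake profile generation link pointing at the wrapper.
    generation_link = os.path.join(tmp, "system-1-link")
    os.symlink(wrapper_closure, generation_link)

    via_readlink = os.readlink(generation_link)
    via_realpath = os.path.dirname(
        resolve_in_generation(generation_link, "init"))
```

Reading the generation link alone stops at the wrapper closure, while resolving init itself lands in the real closure, so init= would point at a regular file.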