serokell / deploy-rs

A simple multi-profile Nix-flake deploy tool.

Rebooting reverts machine to old generation #254

Closed by pizzapim 5 months ago

pizzapim commented 5 months ago

I am using deploy-rs to configure my server, but after a reboot the server reverts to an older configuration. I am not entirely sure how to debug this problem, so any pointers are appreciated. I initially installed NixOS using nixos-anywhere, so this might be part of the problem.

I did notice that after a reboot /run/current-system points to the same store path as /run/booted-system again (which is also what /nix/var/nix/profiles/system points to).

Here you can see the effect of deploying and rebooting the server.

The current state, after running for a while and applying various configurations:

[root@jefke:~]# ls -alh /nix/var/nix/profiles/
total 12K
lrwxrwxrwx 1 root root 43 14 jan 19:40 default -> /nix/var/nix/profiles/per-user/root/profile
drwxr-xr-x 1 root root  8  6 jan 22:23 per-user
lrwxrwxrwx 1 root root 13  6 jan 22:23 system -> system-1-link
lrwxrwxrwx 1 root root 85  6 jan 22:23 system-1-link -> /nix/store/i7l51f77b4bsdmbx7k4m9274lkn87bps-nixos-system-jefke-23.11.20231215.40c3c94

[root@jefke:~]# ls -alhd /run/booted-system
lrwxrwxrwx 1 root root 85  7 jan 00:49 /run/booted-system -> /nix/store/i7l51f77b4bsdmbx7k4m9274lkn87bps-nixos-system-jefke-23.11.20231215.40c3c94

[root@jefke:~]# ls -alhd /run/current-system
lrwxrwxrwx 1 root root 85 14 jan 16:43 /run/current-system -> /nix/store/pdn5n3xf2vlkr7h09d9w6bvbgnp7ry4y-nixos-system-jefke-23.11.20231215.40c3c94

Note that the /nix/var/nix/profiles/default symlink above points to a non-existent file.

The state after rebooting:

[root@jefke:~]# ls -alh /nix/var/nix/profiles/
total 12K
lrwxrwxrwx 1 root root 43 14 jan 19:40 default -> /nix/var/nix/profiles/per-user/root/profile
drwxr-xr-x 1 root root  8  6 jan 22:23 per-user
lrwxrwxrwx 1 root root 13  6 jan 22:23 system -> system-1-link
lrwxrwxrwx 1 root root 85  6 jan 22:23 system-1-link -> /nix/store/i7l51f77b4bsdmbx7k4m9274lkn87bps-nixos-system-jefke-23.11.20231215.40c3c94

[root@jefke:~]# ls -alhd /run/booted-system
lrwxrwxrwx 1 root root 85 14 jan 19:46 /run/booted-system -> /nix/store/i7l51f77b4bsdmbx7k4m9274lkn87bps-nixos-system-jefke-23.11.20231215.40c3c94

[root@jefke:~]# ls -alhd /run/current-system
lrwxrwxrwx 1 root root 85 14 jan 19:46 /run/current-system -> /nix/store/i7l51f77b4bsdmbx7k4m9274lkn87bps-nixos-system-jefke-23.11.20231215.40c3c94

Above we see that /run/current-system reverted to the same store path as /run/booted-system.

The state after re-applying my configuration using deploy-rs again:

[root@jefke:~]# ls -alh /nix/var/nix/profiles/
total 12K
drwxr-xr-x 1 root root 68 14 jan 19:52 .
drwxr-xr-x 1 root root 92 14 jan 19:52 ..
lrwxrwxrwx 1 root root 43 14 jan 19:40 default -> /nix/var/nix/profiles/per-user/root/profile
drwxr-xr-x 1 root root  8  6 jan 22:23 per-user
lrwxrwxrwx 1 root root 13  6 jan 22:23 system -> system-1-link
lrwxrwxrwx 1 root root 85  6 jan 22:23 system-1-link -> /nix/store/i7l51f77b4bsdmbx7k4m9274lkn87bps-nixos-system-jefke-23.11.20231215.40c3c94

[root@jefke:~]# ls -alhd /run/booted-system
lrwxrwxrwx 1 root root 85 14 jan 19:46 /run/booted-system -> /nix/store/i7l51f77b4bsdmbx7k4m9274lkn87bps-nixos-system-jefke-23.11.20231215.40c3c94

[root@jefke:~]# ls -alhd /run/current-system
lrwxrwxrwx 1 root root 85 14 jan 19:52 /run/current-system -> /nix/store/pdn5n3xf2vlkr7h09d9w6bvbgnp7ry4y-nixos-system-jefke-23.11.20231215.40c3c94

notgne2 commented 5 months ago

I'm not sure what's going on here, but two things stand out to me. First, you only have one system profile generation after deploying (IIRC a new one gets added; the existing one is not replaced). Second, the store path of the system profile is /nix/store/[...]-nixos-system-[...], when it should definitely be /nix/store/[...]-activatable-nixos-system-[...] instead.

Can you post a snippet of your flake.nix and/or make sure that profiles.system.path is set to something like pkgs.deploy-rs.lib.activate.nixos self.nixosConfigurations.mysystem; (and not just the nixosConfigurations part, without the activate wrapper)?

pizzapim commented 5 months ago

I can confirm that I am using the lib.activate wrapper. Here is the snippet (mkDeployNodes is a wrapper that handles my custom machine configuration, but it should be clear):

deploy = {
  sshUser = "root";
  user = "root";

  nodes = mkDeployNodes (machine: {
    hostname = machine.hostName;
    profiles.hypervisor = {
      path = deploy-rs.lib.${system}.activate.nixos
        self.nixosConfigurations.${machine.name};
    };
  });
};

I also find it puzzling why the system profile is replaced. I don't think I have the knowledge to debug profiles unfortunately, but I will try to install NixOS on a system manually without nixos-anywhere. Maybe that makes a difference.

notgne2 commented 5 months ago

I think it's probably because of the profile name hypervisor, which should be system. The NixOS activation script is probably updating the profile generation itself since deploy-rs didn't, and that's why it ended up with the NixOS derivation instead of the wrapped one, and why the generations aren't working correctly.
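
A minimal sketch of that change, reusing the snippet above and assuming the profile name is the only thing that needs to differ. mkDeployNodes, machine, and system are the poster's own names and are taken as given here, so treat this as a fragment of the flake outputs and not a complete flake:

```nix
deploy = {
  sshUser = "root";
  user = "root";

  nodes = mkDeployNodes (machine: {
    hostname = machine.hostName;
    # Renamed from "hypervisor" to "system". The assumption, based on the
    # explanation above, is that deploy-rs then manages the generations of
    # the system profile itself, so the deployed generation survives a reboot.
    profiles.system = {
      # Unchanged: the NixOS configuration wrapped in deploy-rs' activate helper.
      path = deploy-rs.lib.${system}.activate.nixos
        self.nixosConfigurations.${machine.name};
    };
  });
};
```

If this works as described, a later deploy should add a new system-N-link entry under /nix/var/nix/profiles/ whose store path contains activatable-nixos-system.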

pizzapim commented 5 months ago

Thank you, that was indeed the problem! Perhaps this is a lack of knowledge of NixOS internals on my part, but it was not entirely clear from the docs. Anyways, I'm a happy camper!