nix-community / impermanence

Modules to help you handle persistent state on systems with ephemeral root storage [maintainer=@talyz]
MIT License

nsncd service fails to start with current Impermanence #219

Closed: tschan closed this issue 2 weeks ago

tschan commented 2 weeks ago

I was trying to figure out why my system stopped booting because of a failed start of the nscd unit.

The details are in this nixpkgs issue, which I created before I knew the failure was caused by an Impermanence upgrade. For completeness' sake, here is the log of the failed service again:

Sep 27 07:57:38 desktop-nixos systemd[1]: Starting Name Service Cache Daemon (nsncd)...
Sep 27 07:57:39 desktop-nixos nsncd[2794]: Sep 27 05:57:38.922 INFO started, config: Config { ignored_request_types: {}, worker_count: 8, handoff_timeout: 3s }, path: "/var/run/nscd/socket"
Sep 27 07:57:39 desktop-nixos nsncd[2794]: Error: Read-only file system (os error 30)
Sep 27 07:57:38 desktop-nixos systemd[1]: nscd.service: Main process exited, code=exited, status=1/FAILURE
Sep 27 07:57:38 desktop-nixos systemd[1]: nscd.service: Failed with result 'exit-code'.
Sep 27 07:57:38 desktop-nixos systemd[1]: Failed to start Name Service Cache Daemon (nsncd).
Sep 27 07:57:39 desktop-nixos systemd[1]: Stopped Name Service Cache Daemon (nsncd).
Sep 27 07:57:39 desktop-nixos systemd[1]: Starting Name Service Cache Daemon (nsncd)...
Sep 27 07:57:39 desktop-nixos nsncd[3194]: Sep 27 05:57:39.031 INFO started, config: Config { ignored_request_types: {}, worker_count: 8, handoff_timeout: 3s }, path: "/var/run/nscd/socket"
Sep 27 07:57:39 desktop-nixos nsncd[3194]: Error: Read-only file system (os error 30)
Sep 27 07:57:39 desktop-nixos systemd[1]: nscd.service: Main process exited, code=exited, status=1/FAILURE
Sep 27 07:57:39 desktop-nixos systemd[1]: nscd.service: Failed with result 'exit-code'.
Sep 27 07:57:39 desktop-nixos systemd[1]: Failed to start Name Service Cache Daemon (nsncd).
Sep 27 07:57:39 desktop-nixos systemd[1]: nscd.service: Scheduled restart job, restart counter is at 1.
Sep 27 07:57:39 desktop-nixos systemd[1]: Starting Name Service Cache Daemon (nsncd)...
Sep 27 07:57:39 desktop-nixos nsncd[3329]: Sep 27 05:57:39.209 INFO started, config: Config { ignored_request_types: {}, worker_count: 8, handoff_timeout: 3s }, path: "/var/run/nscd/socket"
Sep 27 07:57:39 desktop-nixos nsncd[3329]: Error: Read-only file system (os error 30)
Sep 27 07:57:39 desktop-nixos systemd[1]: nscd.service: Main process exited, code=exited, status=1/FAILURE
Sep 27 07:57:39 desktop-nixos systemd[1]: nscd.service: Failed with result 'exit-code'.
Sep 27 07:57:39 desktop-nixos systemd[1]: Failed to start Name Service Cache Daemon (nsncd).
Sep 27 07:57:39 desktop-nixos systemd[1]: nscd.service: Scheduled restart job, restart counter is at 2.
Sep 27 07:57:39 desktop-nixos systemd[1]: nscd.service: Start request repeated too quickly.
Sep 27 07:57:39 desktop-nixos systemd[1]: nscd.service: Failed with result 'exit-code'.
Sep 27 07:57:39 desktop-nixos systemd[1]: Failed to start Name Service Cache Daemon (nsncd).

Once I reverted this flake.lock update it started working again:

• Updated input 'impermanence':
    'github:nix-community/impermanence/63f4d0443e32b0dd7189001ee1894066765d18a5' (2024-09-07)
  → 'github:nix-community/impermanence/8514fff0f048557723021ffeb31ca55f69b67de3' (2024-09-24)
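While investigating, one way to keep the input at the earlier revision is to pin it in flake.nix instead of reverting the lock file by hand. This is only a sketch: the input name `impermanence` and the revision are taken from the lock update above, and the rest of the flake is assumed to stay unchanged.

```nix
{
  # Fragment of flake.nix (other inputs and the outputs are omitted).
  # Pins impermanence to the 2024-09-07 revision that still booted.
  inputs.impermanence.url =
    "github:nix-community/impermanence/63f4d0443e32b0dd7189001ee1894066765d18a5";
}
```

After changing the URL, the flake has to be re-locked so that flake.lock records the pinned revision.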

Any idea how that could cause this issue?

tschan commented 2 weeks ago

I'm using the following service to recreate root. Do I have to change something for it to work with the create-needed-for-boot-dirs unit?

boot.initrd.systemd.services.recreate-root = {
  description = "Rolling over and creating new filesystem root";

  requires = [ "initrd-root-device.target" ];
  after = [
    "local-fs-pre.target"
    "initrd-root-device.target"
  ];
  requiredBy = [ "initrd-root-fs.target" ];
  before = [ "sysroot.mount" ];

  unitConfig = {
    AssertPathExists = "/etc/initrd-release";
    DefaultDependencies = false;
  };

  serviceConfig = {
    Type = "oneshot";
    RemainAfterExit = true;
  };

  script = ''
    mkdir /btrfs_tmp
    mount /dev/mapper/cryptroot /btrfs_tmp
    if [[ -e /btrfs_tmp/root ]]; then
      mkdir -p /btrfs_tmp/old_roots
      timestamp=$(date --date="@$(stat -c %Y /btrfs_tmp/root)" "+%Y-%m-%d_%H:%M:%S")
      mv /btrfs_tmp/root "/btrfs_tmp/old_roots/$timestamp"
    fi

    delete_subvolume_recursively() {
      IFS=$'\n'
      for i in $(btrfs subvolume list -o "$1" | cut -f 9- -d ' '); do
        delete_subvolume_recursively "/btrfs_tmp/$i"
      done
      btrfs subvolume delete "$1"
    }

    for i in $(find /btrfs_tmp/old_roots/ -maxdepth 1 -mtime +30); do
      delete_subvolume_recursively "$i"
    done

    btrfs subvolume create /btrfs_tmp/root
    umount /btrfs_tmp
  '';
};

Edit: Disabling the service with

boot.initrd.systemd.services.create-needed-for-boot-dirs.wantedBy = lib.mkForce [];

allows the system to boot again, but I'd rather fix the root cause.

tschan commented 2 weeks ago

Apparently there was something wrong with my service definition. I changed it to the following and now it works:

boot.initrd.systemd.services.recreate-root = {
  description = "Rolling over and creating new filesystem root";

  wantedBy = [ "initrd.target" ];
  requires = [ "initrd-root-device.target" ];
  after = [ "initrd-root-device.target" ];
  before = [ "sysroot.mount" ];

  unitConfig.DefaultDependencies = "no";
  serviceConfig.Type = "oneshot";

  script = ''
    mkdir /btrfs_tmp
    mount /dev/mapper/cryptroot /btrfs_tmp
    if [[ -e /btrfs_tmp/root ]]; then
      mkdir -p /btrfs_tmp/old_roots
      timestamp=$(date --date="@$(stat -c %Y /btrfs_tmp/root)" "+%Y-%m-%d_%H:%M:%S")
      mv /btrfs_tmp/root "/btrfs_tmp/old_roots/$timestamp"
    fi

    delete_subvolume_recursively() {
      IFS=$'\n'
      for i in $(btrfs subvolume list -o "$1" | cut -f 9- -d ' '); do
        delete_subvolume_recursively "/btrfs_tmp/$i"
      done
      btrfs subvolume delete "$1"
    }

    for i in $(find /btrfs_tmp/old_roots/ -maxdepth 1 -mtime +30); do
      delete_subvolume_recursively "$i"
    done

    btrfs subvolume create /btrfs_tmp/root
    umount /btrfs_tmp
  '';
};

I don't know what exactly the problem was; I just took the working example from here.
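For reference, the two definitions differ only in their unit wiring; the script body is identical in both. The sketch below puts the ordering-related attributes side by side. It does not establish which of these changes resolved the failure.

```nix
{
  # First version (failed to boot with create-needed-for-boot-dirs enabled):
  #   requires   = [ "initrd-root-device.target" ];
  #   after      = [ "local-fs-pre.target" "initrd-root-device.target" ];
  #   requiredBy = [ "initrd-root-fs.target" ];
  #   before     = [ "sysroot.mount" ];
  #   unitConfig.AssertPathExists    = "/etc/initrd-release";
  #   unitConfig.DefaultDependencies = false;
  #   serviceConfig.RemainAfterExit  = true;

  # Second version (boots). The script attribute is omitted here because
  # it is unchanged; it still has to be present in a real configuration.
  boot.initrd.systemd.services.recreate-root = {
    wantedBy = [ "initrd.target" ];              # replaces requiredBy initrd-root-fs.target
    requires = [ "initrd-root-device.target" ];
    after = [ "initrd-root-device.target" ];     # local-fs-pre.target dropped
    before = [ "sysroot.mount" ];

    unitConfig.DefaultDependencies = "no";       # same setting as `false` in the first version
    serviceConfig.Type = "oneshot";              # RemainAfterExit no longer set
  };
}
```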