nix-community / impermanence

Modules to help you handle persistent state on systems with ephemeral root storage [maintainer=@talyz]
MIT License
1.03k stars 77 forks

Create presets #108

Open KFearsoff opened 1 year ago

KFearsoff commented 1 year ago

This PR addresses #10. Most of the design considerations are written down in that issue.

Roadmap (not necessarily in order):

talyz commented 1 year ago

The first bullet point is not possible to implement in a safe fashion - the file or directory can be read or written to while it's being copied or moved. We should absolutely not do that. I'm not really sure why it would be needed anyway, since /etc/machine-id works fine on all my systems. Is this related to systemd in initramfs?

KFearsoff commented 1 year ago

> I'm not really sure why it would be needed anyway, since /etc/machine-id works fine on all my systems

Can you share how you achieved that? I can't reproduce it on NixOS Unstable, neither on my systems nor on VMs. /etc/machine-id gets initialized before bind-mounts from Impermanence are done. Then, the impermanence bind-mount service errors out, saying that a file already exists at /etc/machine-id.

Perhaps you can point to some things that I should also test while I'm at it? I feel like this behaviour comes up pretty consistently. It is also the case with host SSH keys: they are created before Impermanence runs.

talyz commented 1 year ago

Well, my setup is here: https://github.com/talyz/nixos-config/blob/master/modules/ephemeral-root.nix#L103 - nothing fancy at all. I get the following messages on boot:

impermanence-mount-file[1163]: mount already exists at /etc/machine-id, ignoring
impermanence-mount-file[1164]: mount already exists at /etc/ssh/ssh_host_ed25519_key, ignoring
impermanence-mount-file[1166]: mount already exists at /etc/ssh/ssh_host_ed25519_key.pub, ignoring
impermanence-mount-file[1167]: mount already exists at /etc/ssh/ssh_host_rsa_key, ignoring
impermanence-mount-file[1168]: mount already exists at /etc/ssh/ssh_host_rsa_key.pub, ignoring

but they're expected, since they've already been mounted by the activation script.

KFearsoff commented 1 year ago

Huh. I'm not sure what to think of it.

I'll research the problem in more detail and come back with a more detailed report on what really happens and what can be done about it. I am pretty positive that I'm not getting the same behaviour, though, running NixOS Unstable and latest Impermanence.

Perhaps the issue lies in my setup. I'm not actually deleting anything on boot yet. Perhaps machine-id and SSH keys are getting generated in stage 1, then the root is erased, and only then the bind mounts happen. That would be perfectly logical, but it would also mean that setups that don't yet erase anything are not supported. If so, then this caveat should be documented.

talyz commented 1 year ago

If you're not using ephemeral storage, that's the issue, yes. To bind mount files, empty files are created to serve as bind mount endpoints; if they're still there on boot, they would be identified as already existing files and not overwritten / mounted over. I suppose we could check for zero length files and consider them safe to mount over, but it's not really a supported use-case. I think the readme is already pretty clear in this regard - it's listed as the first point in the premises at the very top ;)
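A minimal sketch of the zero-length check mentioned above, assuming it would run before a file bind mount is attempted. The function name `is_safe_mount_target` is hypothetical and not part of Impermanence. It only classifies the target path and does nothing about files that are read or written while the check runs.

```shell
# Hypothetical helper, not part of Impermanence.
# Returns 0 if the path is missing or is an empty regular file (a likely
# leftover bind-mount placeholder), and 1 for anything else.
is_safe_mount_target() {
    # Nothing at the path, not even a dangling symlink: safe to create a placeholder.
    if [ ! -e "$1" ] && [ ! -L "$1" ]; then
        return 0
    fi
    # A regular, non-symlink file of zero length: treat it as a stale placeholder.
    if [ -f "$1" ] && [ ! -L "$1" ] && [ ! -s "$1" ]; then
        return 0
    fi
    # Directories, symlinks and files with content are left alone.
    return 1
}
```

For example, `is_safe_mount_target /etc/machine-id || echo "refusing to mount over existing data"` would refuse to touch a machine-id that already has content.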

As a side-note, if you're afraid to accidentally delete anything, you can use something like https://github.com/talyz/nixos-config/blob/master/machines/trace/configuration.nix#L77-L86 to create a new root subvolume on every boot. This assumes you're using btrfs, but similar things should be possible with zfs.

KFearsoff commented 1 year ago

Thanks, your insight was very helpful. I reverted the "fix" I'd made. The setup indeed works with the config snippet you provided for BTRFS. I'm not sure how to make it actually erase the system though, lol.

A bit later, I might make a separate PR that offloads the whole "erasing" functionality to Impermanence; I feel like it's a good idea. In the meantime, I think we can start the process of naming things that I forgot to add to presets. There'll be lots of them.

talyz commented 1 year ago

To erase old subvolumes, I just do the following:

sudo mkdir /mnt
sudo mount /dev/root_vg/root /mnt
sudo btrfs subvolume delete "/mnt/$(sudo btrfs subvolume list -o /mnt/old_root_2022-07-6_17:51:38/ | cut -f 9 -d ' ')"
sudo btrfs subvolume delete /mnt/old_root_2022-07-6_17:51:38
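The commands above handle a single nested subvolume. A sketch of the same idea for any number of nested subvolumes follows, assuming the top-level volume is mounted at /mnt as above and that field 9 onward of the `btrfs subvolume list -o` output is the subvolume path relative to that mount. The function name and the `BTRFS_TOP` variable are illustrative.

```shell
# Mount point of the top-level btrfs volume (assumption: /mnt, as in the commands above).
BTRFS_TOP="${BTRFS_TOP:-/mnt}"

# Delete every subvolume nested below "$1", deepest first, then "$1" itself.
delete_subvolume_recursively() {
    # Each output line looks like "ID 257 gen 9 top level 256 path <path>";
    # cut keeps everything from field 9 on, so paths with spaces survive.
    btrfs subvolume list -o "$1" | cut -f 9- -d ' ' |
    while IFS= read -r child; do
        [ -n "$child" ] || continue
        delete_subvolume_recursively "$BTRFS_TOP/$child"
    done
    btrfs subvolume delete "$1"
}

# Usage, run as root:
#   delete_subvolume_recursively "/mnt/old_root_2022-07-6_17:51:38"
```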

I don't think erasing the root should be the task of Impermanence - there are many ways to do it and it's highly setup specific. We could however list a few examples of how to do it in the readme. I'll look into doing so.

When it comes to presets, I want to keep the structure as simple as possible:

KFearsoff commented 1 year ago

> When it comes to presets, I want to keep the structure as simple as possible

Any particular reason for this? I feel like there's clear value in separating out the different types of state we can preserve, because it allows better judgement on what to preserve, how, if it should be backed up, etc.

rehno-lindeque commented 1 year ago

I'm hesitant to mention this because it's mostly aspirational (mostly something that I wished existed but really don't have time to work on myself).

But anyway, I felt it might be worth pointing out my own persistence "presets" for NixOS for reference: https://github.com/rehno-lindeque/nixos-impermanence

Here's the list of "supported" services: https://github.com/rehno-lindeque/nixos-impermanence/blob/28127d77d46da2777e26cf6b0d26d36fb2927823/flake.nix#L32-L41

Here's a poorly documented list of persistence levels: https://github.com/rehno-lindeque/nixos-impermanence/blob/28127d77d46da2777e26cf6b0d26d36fb2927823/nixos-modules/environment/persistence.nix#L42-L54

Aspirationally one would be able to persist "secret" level files to a different mount than other regular files, but I'm not sure this idea would prove out (or be useful).

(There's a companion repo for Home Manager "presets" too, which is unfortunately bare)

KFearsoff commented 10 months ago

Wanted to give an update cos I saw the NixCon talk on Impermanence and it was mentioned that the PR on presets is WIP (when it is, sadly, not in progress lmao)

So there are a few things to keep in mind here:

  1. A lot of the NixOS modules follow the same few patterns, so we could try to set up some mechanism that hooks directly into upstream
  2. The code quality in this PR is less than ideal, and it would be nice if I found the energy to make it better
  3. I'm not very comfortable merging this PR as is, but all things considered, it might be a good idea to focus on some of the presets first and get the ball rolling
  4. The whole thing with presets is... risky. Imagine if we encounter some regression that makes users lose data in, say, Grafana - that would be horrible. I doubt we even have the human resources to extensively test for that, too. So I think a failover must be in place first. The NixCon talk highlighted the "use BTRFS and keep last months' roots" idea: we should probably create presets for actually setting up Impermanence itself with BTRFS/ZFS/etc. first, in order to be sure that we are providing ways to restore the data
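
As a rough sketch of the retention side of the "keep old roots" idea in point 4, the helper below lists old roots that have fallen out of a retention window. It assumes old roots are kept as directories directly under one parent directory and that their modification time is a good enough age signal; the function name and the 30-day default are illustrative. It relies on the `-mindepth` and `-maxdepth` options found in GNU and BSD `find`.

```shell
# Hypothetical helper: print old roots whose mtime is older than the window.
#   $1: directory that holds the old roots
#   $2: retention window in days (default: 30)
list_expired_roots() {
    # -mtime +N matches entries last modified more than N full days ago.
    find "$1" -mindepth 1 -maxdepth 1 -type d -mtime "+${2:-30}"
}
```

The helper only prints paths; actually deleting the matching subvolumes is left to the caller, which keeps a dry run the default.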