nix-community / nixos-generators

Collection of image builders [maintainer=@Lassulus]
MIT License

`nixos-rebuild` on `proxmox-lxc` container fails with `busctl` error, and causes other configuration change from base image #319

Open smacz42 opened 8 months ago

smacz42 commented 8 months ago

Summary

Running `nixos-rebuild` on a customized (or vanilla) `proxmox-lxc` image fails, leaves the container in an unmanageable state, and appears to remove some of the image's configuration.

Steps to reproduce:

1. Build the container
  a. `nix run --extra-experimental-features nix-command --extra-experimental-features flakes github:nix-community/nixos-generators --format proxmox-lxc -c /tmp/firstboot.nix`
  b. Create `/tmp/firstboot.nix`:

        cat << EOF > /tmp/firstboot.nix
        { config, pkgs, ... }:

        {
          # Set up a systemd service
          systemd.services.startup = {
            description = "Sets up the NixOS container on startup";
            wantedBy = [ "multi-user.target" ];
            script = "echo 'Hello World'";
          };
        }
        EOF

2. Run the container in Proxmox
3. Create a minimal `configuration.nix`:

    { config, pkgs, ... }:

    {
      imports = [ <nixpkgs/nixos/modules/virtualisation/lxc-container.nix> ];

      environment.variables = {
        HISTFILESIZE = "";
        HISTSIZE = "";
        HISTTIMEFORMAT = "%F %T ";
        NIX_SSL_CERT_FILE = "/etc/ssl/certs/ca-certificates.crt";
      };

      systemd.mounts = [{
        where = "/sys/kernel/debug";
        enable = false;
      }];

      environment.systemPackages = with pkgs; [ vim binutils ];
    }

4. Run `nixos-rebuild test`

    '/nix/store/zgzrbba39fsn341s5dyl89wi7cdavsf0-system-path/bin/busctl --json=short call org.freedesktop.systemd1 /org/freedesktop/systemd1 org.freedesktop.systemd1.Manager ListUnitsByPatterns asas 0 0' exited with value 1 at /nix/store/nhys1a2wsn5x5xm5bv5msk6ynqrhya4q-nixos-system-nixos-23.11.5408.8ac30a39abc5/bin/switch-to-configuration line 145.

  a. This actually borks the system the exact same as a `switch` would, even though it's only a `test`.
5. Re-run `nixos-rebuild switch`

    building Nix...
    building the system configuration...
    trace: warning: system.stateVersion is not set, defaulting to 23.11. Read why this matters on https://nixos.org/manual/nixos/stable/options.html#opt-system.stateVersion.
    stopping the following units: network-local-commands.service, systemd-networkd-wait-online.service, systemd-networkd.service, systemd-networkd.socket, systemd-resolved.service
    activating the configuration...
    setting up /etc...
    removing obsolete symlink ‘/etc/resolv.conf’...
    removing obsolete symlink ‘/etc/man_db.conf’...
    removing obsolete symlink ‘/etc/systemd/networkd.conf’...
    removing obsolete symlink ‘/etc/systemd/resolved.conf’...
    restarting systemd...
    Failed to list users: Failed to activate service 'org.freedesktop.login1': timed out (service_start_timeout=25000ms)
    Unable to close the file handle to loginctl at /nix/store/nhys1a2wsn5x5xm5bv5msk6ynqrhya4q-nixos-system-nixos-23.11.5408.8ac30a39abc5/bin/switch-to-configuration line 890.
    warning: error(s) occurred while switching to the new configuration


## Issues

### Pre-reboot:
1. `busctl` issue above as output of `nixos-rebuild`
2. Cannot shutdown as root in container:

    Failed to set wall message, ignoring: Access denied
    Call to Reboot failed: Access denied

3. dbus errors in logs:

    dbus-daemon[280]: [system] Rejected send message, 1 matched rules; type="method_call", sender=":1.5" (uid=0 pid=16558 comm="systemd-run -E LOCALE_ARCHIVE -E NIXOS_INSTALL_BOO" label="unconfined") interface="org.freedesktop.systemd1.Manager" member="StartTransientUnit" error name="(unset)" requested_reply="0" destination="org.freedesktop.systemd1" (uid=0 pid=1 comm="/run/current-system/systemd/lib/systemd/systemd" label="unconfined")
    dbus-daemon[280]: [system] Rejected send message, 1 matched rules; type="method_call", sender=":1.6" (uid=0 pid=16563 comm="/nix/store/zgzrbba39fsn341s5dyl89wi7cdavsf0-system" label="unconfined") interface="org.freedesktop.systemd1.Manager" member="ListUnitsByPatterns" error name="(unset)" requested_reply="0" destination="org.freedesktop.systemd1" (uid=0 pid=1 comm="/run/current-system/systemd/lib/systemd/systemd" label="unconfined")
    dbus-daemon[280]: [system] Rejected send message, 1 matched rules; type="method_call", sender=":1.7" (uid=0 pid=16571 comm="shutdown -r now" label="unconfined") interface="org.freedesktop.login1.Manager" member="SetWallMessage" error name="(unset)" requested_reply="0" destination="org.freedesktop.login1" (uid=0 pid=291 comm="/nix/store/dzp7d4k1d94s1x49p9171mvcsfyxr7bj-system" label="unconfined")
    dbus-daemon[280]: [system] Rejected send message, 1 matched rules; type="method_call", sender=":1.7" (uid=0 pid=16571 comm="shutdown -r now" label="unconfined") interface="org.freedesktop.login1.Manager" member="RebootWithFlags" error name="(unset)" requested_reply="0" destination="org.freedesktop.login1" (uid=0 pid=291 comm="/nix/store/dzp7d4k1d94s1x49p9171mvcsfyxr7bj-system" label="unconfined")


### Post-reboot:
1. Hostname is reset to `nixos`
  a. This does not happen if I use the vanilla image from hydra.
2. Custom systemd services disappear
  a. This is probably because it's not defined in the `configuration.nix` file that got rebuilt against, but I wasn't expecting this behavior.
3. Odd changes when running `nixos-rebuild switch` again:

    setting up /etc...
    removing obsolete symlink ‘/etc/resolv.conf’...
    removing obsolete symlink ‘/etc/man_db.conf’...
    removing obsolete symlink ‘/etc/systemd/resolved.conf’...
    removing obsolete symlink ‘/etc/systemd/networkd.conf’...
    setting up tmpfiles
    Cannot set file attributes for '/var/empty', value=0x00000010, mask=0x00000010, ignoring: Operation not permitted
    reloading the following units: dbus.service
    restarting the following units: nix-daemon.service
    starting the following units: network-local-commands.service
    the following new units were started: dhcpcd.service, network-setup.service, resolvconf.service



## Suspicions

I suspect this is because:
1. `/etc/nixos/configuration.nix` replaces whatever configuration the image was built with, which drops everything the container was built with (including the hostname).
2. I also suspect that the installation of `glibc` and whatever else gets installed causes an error for the (unprivileged) container restarting systemd services.
  a. I'm not sure, but this might have something to do with:  https://github.com/nix-community/nixos-generators/issues/86 

I would expect (without having much understanding of the internals of NixOS) to be able to take the base image of the container, create an `/etc/nixos/configuration.nix`, and run `nixos-rebuild switch` without breaking or modifying the base image's configuration, or else to have a way to include the base image's configuration so that the existing settings are preserved.
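
One possible way to carry the base image's settings into a rebuild is sketched below. It assumes, without confirmation in this thread, that the `proxmox-lxc` format is built on nixpkgs' `virtualisation/proxmox-lxc.nix` module. If so, importing that module (rather than the generic `lxc-container.nix`) would let `nixos-rebuild` evaluate the same container defaults the image started with. The `system.stateVersion` value is taken from the warning in the log above.

```nix
# /etc/nixos/configuration.nix (sketch, untested on Proxmox)
{ config, pkgs, modulesPath, ... }:

{
  imports = [
    # Assumption: this is the module the proxmox-lxc image format uses.
    (modulesPath + "/virtualisation/proxmox-lxc.nix")
  ];

  # Set explicitly the value the rebuild otherwise defaults to.
  system.stateVersion = "23.11";

  environment.systemPackages = with pkgs; [ vim binutils ];
}
```

Whether this also avoids the `busctl` failure is not established here.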

I'm happy to do any further testing required in regards to these issues :)

smacz42 commented 8 months ago

Just to follow up... are we not supposed to be able to run `nixos-rebuild` on these images? I guess I don't understand. Basically, is my "I would expect..." line above inaccurate?

mayl commented 8 months ago

I don't use proxmox or lxc so I can't comment on any of that specifically.

Broadly, NixOS configurations, as part of being declarative and reproducible, need to be complete. The result you get from running `nixos-rebuild` is a function only of the config you pass, not of the state of the current system. I say this because the merge with current state that you seem to expect is not a good mental model for what happens.

I suspect if you copy your config into the container and rebuild from that it could work, but I can't speak too much to that either. The way I use nixos-generators is to specify the full config up front, and rebuild a new container if I need changes. Except I mostly build vms, not containers :-)
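
A minimal sketch of the "copy your config into the container" idea, assuming `/tmp/firstboot.nix` from the build step has been copied to `/etc/nixos/firstboot.nix` (a hypothetical location, not something the image does on its own). Because the rebuilt system is derived only from the files passed to `nixos-rebuild`, importing the build-time module is what keeps the `startup` service defined:

```nix
# /etc/nixos/configuration.nix (sketch)
{ config, pkgs, ... }:

{
  imports = [
    <nixpkgs/nixos/modules/virtualisation/lxc-container.nix>
    # Hypothetical path: a copy of the file given to `-c` when the image was built.
    ./firstboot.nix
  ];

  environment.systemPackages = with pkgs; [ vim binutils ];
}
```

Anything declared in neither file is treated as absent from the new system, which matches the disappearing service described in the report.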

nixos-discourse commented 5 months ago

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/nixos-proxmox-lxc-not-rebuilding-using-wiki-provided-configuration/47104/8

hogcycle commented 5 months ago

Glad to see I'm not the only one having issues. So is this a feature, not a bug? Should I be doing all of my configuration and packing it into the tarball? It seems unnecessary to have to do that for even the smallest of changes.