vpsfreecz / vpsadminos

Host for Linux system containers based on NixOS, ZFS and LXC
https://vpsadminos.org
MIT License

NixOS container not starting in NixOS host #22

Closed jficz closed 5 years ago

jficz commented 5 years ago

A NixOS guest container created on a NixOS host running in vpsadminos does not start.

Container snippet

{
    testcontainer = { ... }:
        {
            system.stateVersion = "18.09";
            deployment.container.host = "<NixOS host>";
            deployment.targetEnv = "container";
        };
}

Host snippet

{
...
boot.enableContainers = true;
...
}

Result:

$ nixops deploy -d testcontainer     
testcontainer> building initial configuration...
trace: warning: You don't have `system.stateVersion` explicitly set. Expect things to break.
these derivations will be built:
  /nix/store/bknd6yfbhpcscvy2y5679k1rw5cviq7v-root-authorized_keys.drv
  /nix/store/qa2chcj2l717wcb62582fa2razwcmypx-etc.drv
  /nix/store/4pxp7xwm0rl9898vqlvr21yhsgcmb827-nixos-system-testcontainer-18.09pre147772.d1ae60cbad7.drv
these paths will be fetched (0.00 MiB download, 0.00 MiB unpacked):
  /nix/store/iqrc6b628a6yxp0x4l45d3yj3mzjlrlq-stage-2-init.sh
copying path '/nix/store/iqrc6b628a6yxp0x4l45d3yj3mzjlrlq-stage-2-init.sh' from 'https://cache.nixos.org'...
building '/nix/store/bknd6yfbhpcscvy2y5679k1rw5cviq7v-root-authorized_keys.drv'...
building '/nix/store/qa2chcj2l717wcb62582fa2razwcmypx-etc.drv'...
building '/nix/store/4pxp7xwm0rl9898vqlvr21yhsgcmb827-nixos-system-testcontainer-18.09pre147772.d1ae60cbad7.drv'...
testcontainer> creating container...
testcontainer> copying 3 paths...
testcontainer> copying path '/nix/store/l7kaidi51vzg0kinxlm7i0vgf8d8d149-root-authorized_keys' to 'ssh://root@<nixoshost>'...
testcontainer> copying path '/nix/store/40yyk8sxfx31ia21b4421nshqafqnggb-etc' to 'ssh://root@<nixoshost>'...
testcontainer> copying path '/nix/store/j02k58p6fnwpjcrjfpradl283zsyc8xp-nixos-system-testcontainer-18.09pre147772.d1ae60cbad7' to 'ssh://root@<nixoshost>'...
testcontainer> host IP is 10.233.1.1, container IP is 10.233.1.2
testcontainer> Job for container@testcon.service failed because the control process exited with error code.
testcontainer> See "systemctl status container@testcon.service" and "journalctl -xe" for details.
testcontainer> /run/current-system/sw/bin/nixos-container: failed to start container
error: command ‘['ssh', '-oControlPath=/tmp/nixops-ssh-tmpzJ8PYU/master-socket', '-x', 'root@<nixoshost>, '--', 'nixos-container start testcon']’ failed on machine ‘testcontainer’ (exit code 1)

Journal:

-- A new session with the ID 292637 has been created for the user root.
-- 
-- The leading process of the session is 3916.
Nov 07 23:19:25 nixhost systemd[1]: Started Session 292637 of user root.
-- Subject: Unit session-292637.scope has finished start-up
-- Defined-By: systemd
-- Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit session-292637.scope has finished starting up.
-- 
-- The start-up result is RESULT.
Nov 07 23:19:28 nixhost systemd[1]: system.slice: Failed to reset devices.list: Operation not permitted
Nov 07 23:19:28 nixhost systemd[1]: system-container.slice: Failed to reset devices.list: Operation not permitted
Nov 07 23:19:28 nixhost systemd[1]: container@testcon.service: Failed to reset devices.list: Operation not permitted
Nov 07 23:19:28 nixhost systemd[1]: Failed to set devices.allow on /system.slice/system-container.slice/container@testcon.service: Operation not permitted
Nov 07 23:19:28 nixhost systemd[1]: Failed to set devices.allow on /system.slice/system-container.slice/container@testcon.service: Operation not permitted
Nov 07 23:19:28 nixhost systemd[1]: Failed to set devices.allow on /system.slice/system-container.slice/container@testcon.service: Operation not permitted
Nov 07 23:19:28 nixhost systemd[1]: Failed to set devices.allow on /system.slice/system-container.slice/container@testcon.service: Operation not permitted
Nov 07 23:19:28 nixhost systemd[1]: Failed to set devices.allow on /system.slice/system-container.slice/container@testcon.service: Operation not permitted
Nov 07 23:19:28 nixhost systemd[1]: Failed to set devices.allow on /system.slice/system-container.slice/container@testcon.service: Operation not permitted
Nov 07 23:19:28 nixhost systemd[1]: Failed to set devices.allow on /system.slice/system-container.slice/container@testcon.service: Operation not permitted
Nov 07 23:19:28 nixhost systemd[1]: Failed to set devices.allow on /system.slice/system-container.slice/container@testcon.service: Operation not permitted
Nov 07 23:19:28 nixhost systemd[1]: Failed to set devices.allow on /system.slice/system-container.slice/container@testcon.service: Operation not permitted
Nov 07 23:19:28 nixhost systemd[1]: Failed to set devices.allow on /system.slice/system-container.slice/container@testcon.service: Operation not permitted
Nov 07 23:19:28 nixhost systemd[1]: Starting Container 'testcon'...
-- Subject: Unit container@testcon.service has begun start-up
-- Defined-By: systemd
-- Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit container@testcon.service has begun starting up.
Nov 07 23:19:28 nixhost container testcon[4000]: Spawning container testcon on /var/lib/containers/testcon.
Nov 07 23:19:28 nixhost container testcon[4000]: Press ^] three times within 1s to kill container.
Nov 07 23:19:28 nixhost container testcon[4000]: /etc/localtime does not point into /usr/share/zoneinfo/, not updating container timezone.
Nov 07 23:19:28 nixhost container testcon[4000]: Failed to mount sysfs on /sys/full (MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC ""): No such file or directory
Nov 07 23:19:28 nixhost container testcon[4000]: Failed to add new veth interfaces (ve-testcon:host0): No such process
Nov 07 23:19:28 nixhost systemd[1]: container@testcon.service: Main process exited, code=exited, status=1/FAILURE
Nov 07 23:19:28 nixhost systemd[1]: container@testcon.service: Failed with result 'exit-code'.
Nov 07 23:19:28 nixhost systemd[1]: Failed to start Container 'testcon'.

aither64 commented 5 years ago

I had no idea what was going on when you first reported it, but I've now confirmed it to be an AppArmor issue. It doesn't report anything to the kernel log, which had me confused. We'll get it sorted next week or so.

aither64 commented 5 years ago

This should be fixed by 23ac899220ef8ed9906b4eb238c63f069a6e0467, which was deployed to staging yesterday. systemd-nspawn --private-network, and thus NixOS containers, should now work. Can you confirm?

jficz commented 5 years ago

Confirming the container is up. I have a couple of networking problems, but that could just be bad config. This issue can be closed as far as I'm concerned. Thanks!

jficz commented 5 years ago

OK, maybe not:

-- Unit container@test.service has begun starting up.
Sep 05 17:44:47 nhost container test[8671]: Spawning container test on /var/lib/containers/test.
Sep 05 17:44:47 nhost container test[8671]: Press ^] three times within 1s to kill container.
Sep 05 17:44:48 nhost container test[8671]: /etc/localtime does not point into /usr/share/zoneinfo/, not updating container timezone.
-- Subject: A virtual machine or container has been started
Sep 05 17:44:49 nhost container test[8671]: <<< NixOS Stage 2 >>>
Sep 05 17:44:49 nhost container test[8671]: tee: /proc/self/fd/10: No such device or address
Sep 05 17:44:49 nhost container test[8671]: starting systemd...
Sep 05 17:44:49 nhost container test[8671]: systemd 239 running in system mode. (+PAM +AUDIT -SELINUX +IMA +APPARMOR +SMACK -SYSVINIT +UTMP -LIBCRYPTSETUP +GCRYPT -GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID -ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=hybrid)
Sep 05 17:44:49 nhost container test[8671]: Detected virtualization systemd-nspawn.
Sep 05 17:44:49 nhost container test[8671]: Detected architecture x86-64.
Sep 05 17:44:49 nhost container test[8671]: [1B blob data]
Sep 05 17:44:49 nhost container test[8671]: Welcome to NixOS 19.03.173408.bd6ba87381e (Koi)!
Sep 05 17:44:49 nhost container test[8671]: [1B blob data]
Sep 05 17:44:49 nhost container test[8671]: Set hostname to <test>.
Sep 05 17:44:49 nhost container test[8671]: Initializing machine ID from container UUID.
Sep 05 17:44:49 nhost container test[8671]: Failed to read AF_UNIX datagram queue length, ignoring: No such file or directory
Sep 05 17:44:49 nhost container test[8671]: Failed to install release agent, ignoring: No such file or directory
Sep 05 17:44:49 nhost container test[8671]: File /nix/store/679k7dlwk5iifgdynxmi3r48ii7fgifd-systemd-239.20190219/example/systemd/system/systemd-journald.service:36 configures an IP firewall (IPAddressDeny=any), but the local system does not support BPF/cgroup based firewalling.
Sep 05 17:44:49 nhost container test[8671]: Proceeding WITHOUT firewalling in effect! (This warning is only shown for the first loaded unit using IP firewalling.)
Sep 05 17:44:50 nhost container test[8671]: [  OK  ] Created slice User and Session Slice.
Sep 05 17:44:50 nhost container test[8671]: [  OK  ] Listening on Journal Socket.
Sep 05 17:44:50 nhost container test[8671]:          Mounting POSIX Message Queue File System...
Sep 05 17:44:50 nhost container test[8671]: [  OK  ] Reached target Remote File Systems.
Sep 05 17:44:50 nhost container test[8671]: [  OK  ] Created slice system-getty.slice.
Sep 05 17:44:50 nhost container test[8671]: [  OK  ] Created slice PHP FastCGI Process manager pools slice.
Sep 05 17:44:50 nhost container test[8671]: [  OK  ] Reached target Swap.
Sep 05 17:44:50 nhost container test[8671]: [  OK  ] Started Dispatch Password Requests to Console Directory Watch.
Sep 05 17:44:50 nhost container test[8671]: [  OK  ] Listening on initctl Compatibility Named Pipe.
Sep 05 17:44:50 nhost container test[8671]: [  OK  ] Started Forward Password Requests to Wall Directory Watch.
Sep 05 17:44:50 nhost container test[8671]: [  OK  ] Reached target Paths.
Sep 05 17:44:50 nhost container test[8671]:          Starting Update UTMP about System Boot/Shutdown...
Sep 05 17:44:50 nhost container test[8671]:          Starting Firewall...
Sep 05 17:44:50 nhost container test[8671]:          Starting Apply Kernel Variables...
Sep 05 17:44:50 nhost container test[8671]: [  OK  ] Reached target Slices.
Sep 05 17:44:50 nhost container test[8671]: [  OK  ] Reached target Local File Systems (Pre).
Sep 05 17:44:50 nhost container test[8671]: [  OK  ] Reached target Local File Systems.
Sep 05 17:44:50 nhost container test[8671]:          Starting Rebuild Journal Catalog...
Sep 05 17:44:50 nhost container test[8671]: [  OK  ] Listening on Journal Socket (/dev/log).
Sep 05 17:44:50 nhost container test[8671]:          Starting Journal Service...
Sep 05 17:44:50 nhost container test[8671]: [  OK  ] Started Update UTMP about System Boot/Shutdown.
Sep 05 17:44:50 nhost container test[8671]: [  OK  ] Mounted POSIX Message Queue File System.
Sep 05 17:44:50 nhost container test[8671]: [  OK  ] Started Apply Kernel Variables.
Sep 05 17:44:50 nhost container test[8671]: [  OK  ] Started Rebuild Journal Catalog.
Sep 05 17:44:50 nhost container test[8671]:          Starting Update is Completed...
Sep 05 17:44:50 nhost container test[8671]: [  OK  ] Started Update is Completed.
Sep 05 17:44:50 nhost container test[8671]: [  OK  ] Started Firewall.
Sep 05 17:44:50 nhost container test[8671]: [  OK  ] Reached target Network (Pre).
Sep 05 17:44:50 nhost container test[8671]:          Starting Networking Setup...
Sep 05 17:44:50 nhost container test[8671]: [  OK  ] Reached target All Network Interfaces (deprecated).
Sep 05 17:44:50 nhost container test[8671]: [  OK  ] Started Journal Service.
Sep 05 17:44:50 nhost container test[8671]: [  OK  ] Reached target System Initialization.
Sep 05 17:44:50 nhost container test[8671]: [  OK  ] Started Daily Cleanup of Temporary Directories.
Sep 05 17:44:50 nhost container test[8671]: [  OK  ] Reached target Timers.
Sep 05 17:44:50 nhost container test[8671]: [  OK  ] Listening on D-Bus System Message Bus Socket.
Sep 05 17:44:50 nhost container test[8671]: [  OK  ] Reached target Sockets.
Sep 05 17:44:50 nhost container test[8671]: [  OK  ] Reached target Basic System.
Sep 05 17:44:50 nhost container test[8671]:          Starting Name Service Cache Daemon...
Sep 05 17:44:50 nhost container test[8671]:          Starting Flush Journal to Persistent Storage...
Sep 05 17:44:50 nhost container test[8671]: [  OK  ] Started Flush Journal to Persistent Storage.
Sep 05 17:44:50 nhost container test[8671]:          Starting Create Volatile Files and Directories...
Sep 05 17:44:50 nhost container test[8671]: [  OK  ] Started Create Volatile Files and Directories.
Sep 05 17:44:50 nhost container test[8671]: [  OK  ] Started D-Bus System Message Bus.
Sep 05 17:44:50 nhost container test[8671]: [  OK  ] Started Name Service Cache Daemon.
Sep 05 17:44:50 nhost container test[8671]: [  OK  ] Reached target Host and Network Name Lookups.
Sep 05 17:44:50 nhost container test[8671]: [  OK  ] Reached target User and Group Name Lookups.
Sep 05 17:44:50 nhost container test[8671]:          Starting Login Service...
Sep 05 17:44:50 nhost container test[8671]: [  OK  ] Started Networking Setup.
Sep 05 17:44:50 nhost container test[8671]:          Starting Extra networking commands....
Sep 05 17:44:50 nhost container test[8671]: [  OK  ] Started Extra networking commands..
Sep 05 17:44:50 nhost container test[8671]: [  OK  ] Reached target Network.
Sep 05 17:44:50 nhost container test[8671]:          Starting PHP FastCGI Process Manager service for pool mail...
Sep 05 17:44:50 nhost container test[8671]:          Starting Permit User Sessions...
Sep 05 17:44:50 nhost container test[8671]: [  OK  ] Reached target Network is Online.
Sep 05 17:44:50 nhost container test[8671]: [  OK  ] Started Permit User Sessions.
Sep 05 17:44:50 nhost container test[8671]: [  OK  ] Started Console Getty.
Sep 05 17:44:50 nhost container test[8671]: [  OK  ] Reached target Login Prompts.
Sep 05 17:44:50 nhost container test[8671]: [  OK  ] Started Login Service.
Sep 05 17:44:50 nhost container test[8671]: [  OK  ] Started PHP FastCGI Process Manager service for pool mail.
Sep 05 17:44:50 nhost container test[8671]: [  OK  ] Reached target PHP FastCGI Process manager pools target.
Sep 05 17:44:50 nhost container test[8671]: [  OK  ] Reached target Multi-User System.
-- Subject: Unit container@test.service has finished start-up
-- Unit container@test.service has finished starting up.
Sep 05 17:44:51 nhost container test[8671]: [2B blob data]
Sep 05 17:44:51 nhost container test[8671]: [1B blob data]
Sep 05 17:44:51 nhost container test[8671]: <<< Welcome to NixOS 19.03.173408.bd6ba87381e (x86_64) - console >>>
Sep 05 17:44:51 nhost container test[8671]: [1B blob data]
Sep 05 17:44:51 nhost container test[8671]: Run `nixos-help` for the NixOS manual.
Sep 05 17:44:51 nhost container test[8671]: [1B blob data]
Sep 05 17:48:37 nhost systemd[1]: container@test.service: Failed to reset devices.list: Operation not permitted
Sep 05 17:48:37 nhost systemd[1]: Failed to set devices.allow on /machine.slice/container@test.service: Operation not permitted
Sep 05 17:48:37 nhost systemd[1]: Failed to set devices.allow on /machine.slice/container@test.service: Operation not permitted
Sep 05 17:48:37 nhost systemd[1]: Failed to set devices.allow on /machine.slice/container@test.service: Operation not permitted
Sep 05 17:48:37 nhost systemd[1]: Failed to set devices.allow on /machine.slice/container@test.service: Operation not permitted
Sep 05 17:48:37 nhost systemd[1]: Failed to set devices.allow on /machine.slice/container@test.service: Operation not permitted
Sep 05 17:48:37 nhost systemd[1]: Failed to set devices.allow on /machine.slice/container@test.service: Operation not permitted
Sep 05 17:48:37 nhost systemd[1]: Failed to set devices.allow on /machine.slice/container@test.service: Operation not permitted
Sep 05 17:48:37 nhost systemd[1]: Failed to set devices.allow on /machine.slice/container@test.service: Operation not permitted
Sep 05 17:48:37 nhost systemd[1]: Failed to set devices.allow on /machine.slice/container@test.service: Operation not permitted
Sep 05 17:48:37 nhost systemd[1]: Failed to set devices.allow on /machine.slice/container@test.service: Operation not permitted

There seems to be some kind of a problem with networking. I'll dig deeper.

aither64 commented 5 years ago

I don't see any showstopper in the log you sent. The device-related errors should be harmless.

Here's what I tried:

boot.enableContainers = true;
containers.webik = {
  privateNetwork = true;
  hostAddress = "192.168.100.10";
  localAddress = "192.168.100.11";
  config =
    { config, pkgs, ... }:
    {
      networking.firewall.allowedTCPPorts = [ 80 ];
      services.nginx.enable = true;
    };
};

[root@nixos:~]# curl http://192.168.100.11
<html>
<head><title>404 Not Found</title></head>
<body>
<center><h1>404 Not Found</h1></center>
<hr><center>nginx</center>
</body>
</html>

Imperative containers with network as in https://nixos.org/nixos/manual/index.html#sec-container-networking work for me too, even the NAT setup described below.
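
For reference, the NAT setup from that manual section looks roughly like this on the host. This is a minimal sketch, not the exact configuration used here: it assumes the host's uplink interface is called eth0, which is a placeholder and not something stated in this thread.

```nix
{ ... }:
{
  # Host-side configuration, not the container's.
  networking.nat = {
    enable = true;
    # "ve-+" is a wildcard matching the ve-<name> veth interfaces
    # that NixOS creates for containers with a private network.
    internalInterfaces = [ "ve-+" ];
    # Assumption: eth0 is the host's uplink; substitute the real interface name.
    externalInterface = "eth0";
  };
}
```

With this in place, traffic from containers that have privateNetwork enabled is masqueraded behind the host's external interface.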

My guess is that you hit the firewall in the container and need to open ports; that was my only issue. If you still have problems, send us your configuration and show what you expect to work and what doesn't.

jficz commented 5 years ago

I got carried away by the

Sep 05 17:48:37 nhost systemd[1]: Failed to set devices.allow on /machine.slice/container@test.service: Operation not permitted

You're right, there's no problem. I just forgot that declarative containers don't get their network configured automatically.

aither64 commented 5 years ago

Great to hear that, closing.