pdtpartners / nix-snapshotter

Brings native understanding of Nix packages to containerd
MIT License

nerdctl fails to run a minimal container ... 2 #129

Closed msackman closed 6 months ago

msackman commented 6 months ago

I've checked that the PR in https://github.com/pdtpartners/nix-snapshotter/pull/128 does not fix this, so I believe this is not the same as https://github.com/pdtpartners/nix-snapshotter/issues/104, but I've failed to come up with a better name, sorry.

Setup: home-manager, rootless, with gvisor not enabled (though enabling gvisor doesn't make any difference).

./redis.nix:

let
  nix-snapshotter = import (
    builtins.fetchTarball "https://github.com/pdtpartners/nix-snapshotter/archive/main.tar.gz"
  );

  pkgs = import <nixpkgs> { overlays = [ nix-snapshotter.overlays.default ]; };

  image = pkgs.nix-snapshotter.buildImage {
    name = "my-redis";
    resolvedByNix = true;
    config = {
      entrypoint = [ "${pkgs.redis}/bin/redis-server" ];
    };
  };
in
image

# nix-build ./redis.nix
this derivation will be built:
  /nix/store/ax3b630sbrhixhsc21bm76p6sgalhvj2-nix-image-my-redis.tar.drv
building '/nix/store/ax3b630sbrhixhsc21bm76p6sgalhvj2-nix-image-my-redis.tar.drv'...
INFO[2024-02-24T14:56:08.201801181Z] Building image                                arch=amd64 base-image= os=linux
INFO[2024-02-24T14:56:08.201895919Z] Read runtime inputs from closure file         closure-count=86
/nix/store/l4c0q07kvsfln2knpr4ljl5165wlamq9-nix-image-my-redis.tar

# nerdctl load < ./result 
unpacking nix:0/nix/store/l4c0q07kvsfln2knpr4ljl5165wlamq9-nix-image-my-redis.tar:latest (sha256:9d0f077834e7b10e6bc1ed0fac231ad6a0c56454f6b3b5e2a7804281ce058100)...
Loaded image: nix:0/nix/store/l4c0q07kvsfln2knpr4ljl5165wlamq9-nix-image-my-redis.tar:latest

# nerdctl run nix:0/nix/store/l4c0q07kvsfln2knpr4ljl5165wlamq9-nix-image-my-redis.tar:latest
FATA[0000] failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "/nix/store/4qzsy60wgy08q0gnchm3xk3d9kcg70n5-redis-7.2.4/bin/redis-server": stat /nix/store/4qzsy60wgy08q0gnchm3xk3d9kcg70n5-redis-7.2.4/bin/redis-server: no such file or directory: unknown

# ls -l /nix/store/4qzsy60wgy08q0gnchm3xk3d9kcg70n5-redis-7.2.4/bin/redis-server
-r-xr-xr-x 1 root root 3235984 Jan  1  1970 /nix/store/4qzsy60wgy08q0gnchm3xk3d9kcg70n5-redis-7.2.4/bin/redis-server

I've made sure that containerd and nix-snapshotter are restarted correctly and running with the right config (there seems to be something not quite right with the systemd units because systemctl stop --user containerd.service fails to stop containerd, but that's a different issue).

I must admit I'm partly guessing at how to load and run the images. I'm not running any Kubernetes; I really just want to run images directly for the moment.

msackman commented 6 months ago

EDIT: no, this isn't the issue - even with this "corrected" with a symlink, I still get the same error.

It could be that the config is wrong - path to nix-snapshotter.sock:

# cat /nix/store/1p5s0lnspwl3pssjwrcv0841rqwb4hdi-containerd-config-checked.toml
version = 2
[plugins]
[plugins."io.containerd.grpc.v1.cri"]
disable_apparmor = true
disable_cgroup = true
restrict_oom_score_adj = true
[plugins."io.containerd.grpc.v1.cri".cni]
bin_dir = "/nix/store/4vs9njrcqhpcfva5hg02xk7dvy7yyrnp-cni-plugins-1.3.0/bin"
conf_dir = "/etc/cni/net.d"

[plugins."io.containerd.grpc.v1.cri".containerd]
default_runtime_name = "runc"
snapshotter = "nix"

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
SystemdCgroup = false

[[plugins."io.containerd.transfer.v1.local".unpack_config]]
platform = "linux/amd64"
snapshotter = "nix"

[proxy_plugins.nix]
address = "/run/nix-snapshotter/nix-snapshotter.sock"
type = "snapshot"

# ls -al /run/nix-snapshotter/nix-snapshotter.sock
ls: cannot access '/run/nix-snapshotter/nix-snapshotter.sock': No such file or directory

# ls -l /run/user/1000/nix-snapshotter/nix-snapshotter.sock 
srwxr-xr-x 1 matthew users 0 Feb 24 14:55 /run/user/1000/nix-snapshotter/nix-snapshotter.sock
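
The mismatch above (the config points at /run/nix-snapshotter/..., while the socket actually lives under /run/user/1000/...) can be checked with a small script. This is only a sketch: the helper name find_socket is hypothetical, and it assumes that rootless services place their sockets under XDG_RUNTIME_DIR, which is typically /run/user/<uid> on systemd-based systems.

```shell
#!/bin/sh
# Print the first candidate path that exists and is a Unix socket.
# Returns non-zero if none of the candidates is a socket.
# (find_socket is a hypothetical helper, not part of nix-snapshotter.)
find_socket() {
  for candidate in "$@"; do
    if [ -S "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Assumption: the rootless runtime directory is XDG_RUNTIME_DIR,
# falling back to /run/user/<uid> when the variable is unset.
runtime_dir="${XDG_RUNTIME_DIR:-/run/user/$(id -u)}"

# Check the rootful path from the generated config first,
# then the rootless path seen in the ls output above.
find_socket \
  "/run/nix-snapshotter/nix-snapshotter.sock" \
  "$runtime_dir/nix-snapshotter/nix-snapshotter.sock" \
  || echo "no nix-snapshotter socket found" >&2
```

Whichever path is printed is the one the address field under [proxy_plugins.nix] would need to match.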

msackman commented 6 months ago

Ok, so I've tweaked the containerd.toml, and am now running it manually via a copy of the containerd-rootless script.

containerd.toml:

version = 2
root = "/home/matthew/.local/share/containerd"
state = "/run/user/1000/containerd"

[grpc]
  address = "/run/user/1000/containerd/containerd.sock"

[plugins]
[plugins."io.containerd.grpc.v1.cri"]
disable_apparmor = true
disable_cgroup = true
restrict_oom_score_adj = true
[plugins."io.containerd.grpc.v1.cri".cni]
bin_dir = "/nix/store/4vs9njrcqhpcfva5hg02xk7dvy7yyrnp-cni-plugins-1.3.0/bin"
conf_dir = "/etc/cni/net.d"

[plugins."io.containerd.grpc.v1.cri".containerd]
default_runtime_name = "runc"
snapshotter = "nix"

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
SystemdCgroup = false

[[plugins."io.containerd.transfer.v1.local".unpack_config]]
platform = "linux/amd64"
snapshotter = "nix"

[proxy_plugins.nix]
address = "/run/user/1000/nix-snapshotter/nix-snapshotter.sock"
type = "snapshot"

and I'm also now running containerd-rootless-child with --log-level=debug. So here's what I get when I try to run an image:

DEBU[2024-02-24T16:14:37.526352706Z] stat snapshot                                 key="sha256:a2748f51f82a99f0e107552f91b7e65f4ef696a63b0dca72f3e925e26d57f09b"
DEBU[2024-02-24T16:14:37.552565307Z] stat snapshot                                 key="sha256:a2748f51f82a99f0e107552f91b7e65f4ef696a63b0dca72f3e925e26d57f09b"
DEBU[2024-02-24T16:14:37.601798614Z] prepare view snapshot                         key=/tmp/initialC937901741 parent="sha256:a2748f51f82a99f0e107552f91b7e65f4ef696a63b0dca72f3e925e26d57f09b"
DEBU[2024-02-24T16:14:37.699697130Z] prepare snapshot                              key=42ed5a575cb8d46d67e23dc64e1965811bffd85ddcb0acc0bc919603b2242ff6 parent="sha256:a2748f51f82a99f0e107552f91b7e65f4ef696a63b0dca72f3e925e26d57f09b"
DEBU[2024-02-24T16:14:37.747212032Z] event published                               ns=default topic=/snapshot/prepare type=containerd.events.SnapshotPrepare
DEBU[2024-02-24T16:14:37.751743964Z] get snapshot mounts                           key=42ed5a575cb8d46d67e23dc64e1965811bffd85ddcb0acc0bc919603b2242ff6
DEBU[2024-02-24T16:14:37.778462364Z] event published                               ns=default topic=/containers/create type=containerd.events.ContainerCreate
DEBU[2024-02-24T16:14:37.796464685Z] get snapshot mounts                           key=42ed5a575cb8d46d67e23dc64e1965811bffd85ddcb0acc0bc919603b2242ff6
DEBU[2024-02-24T16:14:37.807033471Z] shim bootstrap parameters                     address="unix:///run/containerd/s/4401f22d2f29227164a2090a2a7c014bee524341d9df132d69337e8327844c76" namespace=default protocol=ttrpc
time="2024-02-24T16:14:37.809876814Z" level=info msg="loading plugin \"io.containerd.internal.v1.shutdown\"..." runtime=io.containerd.runc.v2 type=io.containerd.internal.v1
time="2024-02-24T16:14:37.809953809Z" level=info msg="loading plugin \"io.containerd.ttrpc.v1.pause\"..." runtime=io.containerd.runc.v2 type=io.containerd.ttrpc.v1
time="2024-02-24T16:14:37.809967254Z" level=debug msg="registering ttrpc service" id=io.containerd.ttrpc.v1.pause
time="2024-02-24T16:14:37.809977393Z" level=info msg="loading plugin \"io.containerd.event.v1.publisher\"..." runtime=io.containerd.runc.v2 type=io.containerd.event.v1
time="2024-02-24T16:14:37.809988614Z" level=info msg="loading plugin \"io.containerd.ttrpc.v1.task\"..." runtime=io.containerd.runc.v2 type=io.containerd.ttrpc.v1
time="2024-02-24T16:14:37.810061110Z" level=debug msg="registering ttrpc service" id=io.containerd.ttrpc.v1.task
time="2024-02-24T16:14:37.810120522Z" level=debug msg="serving api on socket" socket="[inherited from parent]"
time="2024-02-24T16:14:37.810133516Z" level=debug msg="starting signal loop" namespace=default path=/run/.ro3710623502/user/1000/containerd/io.containerd.runtime.v2.task/default/42ed5a575cb8d46d67e23dc64e1965811bffd85ddcb0acc0bc919603b2242ff6 pid=189768 runtime=io.containerd.runc.v2
DEBU[2024-02-24T16:14:38.030468168Z] failed to delete task                         error="rpc error: code = NotFound desc = container not created: not found" id=42ed5a575cb8d46d67e23dc64e1965811bffd85ddcb0acc0bc919603b2242ff6
INFO[2024-02-24T16:14:38.030696607Z] shim disconnected                             id=42ed5a575cb8d46d67e23dc64e1965811bffd85ddcb0acc0bc919603b2242ff6 namespace=default
WARN[2024-02-24T16:14:38.030743405Z] cleaning up after shim disconnected           id=42ed5a575cb8d46d67e23dc64e1965811bffd85ddcb0acc0bc919603b2242ff6 namespace=default
INFO[2024-02-24T16:14:38.030756619Z] cleaning up dead shim                         namespace=default
WARN[2024-02-24T16:14:38.042263134Z] cleanup warnings time="2024-02-24T16:14:38Z" level=debug msg="starting signal loop" namespace=default pid=189942 runtime=io.containerd.runc.v2
time="2024-02-24T16:14:38Z" level=warning msg="failed to read init pid file" error="open /run/.ro3710623502/user/1000/containerd/io.containerd.runtime.v2.task/default/42ed5a575cb8d46d67e23dc64e1965811bffd85ddcb0acc0bc919603b2242ff6/init.pid: no such file or directory" runtime=io.containerd.runc.v2  namespace=default
ERRO[2024-02-24T16:14:38.042749337Z] copy shim log                                 error="read /proc/self/fd/14: file already closed" namespace=default
DEBU[2024-02-24T16:14:38.060410337Z] event published                               ns=default topic=/containers/update type=containerd.events.ContainerUpdate
DEBU[2024-02-24T16:14:38.291548686Z] remove snapshot                               key=/tmp/initialC937901741 snapshotter=nix
DEBU[2024-02-24T16:14:38.315348853Z] schedule snapshotter cleanup                  snapshotter=nix
DEBU[2024-02-24T16:14:38.315496319Z] event published                               ns=default topic=/snapshot/remove type=containerd.events.SnapshotRemove
DEBU[2024-02-24T16:14:38.332161832Z] removed snapshot                              key=default/57//tmp/initialC937901741 snapshotter=nix
DEBU[2024-02-24T16:14:38.332441166Z] snapshot garbage collected                    d=17.067246ms snapshotter=nix
DEBU[2024-02-24T16:14:38.332466012Z] garbage collected                             d=24.218882ms

I'm quite suspicious of the shim bootstrap parameters address="unix:///run/containerd/s/4401f22d2f29227164a2090a2a7c014bee524341d9df132d69337e8327844c76" line, as well as the msg="starting signal loop" namespace=default path=/run/.ro3710623502/... line, because /run is not world-writable.

# ls -ld /run
drwxr-xr-x 29 root root 740 Feb 24 15:46 /run

That said, nerdctl run hello-world works just fine and produces the same paths in the debug log messages, so maybe it is more of a nix-snapshotter issue.

elpdt852 commented 6 months ago

nerdctl load doesn’t work at the moment because it doesn’t unpack with snapshotter labels; it’s actually a problem with the containerd library they are using. You should stick to nix-snapshotter’s “copyToContainerd” or use the “preload-containerd” service, which does the right thing.

Sorry for the trouble! I will add a known issues section to the README.

msackman commented 6 months ago

Ahhh! Yes, this does indeed work:

# nix-build redis.nix -A copyToContainerd
this derivation will be built:
  /nix/store/pvb2ydq6by0jzzf4mrh3b984c50lh83i-copy-to-containerd.drv
building '/nix/store/pvb2ydq6by0jzzf4mrh3b984c50lh83i-copy-to-containerd.drv'...
/nix/store/4gwlrnn8sgcsmcz81cb9w5k46dvaqai1-copy-to-containerd

# ./result/bin/copy-to-containerd 
INFO[2024-02-25T08:36:29.727692311Z] Importing image                               ref="nix:0/nix/store/1h06hsldbffmszs3cih05j7zy1kfmbk7-nix-image-my-redis2.tar:latest"
INFO[2024-02-25T08:36:30.057373957Z] Creating image                                ref="nix:0/nix/store/1h06hsldbffmszs3cih05j7zy1kfmbk7-nix-image-my-redis2.tar:latest"
INFO[2024-02-25T08:36:30.068568014Z] Created image                                 ref="nix:0/nix/store/1h06hsldbffmszs3cih05j7zy1kfmbk7-nix-image-my-redis2.tar:latest"

# nerdctl run nix:0/nix/store/1h06hsldbffmszs3cih05j7zy1kfmbk7-nix-image-my-redis2.tar:latest
1:C 25 Feb 2024 08:36:37.344 # WARNING Memory overcommit must be enabled! Without it, a background save or replication may fail under low memory condition. Being disabled, it can also cause failures without low memory condition, see https://github.com/jemalloc/jemalloc/issues/1328. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
1:C 25 Feb 2024 08:36:37.344 * oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 25 Feb 2024 08:36:37.344 * Redis version=7.2.4, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 25 Feb 2024 08:36:37.344 # Warning: no config file specified, using the default config. In order to specify a config file use /nix/store/4qzsy60wgy08q0gnchm3xk3d9kcg70n5-redis-7.2.4/bin/redis-server /path/to/redis.conf
1:M 25 Feb 2024 08:36:37.345 # You requested maxclients of 10000 requiring at least 10032 max file descriptors.
1:M 25 Feb 2024 08:36:37.345 # Server can't set maximum open files to 10032 because of OS error: Operation not permitted.
1:M 25 Feb 2024 08:36:37.345 # Current maximum open files is 1024. maxclients has been reduced to 992 to compensate for low ulimit. If you need higher maxclients increase 'ulimit -n'.
1:M 25 Feb 2024 08:36:37.345 * monotonic clock: POSIX clock_gettime
1:M 25 Feb 2024 08:36:37.345 * Running mode=standalone, port=6379.
1:M 25 Feb 2024 08:36:37.345 * Server initialized
1:M 25 Feb 2024 08:36:37.345 * Ready to accept connections tcp
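
The sequence above can be condensed into a short script. This is a rough sketch rather than an official workflow: the function names image_ref and import_and_run are hypothetical, and the reference format ("nix:0" followed by the store path of the image tarball and ":latest") is inferred only from the log lines in this thread.

```shell
#!/bin/sh
set -eu

# Compose the image reference from the store path of the image tarball.
# Assumption: refs follow the "nix:0<store path>:latest" pattern seen in the logs above.
image_ref() {
  printf 'nix:0%s:latest\n' "$1"
}

# Build the image, import it with copy-to-containerd (not nerdctl load), then run it.
# $1 is the Nix expression file, e.g. ./redis.nix
import_and_run() {
  expr_file="$1"
  # nix-build prints the resulting store path on stdout.
  image_path="$(nix-build --no-out-link "$expr_file")"
  loader="$(nix-build --no-out-link "$expr_file" -A copyToContainerd)"
  "$loader/bin/copy-to-containerd"
  nerdctl run "$(image_ref "$image_path")"
}
```

Calling import_and_run ./redis.nix would then replace the manual nix-build, copy-to-containerd and nerdctl run steps, assuming containerd and nix-snapshotter are already running with a matching config.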

I'm quite sure that a lot of my problem is coming to this project without knowing the existing tools and patterns particularly well. But yes, a pointer or two in your own docs would undoubtedly help. Many thanks.