Closed by msackman 6 months ago
EDIT: no, this isn't the issue - even with this "corrected" with a symlink, I still get the same error.
It could be that the config is wrong - path to nix-snapshotter.sock:
# cat /nix/store/1p5s0lnspwl3pssjwrcv0841rqwb4hdi-containerd-config-checked.toml
version = 2
[plugins]
[plugins."io.containerd.grpc.v1.cri"]
disable_apparmor = true
disable_cgroup = true
restrict_oom_score_adj = true
[plugins."io.containerd.grpc.v1.cri".cni]
bin_dir = "/nix/store/4vs9njrcqhpcfva5hg02xk7dvy7yyrnp-cni-plugins-1.3.0/bin"
conf_dir = "/etc/cni/net.d"
[plugins."io.containerd.grpc.v1.cri".containerd]
default_runtime_name = "runc"
snapshotter = "nix"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
SystemdCgroup = false
[[plugins."io.containerd.transfer.v1.local".unpack_config]]
platform = "linux/amd64"
snapshotter = "nix"
[proxy_plugins.nix]
address = "/run/nix-snapshotter/nix-snapshotter.sock"
type = "snapshot"
# ls -al /run/nix-snapshotter/nix-snapshotter.sock
ls: cannot access '/run/nix-snapshotter/nix-snapshotter.sock': No such file or directory
# ls -l /run/user/1000/nix-snapshotter/nix-snapshotter.sock
srwxr-xr-x 1 matthew users 0 Feb 24 14:55 /run/user/1000/nix-snapshotter/nix-snapshotter.sock
Ok, so I've tweaked the containerd.toml, and am now running it manually via a copy of the containerd-rootless script.
containerd.toml:
version = 2
root = "/home/matthew/.local/share/containerd"
state = "/run/user/1000/containerd"
[grpc]
address = "/run/user/1000/containerd/containerd.sock"
[plugins]
[plugins."io.containerd.grpc.v1.cri"]
disable_apparmor = true
disable_cgroup = true
restrict_oom_score_adj = true
[plugins."io.containerd.grpc.v1.cri".cni]
bin_dir = "/nix/store/4vs9njrcqhpcfva5hg02xk7dvy7yyrnp-cni-plugins-1.3.0/bin"
conf_dir = "/etc/cni/net.d"
[plugins."io.containerd.grpc.v1.cri".containerd]
default_runtime_name = "runc"
snapshotter = "nix"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
SystemdCgroup = false
[[plugins."io.containerd.transfer.v1.local".unpack_config]]
platform = "linux/amd64"
snapshotter = "nix"
[proxy_plugins.nix]
address = "/run/user/1000/nix-snapshotter/nix-snapshotter.sock"
type = "snapshot"
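All of the rootless paths in this config hang off the user's runtime directory and home directory. A minimal sketch of how they relate, assuming the runtime directory is `/run/user/<uid>` as it is on this machine; the function name and dictionary keys are hypothetical, chosen only for illustration:

```python
import os


def rootless_containerd_paths(uid, home):
    """Derive the per-user paths used in the rootless containerd.toml above.

    Assumes the user's runtime directory is /run/user/<uid>.
    """
    runtime_dir = f"/run/user/{uid}"
    return {
        "root": os.path.join(home, ".local/share/containerd"),
        "state": f"{runtime_dir}/containerd",
        "grpc_address": f"{runtime_dir}/containerd/containerd.sock",
        "snapshotter_address": f"{runtime_dir}/nix-snapshotter/nix-snapshotter.sock",
    }
```

For uid 1000 and home `/home/matthew` this reproduces the `root`, `state`, `[grpc] address` and `[proxy_plugins.nix] address` values in the config.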
and I'm also now running containerd-rootless-child with --log-level=debug. So here's what I get when I try to run an image:
DEBU[2024-02-24T16:14:37.526352706Z] stat snapshot key="sha256:a2748f51f82a99f0e107552f91b7e65f4ef696a63b0dca72f3e925e26d57f09b"
DEBU[2024-02-24T16:14:37.552565307Z] stat snapshot key="sha256:a2748f51f82a99f0e107552f91b7e65f4ef696a63b0dca72f3e925e26d57f09b"
DEBU[2024-02-24T16:14:37.601798614Z] prepare view snapshot key=/tmp/initialC937901741 parent="sha256:a2748f51f82a99f0e107552f91b7e65f4ef696a63b0dca72f3e925e26d57f09b"
DEBU[2024-02-24T16:14:37.699697130Z] prepare snapshot key=42ed5a575cb8d46d67e23dc64e1965811bffd85ddcb0acc0bc919603b2242ff6 parent="sha256:a2748f51f82a99f0e107552f91b7e65f4ef696a63b0dca72f3e925e26d57f09b"
DEBU[2024-02-24T16:14:37.747212032Z] event published ns=default topic=/snapshot/prepare type=containerd.events.SnapshotPrepare
DEBU[2024-02-24T16:14:37.751743964Z] get snapshot mounts key=42ed5a575cb8d46d67e23dc64e1965811bffd85ddcb0acc0bc919603b2242ff6
DEBU[2024-02-24T16:14:37.778462364Z] event published ns=default topic=/containers/create type=containerd.events.ContainerCreate
DEBU[2024-02-24T16:14:37.796464685Z] get snapshot mounts key=42ed5a575cb8d46d67e23dc64e1965811bffd85ddcb0acc0bc919603b2242ff6
DEBU[2024-02-24T16:14:37.807033471Z] shim bootstrap parameters address="unix:///run/containerd/s/4401f22d2f29227164a2090a2a7c014bee524341d9df132d69337e8327844c76" namespace=default protocol=ttrpc
time="2024-02-24T16:14:37.809876814Z" level=info msg="loading plugin \"io.containerd.internal.v1.shutdown\"..." runtime=io.containerd.runc.v2 type=io.containerd.internal.v1
time="2024-02-24T16:14:37.809953809Z" level=info msg="loading plugin \"io.containerd.ttrpc.v1.pause\"..." runtime=io.containerd.runc.v2 type=io.containerd.ttrpc.v1
time="2024-02-24T16:14:37.809967254Z" level=debug msg="registering ttrpc service" id=io.containerd.ttrpc.v1.pause
time="2024-02-24T16:14:37.809977393Z" level=info msg="loading plugin \"io.containerd.event.v1.publisher\"..." runtime=io.containerd.runc.v2 type=io.containerd.event.v1
time="2024-02-24T16:14:37.809988614Z" level=info msg="loading plugin \"io.containerd.ttrpc.v1.task\"..." runtime=io.containerd.runc.v2 type=io.containerd.ttrpc.v1
time="2024-02-24T16:14:37.810061110Z" level=debug msg="registering ttrpc service" id=io.containerd.ttrpc.v1.task
time="2024-02-24T16:14:37.810120522Z" level=debug msg="serving api on socket" socket="[inherited from parent]"
time="2024-02-24T16:14:37.810133516Z" level=debug msg="starting signal loop" namespace=default path=/run/.ro3710623502/user/1000/containerd/io.containerd.runtime.v2.task/default/42ed5a575cb8d46d67e23dc64e1965811bffd85ddcb0acc0bc919603b2242ff6 pid=189768 runtime=io.containerd.runc.v2
DEBU[2024-02-24T16:14:38.030468168Z] failed to delete task error="rpc error: code = NotFound desc = container not created: not found" id=42ed5a575cb8d46d67e23dc64e1965811bffd85ddcb0acc0bc919603b2242ff6
INFO[2024-02-24T16:14:38.030696607Z] shim disconnected id=42ed5a575cb8d46d67e23dc64e1965811bffd85ddcb0acc0bc919603b2242ff6 namespace=default
WARN[2024-02-24T16:14:38.030743405Z] cleaning up after shim disconnected id=42ed5a575cb8d46d67e23dc64e1965811bffd85ddcb0acc0bc919603b2242ff6 namespace=default
INFO[2024-02-24T16:14:38.030756619Z] cleaning up dead shim namespace=default
WARN[2024-02-24T16:14:38.042263134Z] cleanup warnings time="2024-02-24T16:14:38Z" level=debug msg="starting signal loop" namespace=default pid=189942 runtime=io.containerd.runc.v2
time="2024-02-24T16:14:38Z" level=warning msg="failed to read init pid file" error="open /run/.ro3710623502/user/1000/containerd/io.containerd.runtime.v2.task/default/42ed5a575cb8d46d67e23dc64e1965811bffd85ddcb0acc0bc919603b2242ff6/init.pid: no such file or directory" runtime=io.containerd.runc.v2 namespace=default
ERRO[2024-02-24T16:14:38.042749337Z] copy shim log error="read /proc/self/fd/14: file already closed" namespace=default
DEBU[2024-02-24T16:14:38.060410337Z] event published ns=default topic=/containers/update type=containerd.events.ContainerUpdate
DEBU[2024-02-24T16:14:38.291548686Z] remove snapshot key=/tmp/initialC937901741 snapshotter=nix
DEBU[2024-02-24T16:14:38.315348853Z] schedule snapshotter cleanup snapshotter=nix
DEBU[2024-02-24T16:14:38.315496319Z] event published ns=default topic=/snapshot/remove type=containerd.events.SnapshotRemove
DEBU[2024-02-24T16:14:38.332161832Z] removed snapshot key=default/57//tmp/initialC937901741 snapshotter=nix
DEBU[2024-02-24T16:14:38.332441166Z] snapshot garbage collected d=17.067246ms snapshotter=nix
DEBU[2024-02-24T16:14:38.332466012Z] garbage collected d=24.218882ms
I'm quite suspicious of the shim bootstrap parameters address="unix:///run/containerd/s/4401f22d2f29227164a2090a2a7c014bee524341d9df132d69337e8327844c76" line, as well as the msg="starting signal loop" namespace=default path=/run/.ro3710623502/... line, because /run is not world-writable:
# ls -ld /run
drwxr-xr-x 29 root root 740 Feb 24 15:46 /run
That said, nerdctl run hello-world works just fine, and produces the same paths in the debug log msgs. So maybe it is more a nix-snapshotter issue.
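To compare the paths in two debug logs (for example a failing run and the working hello-world run), the key=value fields can be pulled out of each line. The following is a rough sketch that assumes the log lines keep the key=value / key="quoted value" shape seen above; the helper names are made up for illustration.

```python
import re

# Matches key=value or key="quoted value" (escaped quotes allowed inside).
_FIELD = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def log_fields(line):
    """Parse the key=value pairs of one containerd log line into a dict."""
    fields = {}
    for key, raw in _FIELD.findall(line):
        if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
            raw = raw[1:-1]  # strip the surrounding quotes
        fields[key] = raw
    return fields


def run_paths_outside(line, allowed_prefix):
    """Return (key, path) pairs under /run/ that are not under allowed_prefix."""
    found = []
    for key, value in log_fields(line).items():
        path = value.removeprefix("unix://")
        if path.startswith("/run/") and not path.startswith(allowed_prefix):
            found.append((key, path))
    return found
```

Running `run_paths_outside(line, "/run/user/1000/")` over both logs and diffing the results shows whether the two runs really do use the same paths.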
nerdctl load doesn't work at the moment because it doesn't unpack with snapshotter labels; the underlying problem is in the containerd library that nerdctl uses. You should stick to nix-snapshotter's "copyToContainerd" or use the "preload-containerd" service, which does the right thing.
Sorry for the trouble! I will add a known issues section to the README.
Ahhh! Yes this does indeed work
# nix-build redis.nix -A copyToContainerd
this derivation will be built:
/nix/store/pvb2ydq6by0jzzf4mrh3b984c50lh83i-copy-to-containerd.drv
building '/nix/store/pvb2ydq6by0jzzf4mrh3b984c50lh83i-copy-to-containerd.drv'...
/nix/store/4gwlrnn8sgcsmcz81cb9w5k46dvaqai1-copy-to-containerd
# ./result/bin/copy-to-containerd
INFO[2024-02-25T08:36:29.727692311Z] Importing image ref="nix:0/nix/store/1h06hsldbffmszs3cih05j7zy1kfmbk7-nix-image-my-redis2.tar:latest"
INFO[2024-02-25T08:36:30.057373957Z] Creating image ref="nix:0/nix/store/1h06hsldbffmszs3cih05j7zy1kfmbk7-nix-image-my-redis2.tar:latest"
INFO[2024-02-25T08:36:30.068568014Z] Created image ref="nix:0/nix/store/1h06hsldbffmszs3cih05j7zy1kfmbk7-nix-image-my-redis2.tar:latest"
# nerdctl run nix:0/nix/store/1h06hsldbffmszs3cih05j7zy1kfmbk7-nix-image-my-redis2.tar:latest
1:C 25 Feb 2024 08:36:37.344 # WARNING Memory overcommit must be enabled! Without it, a background save or replication may fail under low memory condition. Being disabled, it can also cause failures without low memory condition, see https://github.com/jemalloc/jemalloc/issues/1328. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
1:C 25 Feb 2024 08:36:37.344 * oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 25 Feb 2024 08:36:37.344 * Redis version=7.2.4, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 25 Feb 2024 08:36:37.344 # Warning: no config file specified, using the default config. In order to specify a config file use /nix/store/4qzsy60wgy08q0gnchm3xk3d9kcg70n5-redis-7.2.4/bin/redis-server /path/to/redis.conf
1:M 25 Feb 2024 08:36:37.345 # You requested maxclients of 10000 requiring at least 10032 max file descriptors.
1:M 25 Feb 2024 08:36:37.345 # Server can't set maximum open files to 10032 because of OS error: Operation not permitted.
1:M 25 Feb 2024 08:36:37.345 # Current maximum open files is 1024. maxclients has been reduced to 992 to compensate for low ulimit. If you need higher maxclients increase 'ulimit -n'.
1:M 25 Feb 2024 08:36:37.345 * monotonic clock: POSIX clock_gettime
1:M 25 Feb 2024 08:36:37.345 * Running mode=standalone, port=6379.
1:M 25 Feb 2024 08:36:37.345 * Server initialized
1:M 25 Feb 2024 08:36:37.345 * Ready to accept connections tcp
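The image reference passed to nerdctl run is the ref printed by copy-to-containerd. A small sketch for extracting it from that output, assuming the log format stays as shown above (ref="..." on each line); the function names are hypothetical and not part of either tool.

```python
import re

# Captures the value of ref="..." as printed by copy-to-containerd above.
_REF = re.compile(r'\bref="([^"]+)"')


def image_refs(output):
    """Return the distinct image refs found in the output, in first-seen order."""
    seen = []
    for ref in _REF.findall(output):
        if ref not in seen:
            seen.append(ref)
    return seen


def nerdctl_run_command(ref):
    """Build the argv for running the image, mirroring the command used above."""
    return ["nerdctl", "run", ref]
```

The resulting list could be handed to `subprocess.run` once the image has been imported.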
I'm quite sure that a lot of my problem is coming to this project without knowing the existing tools and patterns particularly well. But yes, a pointer or two in your own docs would undoubtedly help. Many thanks.
I've checked that the PR in https://github.com/pdtpartners/nix-snapshotter/pull/128 does not fix this, so I believe this is not the same as https://github.com/pdtpartners/nix-snapshotter/issues/104, but I've failed to come up with a better name, sorry.
home-manager, rootless, and with gvisor not enabled (though gvisor doesn't make any difference).
./redis.nix:
I've made sure that containerd and nix-snapshotter are restarted correctly and running with the right config (there seems to be something not quite right with the systemd units because systemctl stop --user containerd.service fails to stop containerd, but that's a different issue).
I must admit I'm slightly guessing at how to load and run the images - I'm not running any kubernetes - I really want to just run images directly for the moment.