Closed MagicRB closed 6 months ago
I found this PR in containerd (https://github.com/containerd/containerd/pull/6899), but it still doesn't solve my problem, as I need to run both OCI and nix native images in one Pod, which I don't think is possible as of now.
A possible way to hack this together would be container (not pod) annotations and then patching this part of containerd: https://github.com/containerd/containerd/blob/main/pkg/cri/server/container_create.go#L189 (the PR I linked earlier seems to have completely vanished from containerd afaict).
plugins."io.containerd.grpc.v1.cri".containerd = {
  snapshotter = "nix";
  image-service-endpoint = "unix:///run/nix-snapshotter/nix-snapshotter.sock";
  disable_snapshot_annotations = false;
};
plugins."io.containerd.transfer.v1.local".unpack_config = [
  {
    platform = "${GOOS}/${GOARCH}";
    snapshotter = "nix";
  }
];
proxy_plugins.nix = {
  type = "snapshot";
  address = "/run/nix-snapshotter/nix-snapshotter.sock";
};
Even with that config I cannot seem to get k3s to cooperate for any containers, nix or OCI. I keep getting:
failed to create containerd container: failed to create snapshot: missing parent "k8s.io/8/sha256:1021ef88c7974bfff89c5a0ec4fd3160daac6c48a075f74cff721f85dd104e68" bucket: not found
which is also exactly what I get with the runtimeSnapshotter support in containerd.
I'm on a wild debugging streak right now, sorry for derailing this issue, but I can't make heads or tails of this. The missing parent issue was resolved by just resetting the whole k3s cluster due to some weird state issue. Now I'm hitting a problem where, even with --log-level debug, I only see log lines beginning with [image-service], never [nix-snapshotter]. Furthermore, the snapshot directory gets created correctly but all the directories are empty. I also straced the nix-snapshotter process and I don't see any mount syscalls happening.
I haven't tested this theory, but is it possible I'm running into https://github.com/pdtpartners/nix-snapshotter/issues/129 ? EDIT: preloading the images did not help. I tried with nix2container load, but it still won't exec properly.
Is it possible to have both work at the same time inside one kubelet?
Yes nix-snapshotter is backwards compatible, so you should be able to resolve and run regular OCI images and nix-snapshotter images in the same pod. The same image can also have interleaved regular OCI layers & nix-snapshotter layers.
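As a rough illustration of that point, a Pod mixing both kinds of image might look like the sketch below. It is written as a Nix attribute set that could be rendered to JSON with builtins.toJSON and passed to kubectl. The pod name, the container names, and the redis image are placeholders made up for this example; the nix:0/nix/store/... reference is left elided exactly as in the question.

```nix
# Hypothetical Pod manifest; every name here is illustrative, not taken from the repo.
{
  apiVersion = "v1";
  kind = "Pod";
  metadata.name = "mixed-images";
  spec.containers = [
    # A regular OCI image, pulled from a registry as usual.
    {
      name = "oci";
      image = "docker.io/library/redis:latest";
    }
    # A nix-snapshotter image reference; the store path is elided on purpose.
    {
      name = "nix-native";
      image = "nix:0/nix/store/...";
    }
  ];
}
```

Nothing in the manifest selects a snapshotter; per the explanation here, that choice lives in the containerd configuration on the node.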
For the server-side, we just configure the kubelet directly via: https://github.com/pdtpartners/nix-snapshotter/blob/main/modules/nixos/tests/kubernetes.nix#L36
Note that Kubernetes doesn't need to know about nix-snapshotter other than it being a CRI image service. When the kubelet eventually asks containerd to spawn a container, containerd knows which snapshotter to use based on plugins."io.containerd.grpc.v1.cri".containerd.snapshotter: https://github.com/pdtpartners/nix-snapshotter/blob/main/modules/common/containerd.nix#L66-L68
Using --snapshotter or CONTAINERD_SNAPSHOTTER with either ctr or nerdctl is purely client-side, and applies when using the gRPC interface directly (as opposed to using the CRI interface via crictl or what the kubelet does).
whole k3s cluster due to some weird state issue
Note that rootless k3s doesn't support nix-snapshotter yet: https://github.com/pdtpartners/nix-snapshotter/issues/120, but rootful k3s is working.
I also straced the nix-snapshotter process and I don't see any mount syscalls happening.
Do you have the nix CLI available in the PATH for the nix-snapshotter process?
I'm on a wild debugging streak right now
It would help to have some kind of reproduction case, it seems like there's many moving pieces while you're making it fit with NixNG.
Do you have the nix CLI available in the PATH for the nix-snapshotter process?
Yes, verified.
It would help to have some kind of reproduction case, it seems like there's many moving pieces while you're making it fit with NixNG.
There are way too many moving parts, yes; it's hard for me to produce a reproducible example. I am trying to find something in the logs pointing me to the bit I missed, but so far I'm having no luck.
It's as if the nix-snapshotter code path doesn't even trigger. It does end up in its nix directory, so it does go through there, but it just passes straight down to the backup overlayfs.
Note that rootless k3s doesn't support nix-snapshotter yet: https://github.com/pdtpartners/nix-snapshotter/issues/120, but rootful k3s is working.
I'm running rootful, always have been. The problem seems to have arisen when I added nix-snapshotter to an existing containerd and k3s.
Both of the links you've provided I've already incorporated, and I've verified in the logs that both containerd and the kubelet know about and use nix-snapshotter. I've verified that k3s, and therefore containerd, correctly calls into nix-snapshotter. So the problem must be somewhere inside it, where it doesn't actually trigger. I'm trying to make sense of how nix-snapshotter works so I can figure out why it doesn't seem to trigger at all in my environment.
Make sure you’re using this containerd: https://github.com/pdtpartners/nix-snapshotter/blob/main/modules/flake/overlays.nix#L8.
If you aren’t seeing mounts, it must be having trouble either:
If you could provide a gist with containerd, nix-snapshotter logs, as well as “kubectl describe pod xyz”, that’ll help as well.
Ah, I'm using the internal k3s containerd, could that be the culprit? I'll provide the logs later today
Oh and after I am done, I'll start a draft of the manual install doc, at least for the k3s rootful situation.
I found the bug: I wasn't using the k3s from this flake. I'm trying to fix that currently. It's a deeper bug in NixNG somewhere. For some reason assert (pkgs.k3s == cfg.package); foo fails, where
package = mkPackageOption pkgs "k3s" {};
Why it fails is beyond me. Going through pkgs directly I get the version from the overlay. Going through the option I end up with the default nixpkgs version.
EDIT: fixed. I have a weird way of passing options through from a NixOS module to the underlying NixNG module: I just copied the option definition, which then used the pkgs from the overarching NixOS system, not the NixNG container system... I am now recompiling k3s.
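To make the pitfall concrete, here is a minimal, self-contained sketch of how an option default can end up bound to the wrong package set. It does not use the real module system: outerPkgs, innerPkgs, and mkK3sDefault are hypothetical stand-ins for the NixOS host pkgs, the NixNG container pkgs (with the overlay applied), and the copied option definition.

```nix
# Hypothetical stand-ins; plain strings take the place of real derivations.
let
  # Host (NixOS) package set, without the nix-snapshotter overlay.
  outerPkgs = { k3s = "k3s-from-nixpkgs"; };
  # Container (NixNG) package set, with the overlay applied.
  innerPkgs = { k3s = "k3s-from-overlay"; };

  # The option default is computed from whichever pkgs is in scope
  # at the place where the option is defined.
  mkK3sDefault = pkgs: pkgs.k3s;
in
{
  # Copying the definition into the outer module binds it to outerPkgs...
  copied = mkK3sDefault outerPkgs;
  # ...while the inner module would have resolved it against innerPkgs.
  intended = mkK3sDefault innerPkgs;
  # The comparison that the assert in the thread was effectively making.
  same = mkK3sDefault outerPkgs == mkK3sDefault innerPkgs;
}
```

Evaluating this (for example with nix eval --file) shows copied and intended differing, which mirrors why the assert failed even though pkgs.k3s inside the container was the overlay version.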
On a side note, I am working on a NixOS module which, inside a systemd-nspawn container, spins up a completely self-contained instance of NixNG with k3s, postgres and all the rest. The idea there is to have everything in one nice network namespace so it doesn't pollute the host as much.
Yes, that makes sense. Sorry, I should’ve been clearer: rootless only doesn’t work because it can only use its embedded containerd, i.e. rootful only works with external containerd.
This repo provides overlays for both containerd & k3s so embedded & external all work, but we’re still working on upstreaming these bug fixes.
Yeah, I got it. I think the takeaway here is that I need to help with the manual install doc :) and that once you get this running, you CANNOT switch or play with the snapshotters in any way, shape, or form. I had to reset the state of containerd multiple times; whenever it was throwing weird errors, I reset it.
While we're here, would there be any interest in the NixNG code? As the author, I would be very happy if I found someone who had an interest in it. I stand behind the fact that distroless is nice, until it isn't. In my experience most things do not work and NixNG is as distroless as it can be.
And thanks for the help, I'll finalize my modules and then draft the docs for this little journey :)
As nix-snapshotter stabilizes, I’m moving in the direction of upstreaming NixOS modules, Home Manager modules, etc. This repo is only incubating the changes, so overlays won’t be necessary later. So I’d rather the NixNG repo be the source of truth for nix-snapshotter NixNG modules.
If you can provide reproduction for snapshotter instability, happy to take a look.
I've managed to pull apart this flake and make it fit into NixNG and the rest. I have figured out that
makes normal OCI images work in kubernetes, while
makes nix:0/nix/store/... image refs work. Is it possible to have both work at the same time inside one kubelet? I know that on the command line I can specify --snapshotter or CONTAINERD_SNAPSHOTTER and set them to nix, which will again make nix native containers work, but I'm not sure how to specify that in kubernetes.