Closed: itoffshore closed this issue 1 year ago.
So you're running it as root but in an unprivileged container? How would we detect that in order to enable this feature gate automatically? You're welcome to enable the feature gate yourself if you're running K3s in an odd configuration like this that requires it. I'm honestly not convinced this is something we're doing wrong?
Checking /proc/1/environ shows container=lxc inside an LXD container.
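For reference, a minimal shell sketch of that check (the container=lxc value is the one LXD sets, as noted above; other runtimes may use a different value):

# print any container= entry from PID 1's environment
tr '\0' '\n' < /proc/1/environ | grep '^container='
# inside an LXD container this prints: container=lxc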
How do I manually enable Kubernetes feature gates?
--kubelet-arg=feature-gates=KubeletInUserNamespace=true
- same for kube-controller-manager, kube-apiserver, etc.
https://rancher.com/docs/k3s/latest/en/installation/install-options/server-config/#customized-flags
How would you tell that it's an unprivileged container?
It's only possible to check whether an LXD container is unprivileged from outside the container; from inside, you can only tell that you are in a container at all.
Running the service on zfs with:
k3s server --snapshotter=fuse-overlayfs --kubelet-arg=feature-gates=KubeletInUserNamespace=true --kube-controller-manager-arg=feature-gates=KubeletInUserNamespace=true --kube-apiserver-arg=feature-gates=KubeletInUserNamespace=true
Brings up all of the servers:
tcp 0 0 127.0.0.1:10248 0.0.0.0:* LISTEN 1641/k3s server
tcp 0 0 127.0.0.1:10249 0.0.0.0:* LISTEN 1641/k3s server
tcp 0 0 127.0.0.1:6444 0.0.0.0:* LISTEN 1641/k3s server
tcp 0 0 127.0.0.1:10256 0.0.0.0:* LISTEN 1641/k3s server
tcp 0 0 127.0.0.1:10257 0.0.0.0:* LISTEN 1641/k3s server
tcp 0 0 127.0.0.1:10258 0.0.0.0:* LISTEN 1641/k3s server
tcp 0 0 127.0.0.1:10259 0.0.0.0:* LISTEN 1641/k3s server
tcp6 0 0 :::10250 :::* LISTEN 1641/k3s server
tcp6 0 0 :::10251 :::* LISTEN 1641/k3s server
tcp6 0 0 :::6443 :::* LISTEN 1641/k3s server
With k3s in rootful mode inside unprivileged LXD, the flannel interface comes up but the cni0 interface is missing. With k3s rootless inside unprivileged LXD, the container also loses connectivity. I think this is a problem with the default LXD Linux bridge. Using VXLAN networking with LXD + openvswitch is probably required to make unprivileged LXD work fully with k3s.
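For anyone who wants to try that, a hedged sketch of creating an openvswitch-backed LXD bridge (untested here, and the bridge name ovsbr0 is illustrative):

# create an OVS-backed bridge instead of the default native Linux bridge
lxc network create ovsbr0 bridge.driver=openvswitch
# then point the container's (or profile's) eth0 NIC at ovsbr0 instead of lxdbr0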
Many thanks for looking at this issue - now I just have to fix the networking (it looks like my forwarding rules on the host need to be less strict)
Just to be clear - do you think there's anything K3s can do better? It sounds like there's not any way for us to detect unprivileged operation, so users will need to be responsible for setting the feature-gates on their own.
I was thinking about this earlier - I think it will be useful for LXD to create a file somewhere in containers to show unprivileged operation (so software running inside it can configure itself accordingly)
I will suggest it as an LXD feature & see what they think.
Inside LXD, /proc/self/uid_map & /proc/self/gid_map can be checked:
Privileged LXD (root maps to root):
# cat /proc/self/uid_map
0 0 4294967295
# cat /proc/self/gid_map
0 0 4294967295
Unprivileged LXD (root maps to user namespace):
# cat /proc/self/gid_map
0 1000000 1000000000
# cat /proc/self/uid_map
0 1000000 1000000000
These values can also be read by the rootless user:
Connected to the local host. Press ^] three times within 1s to exit session.
starting: dbus
podman@u2110:~$ cat /proc/self/gid_map
0 1000000 1000000000
podman@u2110:~$ cat /proc/self/uid_map
0 1000000 1000000000
That just sounds like user and group remapping; is there anything unique that can be used to identify unprivileged operation?
If uid 0 maps to anything other than 0, you are in an unprivileged container.
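A minimal shell sketch of that rule (a hypothetical helper, not something k3s ships):

# /proc/self/uid_map columns: <inside uid> <outside uid> <count>
host_uid=$(awk '$1 == 0 { print $2; exit }' /proc/self/uid_map)
if [ "${host_uid:-0}" -ne 0 ]; then
  echo "uid 0 is remapped: unprivileged container (user namespace)"
else
  echo "uid 0 maps to host root: privileged container or no user namespace"
fi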
Running unprivileged LXD with lvm as the storage driver (which uses ext4 by default) makes the filesystem problems disappear in rootless mode.
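For anyone reproducing this, a hedged sketch of the lvm setup (the pool, image, and container names are illustrative, not taken from the thread):

# lvm-backed pools give ext4 container volumes by default
lxc storage create k3s-lvm lvm
lxc launch ubuntu:22.04 k3s-node --storage k3s-lvm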
In rootless mode I see OOM warnings (one per minute):
container_manager_linux.go:675] "Failed to ensure state" containerName="/k3s" err="failed to apply oom score -999 to PID 30: write /proc/30/oom_score_adj: permission denied"
In rootful mode inside unprivileged LXD (with the userspace feature gates enabled) all the ports seem to come up:
[root@u2110 ~]# ns | grep k3s
tcp 0 0 127.0.0.1:6444 0.0.0.0:* LISTEN 9097/k3s server
tcp 0 0 127.0.0.1:10256 0.0.0.0:* LISTEN 9097/k3s server
tcp 0 0 127.0.0.1:10257 0.0.0.0:* LISTEN 9097/k3s server
tcp 0 0 127.0.0.1:10258 0.0.0.0:* LISTEN 9097/k3s server
tcp 0 0 127.0.0.1:10259 0.0.0.0:* LISTEN 9097/k3s server
tcp 0 0 0.0.0.0:31164 0.0.0.0:* LISTEN 9097/k3s server
tcp 0 0 127.0.0.1:10248 0.0.0.0:* LISTEN 9097/k3s server
tcp 0 0 127.0.0.1:10249 0.0.0.0:* LISTEN 9097/k3s server
tcp 0 0 0.0.0.0:30250 0.0.0.0:* LISTEN 9097/k3s server
tcp6 0 0 :::10250 :::* LISTEN 9097/k3s server
tcp6 0 0 :::10251 :::* LISTEN 9097/k3s server
tcp6 0 0 :::6443 :::* LISTEN 9097/k3s server
The flannel.1 & cni0 interfaces both come up in rootful mode - but they do not go down when the k3s service is stopped (which makes the container take a long time to stop).
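A hedged cleanup sketch for that situation, run inside the container (interface names are the ones mentioned above; the k3s install script also ships k3s-killall.sh, which does a fuller teardown):

systemctl stop k3s
# remove the leftover CNI interfaces so the container can shut down quickly
ip link delete flannel.1 2>/dev/null || true
ip link delete cni0 2>/dev/null || true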
I think we can unconditionally set the KubeletInUserNamespace feature gate without detecting whether we are in LXD. (When we are outside a userns, the feature gate is safely ignored.)
OK sounds good ;o)
k3s run as root inside an unprivileged Ubuntu 21.10 LXD container (with nesting enabled) seems to work ok on both zfs & lvm (ext4):
Using ufw as the host iptables firewall with a libvirt virbr0 bridge works.
Using nftables on the host & inside LXD also works in rootful & rootless modes.
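If host forwarding rules are the problem (as mentioned earlier in the thread), a hedged ufw sketch; the bridge name lxdbr0 is an assumption, adjust to virbr0 or whatever your containers attach to:

# allow routed (forwarded) traffic in and out of the container bridge
ufw route allow in on lxdbr0
ufw route allow out on lxdbr0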
Debian 11.1 unprivileged LXD containers also work with nftables:
The system-upgrade-controller also works in rootful & rootless modes with v0.8.0 - change to the docs proposed:
No problems in ubuntu-22.04 LXD containers either, with an ext4 zvol mounted for containerd.
As of LXD 5.6 (& until kernel 5.19) you also need to add LXD_IDMAPPED_MOUNTS_DISABLE=1 to /etc/environment for the overlayfs / stargz snapshotters to work on lvm storage volumes (a sketch follows the service snippet below). At the moment with the latest k3s I use the following in the service script:
ExecStart=/usr/local/bin/k3s \
server --snapshotter=stargz \
--kubelet-arg=feature-gates=KubeletInUserNamespace=true \
--kube-controller-manager-arg=feature-gates=KubeletInUserNamespace=true \
--kube-apiserver-arg=feature-gates=KubeletInUserNamespace=true \
--disable=servicelb --cluster-init
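And a hedged sketch of the LXD 5.6 workaround mentioned above; where the variable needs to live depends on how LXD is packaged (snap vs. distro package), so verify for your setup:

# persist the workaround; restart the LXD daemon afterwards for it to take effect
echo 'LXD_IDMAPPED_MOUNTS_DISABLE=1' >> /etc/environment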
Hi @itoffshore, would you mind sharing the full commands/settings that you used to set up the LXD container and then K3s inside it? I've tried to replicate your commands in an Ubuntu 22.04 container in LXD 5.0.2, but it doesn't seem to work when starting up K3s. For example:
...
"Failed to create sandbox for pod" err="rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting \"proc\" to rootfs at \"/proc\": mount proc:/proc (via /proc/self/fd/6), flags: 0xe: permission denied: unknown" pod="kube-system/helm-install-traefik-crd-zsnr5"
...
Did you use a particular LXD container profile for example?
@dalbani - this profile should work using lvm as the LXD backing store (NB: I used nftables for my firewall, so if you use iptables the kernel_modules in your profile may need to be slightly different):
config:
  limits.cpu: "2"
  limits.memory: 2GB
  limits.memory.swap: "false"
  linux.kernel_modules: ip_vs,ip_vs_rr,ip_vs_wrr,ip_vs_sh,nf_tables,netlink_diag,nf_nat,overlay
  raw.lxc: |-
    lxc.apparmor.profile=unconfined
    lxc.mount.auto=proc:rw sys:rw cgroup:rw
    lxc.cgroup.devices.allow=a
    lxc.cap.drop=
  security.nesting: "true"
  security.privileged: "false"
description: K3s LXD profile
devices:
  eth0:
    name: eth0
    network: lxdbr0
    type: nic
  root:
    path: /
    pool: default
    propagation: shared
    type: disk
name: k3s
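To use a profile like this, a minimal sketch (the file name k3s-profile.yaml, the container name k3s-node, and the image alias are illustrative, not from the thread):

lxc profile create k3s
lxc profile edit k3s < k3s-profile.yaml   # paste the YAML above into this file first
lxc launch ubuntu:22.04 k3s-node --profile k3s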
I also made k3s work in unprivileged LXD on zfs if I created an ext4 zvol & mounted it inside the container under /var/lib/rancher (wherever kubelet runs, it expects an ext4 filesystem) - possibly in the agent subdirectory of /var/lib/rancher?
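A hedged sketch of that zvol approach, run on the host (the pool, dataset, size, and container name k3s are illustrative; idmapped/shifted mount behaviour may vary by LXD version):

# create an ext4-formatted zvol and hand it to the container for /var/lib/rancher
zfs create -V 20G tank/k3s-rancher
mkfs.ext4 /dev/zvol/tank/k3s-rancher
lxc config device add k3s rancher disk source=/dev/zvol/tank/k3s-rancher path=/var/lib/rancher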
You should probably start with lvm until you get it working - also note the service script settings above. Everything seemed to work - I even had the stargz snapshotter working.
I successfully ran k3s under LXD on lvm / zfs on Ubuntu 22.04 & zfs on Arch Linux (although I expect both to work).
Thanks @itoffshore, I've indeed managed to run K3s within an unprivileged container, using storage from a ZFS pool being "delegated" (zoned=on).
I'm curious though what changes have been applied to the K3s codebase to be able to mark this issue as completed, as it happened a couple of weeks ago?
And how does that relate to the so-called "rootless mode" (e.g. commit https://github.com/k3s-io/k3s/commit/6e8284e3d4d3595824ffb5c6fa305a1dd9aa9274)?
@dalbani - this is probably why the issue was closed. Thanks for the new rootless note.
Environmental Info:
K3s Version:
Node(s) CPU architecture, OS, and Version:
Cluster Configuration: (zfs)

Describe the bug:
KubeletInUserNamespace is not set in unprivileged LXD containers when k3s is run as root.

Steps To Reproduce:
Run the k3s service inside an unprivileged LXD container.

Expected behavior:
Actual behavior:
Additional context / logs:
Backporting

Trying to run k3s rootless inside unprivileged LXD on zfs is problematic (btrfs gives a similar error); this causes sandbox creation to fail. This error disappears when running rootful k3s inside unprivileged LXD, but the service fails due to the KubeletInUserNamespace feature gate not being enabled. An easy way to check if running inside a container is to check /proc/1/environ, which contains container=lxc inside LXD containers.