Closed Smeb closed 8 months ago
@Smeb any chance you can try out https://github.com/istio/istio/pull/48757?
TAG=test-sh HUB=localhost:5000
CGO_ENABLED=0 BUILD_WITH_CONTAINER=0 BUILD_ALL=false DOCKER_TARGETS=docker.install-cni make dockerx.pushx
kubectl -n istio-system set image daemonset/istio-cni-node install-cni=${HUB}/install-cni:${TAG}
can do it (replace hub, probably)
Awesome turnaround! Can do tomorrow (GMT). Will get back to you when I've had a chance to try it.
Runs into a permission issue when changing just the image and nothing else:
Warning FailedCreatePodSandBox 4m45s kubelet Failed to create pod sandbox: rpc error: code = Unknown
desc = failed to setup network for sandbox "250d4f5b075fbbbe9c8f29c67abd1073417782508dd0bded96c97aea6d787619":
plugin type="istio-cni" name="istio-cni" failed (add): fork/exec /opt/cni/bin/istio-cni: permission denied
Here are some logs from running on the node (just to show the binary is there, running the patched version, and executable):
bash-5.1# /opt/cni/bin/istio-cni
CNI plugin istio-cni 1.21-dev
CNI protocol versions supported: 0.1.0, 0.2.0, 0.3.0, 0.3.1, 0.4.0, 1.0.0
bash-5.1# /opt/cni/bin/istio-cni unshare
usage: unshare --lock-file=file -- <command> [args...]
bash-5.1# /opt/cni/bin/istio-cni unshare ls
bin boot dev etc home lib lib64 local lost+found media mnt opt proc root run sbin srv sys tmp usr var x86_64-bottlerocket-linux-gnu
Which looking in the kubelet logs is a result of
AVC avc: denied { mounton } for pid=2545767 comm="istio-cni" path="/" dev="dm-0" ino=2 scontext=system_u:system_r:container_t:s0 tcontext=system_u:object_r:os_t:s0 tclass=dir permissive=0
Something we can fix, but indicates that it may cause issues for other folks.
Something else I saw is that in the case where we get an error, the logging doesn't work from inside cmdAdd
. If I use the patch but just force it to work for our environment (by not doing the mount), then the logging works. I think this might be the fix.
For reference, this also occurs on 1.20.2
. We're running on BottleRocket nodes. Thanks for reporting this so quickly.
Which looking in the kubelet logs is a result of
So this is SELinux blocking any mount
calls, right? Do you have custom SELinux rules or is this default on bottlerocket?
We don't have any custom SELinux rules set, so it's the default (we're using aws-k8s-1.28/x86_64/latest
). The policies are here. I do see some CNI related rules here and one exception where they allow writes to /etc/resolve.conf
here - so maybe a change would be possible. I guess that might look something like (cni_exec_t etc_t (files (mount)))
(though there is also a rule that prevents writing dynamic files explicitly).
I can try to take a look at the configuration today. One other option might be using chroot
for the sandbox, because I think it should be possible to mount the files iptables needs in another context. I'll try and iterate a bit on your patch on our environment and see if I can get something that'll co-operate.
Small follow up - we also escalated this with AWS who are investigating. Might be they have a solution for the AWS Bottlerocket case that works for everyone.
I'm a little concerned that we don't have a solution (yet). Is this a behaviour we could gate behind a flag? Obviously the ideal would be to find something that we know works for all environments, but I'm a little concerned that might be tricky given the diversity of environments out there. I don't like the idea of complicating the configuration, but currently Bottlerocket users can't easily use the latest supported patch releases of the currently supported versions as is.
Regarding this error:
AVC avc: denied { mounton } for pid=2545767 comm="istio-cni" path="/" dev="dm-0" ino=2 scontext=system_u:system_r:container_t:s0 tcontext=system_u:object_r:os_t:s0 tclass=dir permissive=0
This is saying that istio-cni
is not allowed to mount something directly on the host's /
. (dm-0
refers to the dm-verity root filesystem which has the os_t
label in most places.) I don't think that's actually what you're trying to accomplish so this particular denial may not be worth exploring.
However, if you did try to mount over /etc/nsswitch.conf
, I expect you'd see a failure like this:
# made up example
AVC avc: denied { mounton } for pid=??? comm="istio-cni" path="nsswitch.conf" dev="tmpfs" ino=??? scontext=system_u:system_r:container_t:s0 tcontext=system_u:object_r:etc_t:s0 tclass=file permissive=0
From a security policy standpoint, it doesn't make sense to allow a process from an unprivileged container (the container_t
bit in this output indicates unprivileged) to overmount a sensitive host file like /etc/nsswitch.conf
. Bottlerocket could transition to a more privileged label when executing CNI plugins but that creates a different set of security concerns.
If the mount attempt were treated as optional, that would work for Bottlerocket, which has a very generic /etc/nsswitch.conf
that won't trigger network calls for UID lookups. Otherwise, ideally you'd be able to point glibc to a specific NSS config via environment variable or something, but I'm not aware of a way to do that except via LD_PRELOAD which carries its own set of drawbacks.
From a security policy standpoint, it doesn't make sense to allow a process from an unprivileged container
Ignoring selinux, is this true given we are unsharing the mount namespace? With this, the mount shouldn't impact other processes?
Even if thats true, selinux still may not take that into consideration, though.
Otherwise, ideally you'd be able to point glibc to a specific NSS config via environment variable or something, but I'm not aware of a way to do that except via LD_PRELOAD which carries its own set of drawbacks.
This is my top preference but I couldn't find a great solution (LD_PRELOAD didn't seem great).
Here is my current options in preference order:
I'll explore these a bit, but input from others welcome as well. I am trying to setup a bottlerocket env so I can reliably test each of these as well (if someone knows a way to run bottlerocket in docker like kind
-- let me know!).
In the meantime, I would recommend users on bottlerocket do not upgrade to 1.19.6/1.20.2. While I cannot make promises, I am confident we will implement at least option 3 by the next patch release (so you shouldn't need to worry about being stuck on an old version and some security patch comes out that is incompatible, for instance).
I got a bottlerocket setup and can reproduce
However, if you did try to mount over
/etc/nsswitch.conf
, I expect you'd see a failure like this:
type=1400 audit(1705537456.860:6): avc: denied { mounton } for pid=30072 comm="istio-cni" path="/etc/nsswitch.conf" dev="tmpfs" ino=117 scontext=system_u:system_r:container_t:s0 tcontext=system_u:object_r:etc_t:s0 tclass=file permissive=0
is the real log we end up seeing.
Parsing nsswitch.conf seems like the type of thing that might be error prone, so we could change "Come up with a way to detect if we need the mount, and only use it if so" to "Try to overwrite it, but do not exit on error". The downside is we will get these audit messages though
One big question I had was why can we not unshare(CLONE_NEWNS); mount ...
-- this is what containers do, and obviously containers run on bottlerocket.
I believe here we restrict CNI plugins: https://github.com/bottlerocket-os/bottlerocket/blob/7452c37e0b1abf19797a23bb7ab4d70bb0aef1cf/packages/selinux-policy/rules.cil#L90-L93 while the container runtime runs as another group (runtime_t) which has more access (maybe a bit off, first time dealing with SELinux rules...). So the CNI plugins essentially have less privilege than the container runtime?
One big question I had was why can we not unshare(CLONE_NEWNS); mount ... -- this is what containers do, and obviously containers run on bottlerocket. ... So the CNI plugins essentially have less privilege than the container runtime?
Yes, the container runtime is relatively privileged - it needs to be able to change the label of processes it launches, which is a bit like sudo
for mandatory access control mechanisms.
The answer to your first question is that SELinux is not namespace-aware, so it doesn't have a way to take the namespace into account when answering the question "does subject
"just" parse nsswitch.conf and see if its doing anything weird?
Go does this for "hosts" to determine which DNS resolver it should use; if it finds a mechanism it doesn't understand, it switches to the cgo resolver so that the system library is used instead.
In practice I expect you'll find that most Linux distros include an entry more complex than Bottlerocket's passwd: files
. On my Fedora development machine, it's passwd: files sss systemd
where sss
could trigger network-based lookups and systemd
wouldn't. There's no way to tell just from the name, so you'd need to maintain a list of known problematic cases that should trigger the overmount.
"Try to overwrite it, but do not exit on error". The downside is we will get these audit messages though
I wouldn't worry about the audit message very much if it's just one AVC denial when the pod starts up. That is pretty common in my experience, and relatively harmless if the software has a fallback path.
New iteration pushed up to https://github.com/istio/istio/pull/48757. This seems to work on GKE and Bottlerocket (where I have tested so far). Basically if we cannot setup the mounts we just continue anyways
Thanks for all the follow up work. I just ran the latest version in our environment and can confirm it works. We get a lot of audit messages (as expected), but otherwise everything looks good.
One warning to folks - between the regression in 1.18 and the fix, 1.18 is now EOL https://istio.io/latest/news/support/announcing-1.18-eol-final/. As such, this fix (and any other bugs or CVEs in Istio) will only be applied to 1.19+. 1.19 and 1.20 will be fixed in the next patch release (couple of weeks)
@howardjohn
One warning to folks - between the regression in 1.18 and the fix, 1.18 is now EOL https://istio.io/latest/news/support/announcing-1.18-eol-final/. As such, this fix (and any other bugs or CVEs in Istio) will only be applied to 1.19+. 1.19 and 1.20 will be fixed in the next patch release (couple of weeks)
Hi, can you link in which patch release versions does this fix exist?
I did try with istio v1.20.4
and with 1.21.0
and still have the same issue
2024-04-04T13:13:49.120967Z warn cni failed to setup execution environment, attempting to continue anyways: failed to remount /: permission denied
2024-04-04T13:13:49.120970Z info cni Running ip6tables-restore with the following input:
2024-04-04T13:13:49.120973Z info cni Running command (without lock by env and nss): ip6tables-restore --noflush
2024-04-04T13:13:49.120976Z warn cni failed to setup execution environment, attempting to continue anyways: failed to remount /: permission denied
2024-04-04T13:13:49.120978Z info cni Running command (without nss): iptables-save
2024-04-04T13:13:49.120981Z warn cni failed to setup execution environment, attempting to continue anyways: failed to remount /: permission denied
2024-04-04T13:13:49.120985Z info cni Command output:
# Generated by iptables-save v1.8.9 on Thu Apr 4 13:13:49 2024
*nat
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
:ISTIO_INBOUND - [0:0]
:ISTIO_IN_REDIRECT - [0:0]
:ISTIO_OUTPUT - [0:0]
:ISTIO_REDIRECT - [0:0]
-A PREROUTING -p tcp -j ISTIO_INBOUND
-A OUTPUT -p tcp -j ISTIO_OUTPUT
-A ISTIO_INBOUND -p tcp -m tcp --dport 15008 -j RETURN
-A ISTIO_INBOUND -p tcp -m tcp --dport 15020 -j RETURN
-A ISTIO_INBOUND -p tcp -m tcp --dport 15021 -j RETURN
-A ISTIO_INBOUND -p tcp -m tcp --dport 15090 -j RETURN
-A ISTIO_INBOUND -p tcp -j ISTIO_IN_REDIRECT
-A ISTIO_IN_REDIRECT -p tcp -j REDIRECT --to-ports 15006
@howardjohn
One warning to folks - between the regression in 1.18 and the fix, 1.18 is now EOL https://istio.io/latest/news/support/announcing-1.18-eol-final/. As such, this fix (and any other bugs or CVEs in Istio) will only be applied to 1.19+. 1.19 and 1.20 will be fixed in the next patch release (couple of weeks)
Hi, can you link in which patch release versions does this fix exist? I did try with istio
v1.20.4
and with1.21.0
and still have the same issue2024-04-04T13:13:49.120967Z warn cni failed to setup execution environment, attempting to continue anyways: failed to remount /: permission denied 2024-04-04T13:13:49.120970Z info cni Running ip6tables-restore with the following input: 2024-04-04T13:13:49.120973Z info cni Running command (without lock by env and nss): ip6tables-restore --noflush 2024-04-04T13:13:49.120976Z warn cni failed to setup execution environment, attempting to continue anyways: failed to remount /: permission denied 2024-04-04T13:13:49.120978Z info cni Running command (without nss): iptables-save 2024-04-04T13:13:49.120981Z warn cni failed to setup execution environment, attempting to continue anyways: failed to remount /: permission denied 2024-04-04T13:13:49.120985Z info cni Command output: # Generated by iptables-save v1.8.9 on Thu Apr 4 13:13:49 2024 *nat :PREROUTING ACCEPT [0:0] :INPUT ACCEPT [0:0] :OUTPUT ACCEPT [0:0] :POSTROUTING ACCEPT [0:0] :ISTIO_INBOUND - [0:0] :ISTIO_IN_REDIRECT - [0:0] :ISTIO_OUTPUT - [0:0] :ISTIO_REDIRECT - [0:0] -A PREROUTING -p tcp -j ISTIO_INBOUND -A OUTPUT -p tcp -j ISTIO_OUTPUT -A ISTIO_INBOUND -p tcp -m tcp --dport 15008 -j RETURN -A ISTIO_INBOUND -p tcp -m tcp --dport 15020 -j RETURN -A ISTIO_INBOUND -p tcp -m tcp --dport 15021 -j RETURN -A ISTIO_INBOUND -p tcp -m tcp --dport 15090 -j RETURN -A ISTIO_INBOUND -p tcp -j ISTIO_IN_REDIRECT -A ISTIO_IN_REDIRECT -p tcp -j REDIRECT --to-ports 15006
@howardjohn Is there a release with the fix? I see there are quite a few releases in the last weeks.
https://github.com/istio/istio/pull/48757
fixed in 1.21.3
Is this the right place to submit this?
Bug Description
We're running into an issue when we try to use Istio 1.19.6 on our EKS clusters. 1.19.5 works. We think it's because of this patch https://github.com/istio/istio/pull/48422. We're running on bottlerocket, which doesn't include a shell - that change means that
exec.Command("sh" ...)
is called on both code paths, while previously it was only run for older versions of iptables. This blocks us from upgrading to1.19.6
(and presumably1.20.2
which includes that logic).The error we get (kubelet logs):
Not sure on the best fix if this is the issue. One (slightly hacky) option would be for the CNI binary to call itself to set up the mini-container environment, since that way we know the binary exists and can run.
Version
Additional Information
No response