[CNI] CNI binary for 1.19.6 crashes if `sh` is not on the host #48746

Closed (Smeb closed this issue 8 months ago)

Smeb commented 8 months ago

Is this the right place to submit this?

Bug Description

We're running into an issue when we try to use Istio 1.19.6 on our EKS clusters; 1.19.5 works. We think it's caused by this patch: https://github.com/istio/istio/pull/48422. We're running on Bottlerocket, which doesn't include a shell. That change means exec.Command("sh", ...) is now called on both code paths, whereas previously it was only run for older versions of iptables. This blocks us from upgrading to 1.19.6 (and presumably 1.20.2, which includes the same logic).
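For illustration only (this is not the actual Istio code, just the shape of the problem): shelling out with exec.Command("sh", ...) requires a shell binary on the host, which Bottlerocket does not ship, whereas invoking the tool directly does not.

package main

import (
	"log"
	"os/exec"
)

func main() {
	// Shelling out needs "sh" in $PATH; on Bottlerocket this fails with:
	//   exec: "sh": executable file not found in $PATH
	if err := exec.Command("sh", "-c", "iptables-save").Run(); err != nil {
		log.Printf("via sh: %v", err)
	}

	// Calling the binary directly avoids the shell dependency entirely.
	if err := exec.Command("iptables-save").Run(); err != nil {
		log.Printf("direct: %v", err)
	}
}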

The error we get (kubelet logs):

Warning   FailedCreatePodSandBox   Pod/httpbin-1-94bf7fddd-5m4zh     Failed to create pod sandbox: rpc error: code = Unknown
desc = failed to setup network for sandbox "809561b2f41924b90f7299a5b26c67189fa6c4527dcc7ab0cb7f59c5266a933b": plugin
type="istio-cni" name="istio-cni" failed (add): exec: "sh": executable file not found in $PATH

I'm not sure of the best fix, if this is indeed the issue. One (slightly hacky) option would be for the CNI binary to call itself to set up the mini-container environment, since that way we know the binary exists and can run.
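A rough sketch of that idea (hypothetical, not a concrete proposal): a Go binary can re-exec itself via /proc/self/exe with a marker argument, so no external shell is needed on the host.

package main

import (
	"log"
	"os"
	"os/exec"
)

func main() {
	if len(os.Args) > 1 && os.Args[1] == "sandbox" {
		// Child: set up the isolated environment here, then do the real work.
		log.Println("running inside re-exec'd child")
		return
	}

	// Parent: re-exec the same binary; it is guaranteed to exist and be runnable.
	cmd := exec.Command("/proc/self/exe", "sandbox")
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	if err := cmd.Run(); err != nil {
		log.Fatal(err)
	}
}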

Version

$ istioctl version --context sandbox20 (sandbox 20 is a testing cluster)
client version: 1.17.2
control plane version: 1.19.6
data plane version: 1.18.5 (6 proxies), 1.19.6 (2 proxies)

-> the mix of proxy versions is because the two 1.19.6 proxies run in excluded namespaces, so they never reach the issue in the CNI logic.

Additional Information

No response

howardjohn commented 8 months ago

@Smeb any chance you can try out https://github.com/istio/istio/pull/48757?

TAG=test-sh HUB=localhost:5000
CGO_ENABLED=0 BUILD_WITH_CONTAINER=0 BUILD_ALL=false DOCKER_TARGETS=docker.install-cni make dockerx.pushx
kubectl -n istio-system set image daemonset/istio-cni-node install-cni=${HUB}/install-cni:${TAG}

should do it (replace the hub with your own, probably).

Smeb commented 8 months ago

Awesome turnaround! Can do tomorrow (GMT). Will get back to you when I've had a chance to try it.

Smeb commented 8 months ago

Runs into a permission issue when changing just the image and nothing else:

Warning  FailedCreatePodSandBox  4m45s    kubelet  Failed to create pod sandbox: rpc error: code = Unknown
desc = failed to setup network for sandbox "250d4f5b075fbbbe9c8f29c67abd1073417782508dd0bded96c97aea6d787619": 
plugin type="istio-cni" name="istio-cni" failed (add): fork/exec /opt/cni/bin/istio-cni: permission denied

Here are some logs from running on the node (just to show the binary is there, running the patched version, and executable):

bash-5.1# /opt/cni/bin/istio-cni
CNI plugin istio-cni 1.21-dev
CNI protocol versions supported: 0.1.0, 0.2.0, 0.3.0, 0.3.1, 0.4.0, 1.0.0
bash-5.1# /opt/cni/bin/istio-cni unshare
usage: unshare --lock-file=file -- <command> [args...]
bash-5.1# /opt/cni/bin/istio-cni unshare ls
bin  boot  dev  etc  home  lib  lib64  local  lost+found  media  mnt  opt  proc  root  run  sbin  srv  sys  tmp  usr  var  x86_64-bottlerocket-linux-gnu

Looking at the kubelet logs, this is a result of:

AVC avc:  denied  { mounton } for  pid=2545767 comm="istio-cni" path="/" dev="dm-0" ino=2 scontext=system_u:system_r:container_t:s0 tcontext=system_u:object_r:os_t:s0 tclass=dir permissive=0

This is something we can fix on our side, but it suggests the change may cause issues for other folks.

Smeb commented 8 months ago

Something else I noticed: in the case where we get an error, the logging from inside cmdAdd doesn't work. If I use the patch but force it to work in our environment (by not doing the mount), then the logging works. I think this might be the fix.

diranged commented 8 months ago

For reference, this also occurs on 1.20.2. We're running on BottleRocket nodes. Thanks for reporting this so quickly.

howardjohn commented 8 months ago

Looking at the kubelet logs, this is a result of:

So this is SELinux blocking any mount calls, right? Do you have custom SELinux rules or is this default on bottlerocket?

Smeb commented 8 months ago

We don't have any custom SELinux rules set, so it's the default (we're using aws-k8s-1.28/x86_64/latest). The policies are here. I do see some CNI-related rules here, and one exception where they allow writes to /etc/resolv.conf here, so maybe a change would be possible. I guess that might look something like (cni_exec_t etc_t (files (mount))) (though there is also a rule that explicitly prevents writing dynamic files).

I can try to take a look at the configuration today. One other option might be using chroot for the sandbox, since I think it should be possible to mount the files iptables needs in another context. I'll try to iterate a bit on your patch in our environment and see if I can get something that cooperates.

Smeb commented 8 months ago

Small follow-up: we also escalated this with AWS, who are investigating. It may be that they have a solution for the AWS Bottlerocket case that works for everyone.

I'm a little concerned that we don't have a solution (yet). Is this a behaviour we could gate behind a flag? Ideally we'd find something that we know works for all environments, but I'm a little concerned that might be tricky given the diversity of environments out there. I don't like the idea of complicating the configuration, but as it stands, Bottlerocket users can't easily use the latest patch releases of the currently supported versions.

bcressey commented 8 months ago

Regarding this error:

AVC avc:  denied  { mounton } for  pid=2545767 comm="istio-cni" path="/" dev="dm-0" ino=2 scontext=system_u:system_r:container_t:s0 tcontext=system_u:object_r:os_t:s0 tclass=dir permissive=0

This is saying that istio-cni is not allowed to mount something directly on the host's /. (dm-0 refers to the dm-verity root filesystem which has the os_t label in most places.) I don't think that's actually what you're trying to accomplish so this particular denial may not be worth exploring.

However, if you did try to mount over /etc/nsswitch.conf, I expect you'd see a failure like this:

# made up example
AVC avc:  denied  { mounton } for  pid=??? comm="istio-cni" path="nsswitch.conf" dev="tmpfs" ino=??? scontext=system_u:system_r:container_t:s0 tcontext=system_u:object_r:etc_t:s0 tclass=file permissive=0

From a security policy standpoint, it doesn't make sense to allow a process from an unprivileged container (the container_t bit in this output indicates unprivileged) to overmount a sensitive host file like /etc/nsswitch.conf. Bottlerocket could transition to a more privileged label when executing CNI plugins but that creates a different set of security concerns.

If the mount attempt were treated as optional, that would work for Bottlerocket, which has a very generic /etc/nsswitch.conf that won't trigger network calls for UID lookups. Otherwise, ideally you'd be able to point glibc to a specific NSS config via environment variable or something, but I'm not aware of a way to do that except via LD_PRELOAD which carries its own set of drawbacks.

howardjohn commented 8 months ago

From a security policy standpoint, it doesn't make sense to allow a process from an unprivileged container

Ignoring SELinux, is this true given that we are unsharing the mount namespace? With that, the mount shouldn't impact other processes.

Even if that's true, SELinux still may not take it into consideration, though.

Otherwise, ideally you'd be able to point glibc to a specific NSS config via environment variable or something, but I'm not aware of a way to do that except via LD_PRELOAD which carries its own set of drawbacks.

This is my top preference but I couldn't find a great solution (LD_PRELOAD didn't seem great).


Here are my current options, in preference order:

  1. Come up with a way to make the mount work universally. It seems plausible this isn't going to happen.
  2. Come up with a way to detect whether we need the mount, and only use it if so. This seems fairly straightforward: "just" parse nsswitch.conf and see if it's doing anything weird (see the sketch after this list)?
  3. Explicit user configuration to control whether we need to mess with nsswitch.conf as well. This is my least favorite option because it means Bottlerocket (or similar) users are broken out of the box and need to follow some document to configure things manually, which is a pain.
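For illustration only (not actual Istio detection logic), a minimal sketch of what option 2 could look like: scan the passwd and hosts entries in /etc/nsswitch.conf and flag any source that is not assumed to be purely local.

package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

func main() {
	f, err := os.Open("/etc/nsswitch.conf")
	if err != nil {
		fmt.Println("no nsswitch.conf, nothing to check:", err)
		return
	}
	defer f.Close()

	// Sources we assume cannot trigger network lookups; anything else is "weird".
	local := map[string]bool{"files": true, "dns": true}

	sc := bufio.NewScanner(f)
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		fields := strings.Fields(line)
		db := strings.TrimSuffix(fields[0], ":")
		if db != "passwd" && db != "hosts" {
			continue
		}
		for _, src := range fields[1:] {
			// Entries like [NOTFOUND=return] are actions, not sources.
			if strings.HasPrefix(src, "[") || local[src] {
				continue
			}
			fmt.Printf("%s uses %q; the overmount would be needed here\n", db, src)
		}
	}
}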

I'll explore these a bit, but input from others is welcome as well. I am also trying to set up a Bottlerocket env so I can reliably test each of these (if someone knows a way to run Bottlerocket in Docker like kind -- let me know!).

In the meantime, I would recommend that users on Bottlerocket do not upgrade to 1.19.6/1.20.2. While I cannot make promises, I am confident we will implement at least option 3 by the next patch release (so you shouldn't need to worry about being stuck on an old version if, for instance, an incompatible security patch comes out).

howardjohn commented 8 months ago

I got a Bottlerocket setup and can reproduce.

However, if you did try to mount over /etc/nsswitch.conf, I expect you'd see a failure like this:

The real log we end up seeing is:

type=1400 audit(1705537456.860:6): avc: denied { mounton } for pid=30072 comm="istio-cni" path="/etc/nsswitch.conf" dev="tmpfs" ino=117 scontext=system_u:system_r:container_t:s0 tcontext=system_u:object_r:etc_t:s0 tclass=file permissive=0


Parsing nsswitch.conf seems like the type of thing that might be error prone, so we could change "Come up with a way to detect if we need the mount, and only use it if so" to "Try to overwrite it, but do not exit on error". The downside is we will get these audit messages though

howardjohn commented 8 months ago

One big question I had: why can we not do unshare(CLONE_NEWNS); mount ...? This is what containers do, and obviously containers run on Bottlerocket.

I believe this is where we restrict CNI plugins: https://github.com/bottlerocket-os/bottlerocket/blob/7452c37e0b1abf19797a23bb7ab4d70bb0aef1cf/packages/selinux-policy/rules.cil#L90-L93, while the container runtime runs as another group (runtime_t) which has more access (I may be a bit off; this is my first time dealing with SELinux rules...). So the CNI plugins essentially have less privilege than the container runtime?
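For reference, a minimal sketch of the pattern in question (illustrative only; paths and error handling simplified): unshare the mount namespace, make mounts private, then bind-mount over /etc/nsswitch.conf so only this process sees the change. The remount of / appears to be the step the earlier path="/" denial points at.

package main

import (
	"log"
	"runtime"

	"golang.org/x/sys/unix"
)

func main() {
	// Unshare acts on the calling thread, so pin this goroutine to it.
	runtime.LockOSThread()
	defer runtime.UnlockOSThread()

	if err := unix.Unshare(unix.CLONE_NEWNS); err != nil {
		log.Fatalf("unshare: %v", err)
	}

	// Make mounts private so changes don't propagate back to the host;
	// this remount of / is what the path="/" SELinux denial seems to point at.
	if err := unix.Mount("", "/", "", unix.MS_REC|unix.MS_PRIVATE, ""); err != nil {
		log.Fatalf("remount /: %v", err)
	}

	// Bind-mount a minimal nsswitch.conf visible only in this namespace
	// (the source path here is illustrative).
	if err := unix.Mount("/tmp/nsswitch.conf", "/etc/nsswitch.conf", "", unix.MS_BIND, ""); err != nil {
		log.Fatalf("bind mount: %v", err)
	}
}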

bcressey commented 8 months ago

One big question I had: why can we not do unshare(CLONE_NEWNS); mount ...? This is what containers do, and obviously containers run on Bottlerocket. ... So the CNI plugins essentially have less privilege than the container runtime?

Yes, the container runtime is relatively privileged - it needs to be able to change the label of processes it launches, which is a bit like sudo for mandatory access control mechanisms.

The answer to your first question is that SELinux is not namespace-aware, so it doesn't have a way to take the namespace into account when answering the question "does subject have permission to perform action on target?" For this case, we might want to say "yes, as long as that action doesn't happen in the initial mount namespace", but there's no way to express that constraint.

"just" parse nsswitch.conf and see if its doing anything weird?

Go does this for "hosts" to determine which DNS resolver it should use; if it finds a mechanism it doesn't understand, it switches to the cgo resolver so that the system library is used instead.

In practice I expect you'll find that most Linux distros include an entry more complex than Bottlerocket's passwd: files. On my Fedora development machine, it's passwd: files sss systemd where sss could trigger network-based lookups and systemd wouldn't. There's no way to tell just from the name, so you'd need to maintain a list of known problematic cases that should trigger the overmount.

"Try to overwrite it, but do not exit on error". The downside is we will get these audit messages though

I wouldn't worry about the audit message very much if it's just one AVC denial when the pod starts up. That is pretty common in my experience, and relatively harmless if the software has a fallback path.

howardjohn commented 8 months ago

New iteration pushed up to https://github.com/istio/istio/pull/48757. This seems to work on GKE and Bottlerocket (where I have tested so far). Basically, if we cannot set up the mounts, we just continue anyway.
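A minimal sketch of that behavior (not the PR verbatim; setupExecEnv here is a simplified stand-in for the real setup): mount setup failures become a warning instead of failing the CNI ADD.

package main

import (
	"log"
	"os/exec"
	"runtime"

	"golang.org/x/sys/unix"
)

// setupExecEnv is a simplified stand-in for the mount/namespace setup.
func setupExecEnv() error {
	runtime.LockOSThread()
	if err := unix.Unshare(unix.CLONE_NEWNS); err != nil {
		return err
	}
	return unix.Mount("", "/", "", unix.MS_REC|unix.MS_PRIVATE, "")
}

func main() {
	// Best-effort: on SELinux-restricted hosts this degrades to a warning
	// rather than a failed pod sandbox.
	if err := setupExecEnv(); err != nil {
		log.Printf("warn: failed to setup execution environment, attempting to continue anyways: %v", err)
	}
	if out, err := exec.Command("iptables-save").CombinedOutput(); err != nil {
		log.Fatalf("iptables-save: %v: %s", err, out)
	}
}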

Smeb commented 8 months ago

Thanks for all the follow up work. I just ran the latest version in our environment and can confirm it works. We get a lot of audit messages (as expected), but otherwise everything looks good.

howardjohn commented 7 months ago

One warning to folks: between the regression in 1.18 and the fix, 1.18 went EOL (https://istio.io/latest/news/support/announcing-1.18-eol-final/). As such, this fix (and fixes for any other bugs or CVEs in Istio) will only be applied to 1.19+. 1.19 and 1.20 will be fixed in the next patch release (in a couple of weeks).

ajaykumarmandapati commented 5 months ago

@howardjohn

One warning to folks: between the regression in 1.18 and the fix, 1.18 went EOL (https://istio.io/latest/news/support/announcing-1.18-eol-final/). As such, this fix (and fixes for any other bugs or CVEs in Istio) will only be applied to 1.19+. 1.19 and 1.20 will be fixed in the next patch release (in a couple of weeks).

Hi, can you point to the patch release versions in which this fix exists? I tried with Istio v1.20.4 and with 1.21.0 and still have the same issue:

2024-04-04T13:13:49.120967Z warn    cni failed to setup execution environment, attempting to continue anyways: failed to remount /: permission denied
2024-04-04T13:13:49.120970Z info    cni Running ip6tables-restore with the following input:
2024-04-04T13:13:49.120973Z info    cni Running command (without lock by env and nss): ip6tables-restore --noflush
2024-04-04T13:13:49.120976Z warn    cni failed to setup execution environment, attempting to continue anyways: failed to remount /: permission denied
2024-04-04T13:13:49.120978Z info    cni Running command (without nss): iptables-save
2024-04-04T13:13:49.120981Z warn    cni failed to setup execution environment, attempting to continue anyways: failed to remount /: permission denied
2024-04-04T13:13:49.120985Z info    cni Command output: 
# Generated by iptables-save v1.8.9 on Thu Apr  4 13:13:49 2024
*nat
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
:ISTIO_INBOUND - [0:0]
:ISTIO_IN_REDIRECT - [0:0]
:ISTIO_OUTPUT - [0:0]
:ISTIO_REDIRECT - [0:0]
-A PREROUTING -p tcp -j ISTIO_INBOUND
-A OUTPUT -p tcp -j ISTIO_OUTPUT
-A ISTIO_INBOUND -p tcp -m tcp --dport 15008 -j RETURN
-A ISTIO_INBOUND -p tcp -m tcp --dport 15020 -j RETURN
-A ISTIO_INBOUND -p tcp -m tcp --dport 15021 -j RETURN
-A ISTIO_INBOUND -p tcp -m tcp --dport 15090 -j RETURN
-A ISTIO_INBOUND -p tcp -j ISTIO_IN_REDIRECT
-A ISTIO_IN_REDIRECT -p tcp -j REDIRECT --to-ports 15006

ajaykumarmandapati commented 4 months ago

@howardjohn Is there a release with the fix? I see there have been quite a few releases in the last few weeks.

howardjohn commented 4 months ago

Fixed by https://github.com/istio/istio/pull/48757, available in 1.21.3.