linkerd / linkerd2

Ultralight, security-first service mesh for Kubernetes. Main repo for Linkerd 2.x.
https://linkerd.io
Apache License 2.0

Unable to use Linkerd-CNI #7945

Open BobyMCbobs opened 2 years ago

BobyMCbobs commented 2 years ago

What is the issue?

When Linkerd is installed with CNI enabled, Pod sandboxes fail to create.

How can it be reproduced?

linkerd install-cni | kubectl apply -f -
linkerd install --linkerd-cni-enabled | kubectl apply -f -

Logs, error output, etc

  Normal   Scheduled               37s   default-scheduler  Successfully assigned linkerd/linkerd-destination-54c8fb86c8-gwz6k to talos-192-168-122-140
  Warning  FailedCreatePodSandBox  36s   kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "c0b4a8286046ccbfd565b4d74731bd12b43b5a6b5ad43558f5d3f30d198ad517": plugin type="linkerd-cni" name="linkerd-cni" failed (add): exec: "nsenter": executable file not found in $PATH
  Warning  FailedCreatePodSandBox  25s   kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "bb75adddf6aa1a08bf372c422257b5fcf70c5aa4d510a78f82c5c17f361b3c55": plugin type="linkerd-cni" name="linkerd-cni" failed (add): exec: "nsenter": executable file not found in $PATH
  Warning  FailedCreatePodSandBox  9s    kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "5f159b05d91979644c44d84aa8f384295721e42e8802439649f6a9cbaeef7c2f": plugin type="linkerd-cni" name="linkerd-cni" failed (add): exec: "nsenter": executable file not found in $PATH

output of linkerd check -o short

Linkerd core checks
===================

linkerd-existence
-----------------
× control plane pods are ready
    No running pods for "linkerd-destination"
    see https://linkerd.io/2/checks/#l5d-api-control-ready for hints

Status check results are ×

Environment

Possible solution

No response

Additional context

Using Cilium as the CNI. Using Flannel makes no difference.

This happens both on amd64 in a VM and arm64 on Raspberry Pis.

My goal is to improve app start time by using the CNI plugin instead of the init containers.

If I run

linkerd upgrade --linkerd-cni-enabled=false | kubectl apply -f -

the CNI isn't used, and the Linkerd Pods become healthy again.

Would you like to work on fixing this bug?

No response

mateiidavid commented 2 years ago

Hey @BobyMCbobs, thanks for raising this. Is the util-linux package present on your hosts? Linkerd's CNI plugin sets up iptables rules in a pod's network namespace. AFAICT, the network namespace is passed in as an argument by the CNI runtime, and that's how we know where to run the firewall configuration.

If you find it odd for Talos not to have nsenter, you can SSH into the host and run echo $PATH; which nsenter. Perhaps the path has been set incorrectly? Unfortunately, I can't help with a repro since we don't have access to any Talos hosts.
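The "executable file not found in $PATH" text in the kubelet events is the message Go's os/exec package returns when it cannot resolve a binary name. As a minimal sketch (the helper name hasBinary is hypothetical and not part of Linkerd), the same lookup can be reproduced like this. It only says something useful when run in the same environment and with the same $PATH as the CNI plugin, i.e. on the host rather than inside a container:

```go
package main

import (
	"fmt"
	"os/exec"
)

// hasBinary reports whether name resolves to an executable on the current
// $PATH, using the same lookup that os/exec performs before running a command.
// Hypothetical helper for illustration only.
func hasBinary(name string) (string, bool) {
	path, err := exec.LookPath(name)
	if err != nil {
		return "", false
	}
	return path, true
}

func main() {
	if path, ok := hasBinary("nsenter"); ok {
		fmt.Println("nsenter found at", path)
	} else {
		fmt.Println("nsenter not found in $PATH")
	}
}
```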

For reference, I found this and associated issues: https://github.com/talos-systems/talos/issues/4194, might be worth having a look through them?

BobyMCbobs commented 2 years ago

Hey @mateiidavid,

Thank you for your reply! Talos doesn't include nsenter on the host.

May I please have a link to how the CNI plugin uses nsenter?

olix0r commented 2 years ago

nsenter is invoked here https://github.com/linkerd/linkerd2-proxy-init/blob/a556ca400132106db279ce8c3a79003a766bf707/iptables/iptables.go#L212-L228 to wrap calls to iptables

BobyMCbobs commented 2 years ago

> nsenter is invoked here https://github.com/linkerd/linkerd2-proxy-init/blob/a556ca400132106db279ce8c3a79003a766bf707/iptables/iptables.go#L212-L228 to wrap calls to iptables

Thank you, @olix0r!

BobyMCbobs commented 2 years ago

@olix0r, why is nsenter needed to call iptables?

I'm taking a look at the implementation. To expand on what you said: is it correct that the CNI plugin uses nsenter on the host to exec into the network namespace of the Pod and set iptables rules in it?

https://github.com/linkerd/linkerd2/blob/main/cni-plugin/main.go#L241 -> https://github.com/linkerd/linkerd2-proxy-init/blob/a556ca400132106db279ce8c3a79003a766bf707/iptables/iptables.go#L212-L228

mateiidavid commented 2 years ago

Hey @BobyMCbobs, this is how I understand things. When a pod is scheduled on a node, the container runtime (CRI) is responsible for creating, starting and stopping the pod. After a pod is first created (i.e. the CRI creates its sandbox -- in other words, its Linux namespaces -- and network namespace), its networking stack has to be created. For a pod to accept and send traffic without NAT, it needs to communicate with the host through a veth interface and get an IP address assigned to it.

The CNI does all of this, and more. A networking stack (or, simply put, a network ns in our case) is first created as a blank canvas: there are no routes, no rules, no devices. They're all added by the different plugins. CNIs simply configure everything.

In our case, we need to set up iptables and to do it, we need to enter the network namespace of the pod that's just been created. If we simply execute iptables commands without entering the namespace, they'll be applied to the host. So, to kind of directly answer the question, without using nsenter, there isn't really a way to guarantee we set the rules for the pod, since the agent/daemonset runs on the host. Does this sort of make sense? Feel free to correct any of my points if I'm wrong.
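To make the wrapping concrete, here is a minimal sketch of how a host-side process might prefix an iptables invocation with nsenter so that it runs inside a given network namespace. It is an illustration, not the code in linkerd2-proxy-init: the function names and the namespace path are hypothetical, and the only assumption about nsenter itself is util-linux's `--net=<file>` option.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// nsenterArgs builds the arguments passed to nsenter so that command runs
// inside the network namespace referenced by netnsPath.
// Hypothetical helper for illustration only.
func nsenterArgs(netnsPath, command string, args ...string) []string {
	out := []string{"--net=" + netnsPath, "--", command}
	return append(out, args...)
}

// inNetNS returns an *exec.Cmd that, when run, executes command through
// nsenter. Running it requires nsenter on the host's $PATH, which is exactly
// what is missing in this issue.
func inNetNS(netnsPath, command string, args ...string) *exec.Cmd {
	return exec.Command("nsenter", nsenterArgs(netnsPath, command, args...)...)
}

func main() {
	// Hypothetical namespace path; the CNI runtime supplies the real one.
	cmd := inNetNS("/var/run/netns/example", "iptables", "-t", "nat", "-L")
	// Print the command line instead of running it.
	fmt.Println(strings.Join(cmd.Args, " "))
}
```

Without the nsenter prefix, the same iptables arguments would apply to whatever network namespace the calling process is in, which for a CNI plugin is the host's.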

Now, on to the solution: I think we'd be open to bundling util-linux with our CNI plugin. I suspect it'd be pretty easy, just add util-linux in the Dockerfile.

https://github.com/linkerd/linkerd2/blob/a98b72285c795ab9e5237e10fbdaa7ec6fd4fcf3/cni-plugin/Dockerfile#L23-L26

There's no easy way for us to test this solution with Talos, so we'd need some additional help here, which is why we'd appreciate it a lot if you could contribute :D

To test, we could do the following:

  1. Fork, make changes to the image
  2. bin/docker-build to build the images
  3. Push to a registry, or if you can, push to your cluster's image registry
  4. Install linkerd-cni & linkerd, verify all works well (might have to override the registry here).

Wdyt?

BobyMCbobs commented 2 years ago

Hey @mateiidavid, Thank you for your reply.

> Now, on to the solution: I think we'd be open to bundling util-linux with our CNI plugin. I suspect it'd be pretty easy, just add util-linux in the Dockerfile.
>
> https://github.com/linkerd/linkerd2/blob/a98b72285c795ab9e5237e10fbdaa7ec6fd4fcf3/cni-plugin/Dockerfile#L23-L26
>
> There's no easy way for us to test this solution with Talos, so we'd need some additional help here, which is why we'd appreciate it a lot if you could contribute :D
>
> To test, we could do the following:
>
>   1. Fork, make changes to the image
>   2. bin/docker-build to build the images
>   3. Push to a registry, or if you can, push to your cluster's image registry
>   4. Install linkerd-cni & linkerd, verify all works well (might have to override the registry here).
>
> Wdyt?

I gave this a go, and there doesn't appear to be any difference. Since the CNI runs on the host through the kubelet, it will depend on the host binaries. Took a look at how Cilium uses network namespaces and iptables:

I'm sure that if the CNI binary could contain everything it needs, it would work whatever the environment. I'll keep looking around for what's possible.

Keen to have this work! I'm more than happy to contribute what I can!

mateiidavid commented 2 years ago

No probs :) To be clear, the way I understand CNIs: the plugin is a binary on the host that gets called by the kubelet. In that sense, our plugin also calls the iptables binary on the host; it doesn't do it in a container. We need to run it in the pod's network namespace though, which is different. I guess that's why the initial solution didn't really work. You're right that packaging it with the container won't work (unless we copy the binary onto the host).

Cilium might have a different use case for iptables and firewall configuration, and perhaps that's why it is run in the host's namespace. For example, looking at the docs it seems to be used for kube-proxy interop (kube-proxy in most cases is just a big collection of iptables rules itself). It would make sense that in these scenarios you can run it in the host ns.

For us though, the use case for iptables is different. We want to set up routing rules in each pod's network ns in such a way that the proxy can take over packets, but we do not want the host to have the same config. Running in the same network ns as the pod is a bit of a necessity, as far as I can tell.

We can programmatically enter the namespace, as opposed to using the nsenter wrapper. I'm a bit apprehensive about going down this route though. I think we started using the wrapper for a good reason: the folks from Weave published an article about Go not working extremely well with network namespaces here. Idk, maybe we can think of something, but I'd avoid it if we can.

Hm, with all this being said, not sure what we can do as a solution here. Our container that runs as an agent on the host is basically a bash script that copies over the plugin binary in the right location (and creates a network config file). Wonder if there's anything we can do in the install script 🤔

frezbo commented 2 years ago

Assuming that the linkerd pod runs with CAP_NET_ADMIN, it could directly nsenter from the pod itself into other pods' network namespaces, removing the need for nsenter to be present on the host. Is this a limitation due to how CNI binaries pass information through stdin/stdout? I'm trying to understand the need for nsenter on the host when it could be done from the pod itself.

mateiidavid commented 2 years ago

@frezbo that's true, the CNI binary itself could enter the namespace programmatically. However, there are two points to consider here:

Does this make sense and line up with what you know about the space? We'd still be very happy to fix this.

smira commented 2 years ago

On Go and Linux namespaces: this should not be a problem anymore with Go, e.g. it's possible to switch to some network namespace and perform actions:

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

Cowboy-coder commented 1 year ago

We also ran into this issue (with Talos). Any chance that switching namespaces in Go, as described in https://github.com/linkerd/linkerd2/issues/7945#issuecomment-1138546766, would solve this issue?

blabu23 commented 1 year ago

seems to be the same issue as my problem described here: https://github.com/linkerd/linkerd2/issues/10413

kflynn commented 10 months ago

For the record, yup, we are looking into this...

NigelVanHattum commented 3 months ago

Any updates on this topic?

the-wondersmith commented 3 months ago

@BobyMCbobs @NigelVanHattum @Cowboy-coder Talos supports system extensions and in fact maintains an official extension for util-linux (the package that normally supplies the nsenter binary). The official extension does not currently include nsenter when it builds, but I've just opened a PR that should fix that.

Once the PR is merged, simply using the extension should resolve this issue.

the-wondersmith commented 3 months ago

@BobyMCbobs @NigelVanHattum @Cowboy-coder

Update: the PR for including nsenter in the util-linux extension has been merged 😁.

djryanj commented 2 months ago

Hey all, just wondering if any headway has been made here.

I'm running Talos 1.7.4 with the util-linux extension and trying to install linkerd-cni so I can avoid needing to mark all namespaces where linkerd needs to run as privileged, but the linkerd-network-validator container in linkerd fails to come up properly with logs like

2024-07-02T20:03:06.514474Z  INFO linkerd_network_validator: Listening for connections on 0.0.0.0:4140
2024-07-02T20:03:06.514493Z DEBUG linkerd_network_validator: token="<redacted>\n"
2024-07-02T20:03:06.514500Z  INFO linkerd_network_validator: Connecting to 1.1.1.1:20001
2024-07-02T20:03:06.514929Z DEBUG connect: linkerd_network_validator: Connected client.addr=10.244.1.51:34290
2024-07-02T20:03:16.515844Z ERROR linkerd_network_validator: Failed to validate networking configuration. Please ensure iptables rules are rewriting traffic as expected. timeout=10s

With the last line seemingly the biggest hint. I'm at a loss as to how to proceed here, and I haven't found a single thing anywhere explaining how I can get linkerd running on Talos.

djryanj commented 2 months ago

So I discovered that my problem was actually that I was using Cilium and had set "cni.exclusive=false" in the Helm chart install for it. This caused any attempted use of linkerd-cni to fail. As soon as I set that flag to true, linkerd-cni worked in Talos as expected.

the-wondersmith commented 2 months ago

> So I discovered that my problem was actually that I was using Cilium and had set "cni.exclusive=false" in the Helm chart install for it. This caused any attempted use of linkerd-cni to fail. As soon as I set that flag to true, linkerd-cni worked in Talos as expected.

@BobyMCbobs with this verification, would you mind terribly marking this issue as resolved?

wmorgan commented 2 months ago

@djryanj Sorry you ran into that. We are learning about that (bizarre) flag for Cilium ourselves. We JUST merged a docs PR that mentions this so hopefully future Cilium + Linkerd + CNI users will be able to avoid the issue. https://github.com/linkerd/website/pull/1794