linkerd / linkerd2

Ultralight, security-first service mesh for Kubernetes. Main repo for Linkerd 2.x.
https://linkerd.io
Apache License 2.0

iptables command failing in linkerd-init container #7749

Closed ichundu closed 2 years ago

ichundu commented 2 years ago

What is the issue?

iptables command fails in linkerd-init container (in all linkerd pods) with the error message shown in the log snippet below. linkerd installation is not successful.

How can it be reproduced?

Logs, error output, etc

linkerd-init container log:

2022/01/31 21:47:56 Tracing this script execution as [1643665676]
2022/01/31 21:47:56 current state
------------------------------------------------------------
2022/01/31 21:47:56 :; iptables-save
2022/01/31 21:47:56 

2022/01/31 21:47:56 configuration
------------------------------------------------------------
2022/01/31 21:47:56 Will ignore port [4190 4191 4567 4568] on chain PROXY_INIT_REDIRECT
2022/01/31 21:47:56 Will redirect all INPUT ports to proxy
2022/01/31 21:47:56 Ignoring uid 2102
2022/01/31 21:47:57 Will ignore port [443] on chain PROXY_INIT_OUTPUT
2022/01/31 21:47:57 Redirecting all OUTPUT to 4140
2022/01/31 21:47:57 

2022/01/31 21:47:57 adding rules
------------------------------------------------------------
2022/01/31 21:47:57 :; iptables -t nat -N PROXY_INIT_REDIRECT -m comment --comment proxy-init/redirect-common-chain/1643665676
2022/01/31 21:47:57 modprobe: can't change directory to '/lib/modules': No such file or directory
modprobe: can't change directory to '/lib/modules': No such file or directory
iptables v1.8.7 (legacy): can't initialize iptables table `nat': Table does not exist (do you need to insmod?)
Perhaps iptables or your kernel needs to be upgraded.

2022/01/31 21:47:57 Aborting firewall configuration
Error: exit status 3
Usage:
  proxy-init [flags]

Flags:
  -h, --help                               help for proxy-init
      --inbound-ports-to-ignore strings    Inbound ports and/or port ranges (inclusive) to ignore and not redirect to proxy. This has higher precedence than any other parameters.
  -p, --incoming-proxy-port int            Port to redirect incoming traffic (default -1)
      --netns string                       Optional network namespace in which to run the iptables commands
      --outbound-ports-to-ignore strings   Outbound ports and/or port ranges (inclusive) to ignore and not redirect to proxy. This has higher precedence than any other parameters.
  -o, --outgoing-proxy-port int            Port to redirect outgoing traffic (default -1)
  -r, --ports-to-redirect ints             Port to redirect to proxy, if no port is specified then ALL ports are redirected
  -u, --proxy-uid int                      User ID that the proxy is running under. Any traffic coming from this user will be ignored to avoid infinite redirection loops. (default -1)
      --simulate                           Don't execute any command, just print what would be executed
      --timeout-close-wait-secs int        Sets nf_conntrack_tcp_timeout_close_wait
  -w, --use-wait-flag                      Appends the "-w" flag to the iptables commands

output of linkerd check --pre command:

Linkerd core checks
===================

kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API

kubernetes-version
------------------
√ is running the minimum Kubernetes API version
√ is running the minimum kubectl version

pre-kubernetes-setup
--------------------
√ control plane namespace does not already exist
√ can create non-namespaced resources
√ can create ServiceAccounts
√ can create Services
√ can create Deployments
√ can create CronJobs
√ can create ConfigMaps
√ can create Secrets
√ can read Secrets
√ can read extension-apiserver-authentication configmap
√ no clock skew detected

linkerd-version
---------------
√ can determine the latest version
√ cli is up-to-date

Status check results are √

output of linkerd check -o short

Linkerd core checks
===================

linkerd-existence
-----------------
/ No running pods for "linkerd-destination"

Environment

Possible solution

No response

Additional context

I have tried many combinations in my setup, but I always get the same error. I'm using CRI-O as the container runtime, but I have also tried containerd and get exactly the same result. I have tried Linkerd versions 2.10, 2.11, and edge without success. Some months ago I was able to get Linkerd up and running successfully with the same setup (except that the Kubernetes version was 1.20). I have tried installing from the CLI following the getting started guide, and also with the official Helm charts.

Would you like to work on fixing this bug?

No response

mateiidavid commented 2 years ago

Hey @ichundu,

I would have assumed this is an issue with the host rather than with a specific Kubernetes version.

Some months ago I was able to get the linkerd up and running successfully with the same setup (except kubernetes version was 1.20).

Was this also on Oracle Linux 8.5? Looking at their documentation, it seems that they don't use iptables as their firewall. Have you tried following the steps in the docs to enable it on your host machines?

I'd first try to see if enabling iptables explicitly helps. If it does, then we know where the problem is :)

ichundu commented 2 years ago

Hi @mateiidavid,

The documentation you linked is for RHEL 7 and explains the usage of ipset and iptables, so I don't think it is relevant to my problem. I think the iptables binary is installed in the linkerd-init container image.

Was this also on Oracle Linux 8.5? It may have been 8.4, I don't remember exactly. Oracle Linux is "nearly" identical to RHEL.

After some further research I think I have narrowed down the issue. The linkerd-init container runs the image cr.l5d.io/linkerd/proxy-init:v1.4.0, which includes the legacy iptables binary. The RHEL 8 family of distributions (RHEL, CentOS, Rocky Linux, AlmaLinux, Oracle Linux) uses nf_tables instead, and iptables there is just a wrapper around nf_tables, see here. When RHEL 8 was released, this caused issues with Docker and with CNI plugins like Calico that relied on legacy iptables, but since then they have added support for iptables (nf_tables) on RHEL 8.
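The difference between the two backends can be read from the version string that iptables prints, such as the `iptables v1.8.7 (legacy)` line in the linkerd-init log above. The following is only a minimal sketch: the function name `iptables_backend` is made up for illustration, and it assumes an iptables 1.8.x binary that includes the backend in its version string.

```shell
# Hypothetical helper: classify an `iptables --version` string by backend.
# Assumes iptables 1.8.x, which prints "(legacy)" or "(nf_tables)".
iptables_backend() {
  case "$1" in
    *"(nf_tables)"*) echo "nf_tables" ;;
    *"(legacy)"*)    echo "legacy" ;;
    *)               echo "unknown" ;;
  esac
}

# Version string taken from the linkerd-init log in this issue.
iptables_backend "iptables v1.8.7 (legacy)"   # prints: legacy
```

On a node, the same check could be run against the host binary with `iptables_backend "$(iptables --version)"`.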

If this is the issue, it means that the linkerd-init container cannot work in any RHEL 8 based cluster, even in OpenShift, which uses nftables. I found this other issue, "Openshift 4.5 Install Fails on IPTables", which describes a similar problem in OpenShift; the suggestion there is to use linkerd-cni.

I tried installing the linkerd-cni DaemonSet, which removes the need for the linkerd-init container, and the installation completes successfully.
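For reference, the CNI-based install described here can be sketched roughly as follows. This is an illustration rather than the exact commands that were run: the wrapper function `install_linkerd_with_cni` is hypothetical, and it assumes the `linkerd install-cni` subcommand and the `--linkerd-cni-enabled` flag of the 2.10/2.11 CLI discussed in this thread (other releases may need additional steps).

```shell
# Hypothetical wrapper: install the CNI plugin first, then the control
# plane with the init container disabled in favour of the plugin.
install_linkerd_with_cni() {
  # Fail early if either CLI is missing from PATH.
  for tool in linkerd kubectl; do
    command -v "$tool" >/dev/null 2>&1 || {
      echo "missing required tool: $tool" >&2
      return 1
    }
  done

  # Render and apply the linkerd-cni DaemonSet, then the control plane.
  linkerd install-cni | kubectl apply -f - &&
    linkerd install --linkerd-cni-enabled | kubectl apply -f -
}

# Not invoked here; run `install_linkerd_with_cni` against a test cluster.
```

Installing the plugin before the control plane matters because, in this mode, pods rely on the CNI plugin rather than linkerd-init to set up the redirect rules.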

If you don't plan adding support for Red Hat distros without CNI plugin, you can close this issue.

Thank you!

mateiidavid commented 2 years ago

Hey @ichundu, sorry about the resources, it was a pretty quick skim on my side. Great links attached btw, thanks for being so thorough.

Ah, that makes complete sense, re: using iptables-legacy. I knew there was a difference but assumed all kernels would work with iptables-legacy; it seems that I was wrong. I think turning that on in the kernel is going to be too complicated, so I'd be happy to find a workaround for this.

Something that puzzled me, perhaps you have some insight. When using a CNI, does the CNI itself translate the iptables queries to nf_tables? Or does it use something akin to iptables in nf mode? I'd suppose the latter, but I wanted to double check. Perhaps the workaround could be as simple as using the iptables nf mode frontend in our init container to keep changes minimal but also ensure our forwarding rules work well in all modern kernels. I think the trend is to move away from iptables, it's just too big of a change from my pov to be worth it.
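Whether a node's kernel currently has the legacy `nat` table available can be checked by looking for the `iptable_nat` module, which is what the "Table does not exist" error in the log points at. The sketch below is an assumption-laden illustration: `module_loaded` is a made-up helper, it only reads a `/proc/modules`-style listing, and it will not see support that is compiled into the kernel rather than loaded as a module.

```shell
# Hypothetical helper: succeed if a kernel module appears in a
# /proc/modules-style file (first whitespace-separated field is the name).
# $1: module name, $2: file to read (defaults to /proc/modules).
module_loaded() {
  awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' \
    "${2:-/proc/modules}" 2>/dev/null
}

# Report the state of the two backends on the current host.
for mod in iptable_nat nf_tables; do
  if module_loaded "$mod"; then
    echo "$mod: loaded"
  else
    echo "$mod: not loaded (or built in, or /proc/modules unavailable)"
  fi
done
```

If `nf_tables` is loaded but `iptable_nat` is not, a legacy iptables binary inside a container would need to load the module itself, and the log shows that `modprobe` cannot do that without `/lib/modules`.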

tonychoe commented 2 years ago

Hope this gets fixed. Otherwise, we either have to abandon Linkerd, which we love, or have to keep using Oracle Linux 7. :)

mateiidavid commented 2 years ago

@tonychoe hey, we're working on improving the CNI plugin at the moment. We are considering re-writing the iptables queries for nftables, which should allow the init container to run on Oracle Linux hosts. Have you considered using the CNI plugin as an alternative?

tonychoe commented 2 years ago

@mateiidavid thanks for the suggestion. I've started using the CNI plugin which worked well on both Oracle Linux 7 and 8!

olix0r commented 2 years ago

In Linkerd 2.12, you can now configure proxy-init to use iptables-nft: https://linkerd.io/2.12/features/nft/
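As a rough sketch of what that configuration looks like, the snippet below prints an install command with the iptables mode set. It assumes the `proxyInit.iptablesMode` value with the options `legacy` and `nft` as described on the linked page, so check the documentation for your Linkerd version; the helper name `nft_install_command` is hypothetical, and the command is only printed, not executed.

```shell
# Hypothetical helper: print (not run) a Linkerd install command for the
# given proxy-init iptables mode. Assumes the `proxyInit.iptablesMode`
# value from the Linkerd 2.12 docs; accepted modes here: legacy, nft.
nft_install_command() {
  case "${1:-nft}" in
    legacy|nft) ;;
    *)
      echo "unsupported iptables mode: $1" >&2
      return 1
      ;;
  esac
  printf 'linkerd install --set proxyInit.iptablesMode=%s | kubectl apply -f -\n' "${1:-nft}"
}

# Show the command for nft mode so it can be reviewed before running.
nft_install_command nft
```

Printing the command first makes it easy to review before applying it to a cluster.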