kubernetes / kubernetes

Production-Grade Container Scheduling and Management

kube-proxy does not appear to be creating iptables entries #123837

Closed cerebrate closed 6 months ago

cerebrate commented 7 months ago

What happened?

When setting up a new cluster with Kubernetes 1.29.2 on Debian 12.5 ("bookworm"), kube-proxy does not appear to be creating the iptables entries needed to reach services. On the first control-plane node, after kubeadm init, at the step where a network add-on must be installed, the add-on's pods invariably fail, complaining that it is impossible to reach the Kubernetes API server via the kubernetes service.
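For reference, the service address the add-on pods complain about is the cluster IP of the default kubernetes service; a minimal way to confirm that mapping (a sketch, assuming a stock kubeadm setup) is:

```console
# Show the cluster IP of the default "kubernetes" service and the
# endpoint behind it (normally the apiserver's advertised address).
$ kubectl get svc kubernetes -o wide
$ kubectl get endpoints kubernetes
```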

Things at this point appear normal except for the failed network add-on pod:

❯ kubectl get node
NAME                STATUS   ROLES           AGE   VERSION
princess-celestia   Ready    control-plane   55m   v1.29.2

❯ kubectl get pod -A
NAMESPACE     NAME                                        READY   STATUS              RESTARTS       AGE
kube-system   coredns-76f75df574-62dx4                    0/1     ContainerCreating   0              55m
kube-system   coredns-76f75df574-m6dhg                    0/1     ContainerCreating   0              55m
kube-system   etcd-princess-celestia                      1/1     Running             3 (23m ago)    55m
kube-system   kube-apiserver-princess-celestia            1/1     Running             3 (22m ago)    55m
kube-system   kube-controller-manager-princess-celestia   1/1     Running             2 (23m ago)    55m
kube-system   kube-proxy-7pwnn                            1/1     Running             0              21m
kube-system   kube-scheduler-princess-celestia            1/1     Running             2 (23m ago)    55m
kube-system   weave-net-mnm4v                             1/2     CrashLoopBackOff    24 (25s ago)   50m

This example is from Weave, but the equivalent error also occurs with Flannel, leading me to conclude that the issue is not with the add-ons themselves:

FATA: 2024/03/09 23:25:51.801940 [kube-peers] Could not get peers: Get "https://[fdc9:b01a:cafe:60::1]:443/api/v1/nodes": dial tcp [fdc9:b01a:cafe:60::1]:443: connect: network is unreachable
Failed to get peers

The kube-proxy pod log shows no calls to iptables:

I0309 23:20:38.208141       1 server_others.go:72] "Using iptables proxy"
I0309 23:20:38.212206       1 server.go:1050] "Successfully retrieved node IP(s)" IPs=["172.16.0.129"]
I0309 23:20:38.213216       1 conntrack.go:58] "Setting nf_conntrack_max" nfConntrackMax=262144
I0309 23:20:38.220948       1 server.go:652] "kube-proxy running in dual-stack mode" primary ipFamily="IPv4"
I0309 23:20:38.220959       1 server_others.go:168] "Using iptables Proxier"
I0309 23:20:38.221908       1 proxier.go:245] "Setting route_localnet=1 to allow node-ports on localhost; to change this either disable iptables.localhostNodePorts (--iptables-localhost-nodeports) or set nodePortAddresses (--nodeport-addresses) to filter loopback addresses"
I0309 23:20:38.221976       1 server.go:865] "Version info" version="v1.29.2"
I0309 23:20:38.221981       1 server.go:867] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK=""
I0309 23:20:38.222805       1 config.go:315] "Starting node config controller"
I0309 23:20:38.222817       1 shared_informer.go:311] Waiting for caches to sync for node config
I0309 23:20:38.222829       1 config.go:188] "Starting service config controller"
I0309 23:20:38.222837       1 shared_informer.go:311] Waiting for caches to sync for service config
I0309 23:20:38.222939       1 config.go:97] "Starting endpoint slice config controller"
I0309 23:20:38.222943       1 shared_informer.go:311] Waiting for caches to sync for endpoint slice config
I0309 23:20:38.323006       1 shared_informer.go:318] Caches are synced for endpoint slice config
I0309 23:20:38.323015       1 shared_informer.go:318] Caches are synced for node config
I0309 23:20:38.323020       1 shared_informer.go:318] Caches are synced for service config
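The default-verbosity log above does not show individual rule syncs either way; a hedged way to check from the node whether kube-proxy believes it is syncing (assuming the default healthz and metrics ports, 10256 and 10249) is:

```console
# The health endpoint responds with 200 while kube-proxy considers itself healthy.
$ curl -s http://127.0.0.1:10256/healthz
# The sync metrics indicate whether and when rule syncs last completed.
$ curl -s http://127.0.0.1:10249/metrics | grep sync_proxy_rules_last
```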

and while the chains and some relevant entries are present, the essential ones appear to be missing, per the following output from ip6tables-save and iptables-save:

root@princess-celestia:~# ip6tables-save
# Generated by ip6tables-save v1.8.9 (nf_tables) on Sat Mar  9 17:33:03 2024
*mangle
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
:KUBE-IPTABLES-HINT - [0:0]
:KUBE-KUBELET-CANARY - [0:0]
:KUBE-PROXY-CANARY - [0:0]
COMMIT
# Completed on Sat Mar  9 17:33:03 2024
# Generated by ip6tables-save v1.8.9 (nf_tables) on Sat Mar  9 17:33:03 2024
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:KUBE-EXTERNAL-SERVICES - [0:0]
:KUBE-FIREWALL - [0:0]
:KUBE-FORWARD - [0:0]
:KUBE-KUBELET-CANARY - [0:0]
:KUBE-NODEPORTS - [0:0]
:KUBE-PROXY-CANARY - [0:0]
:KUBE-PROXY-FIREWALL - [0:0]
:KUBE-SERVICES - [0:0]
-A INPUT -j KUBE-FIREWALL
-A INPUT -m conntrack --ctstate NEW -m comment --comment "kubernetes load balancer firewall" -j KUBE-PROXY-FIREWALL
-A INPUT -m comment --comment "kubernetes health check service ports" -j KUBE-NODEPORTS
-A INPUT -m conntrack --ctstate NEW -m comment --comment "kubernetes externally-visible service portals" -j KUBE-EXTERNAL-SERVICES
-A FORWARD -m conntrack --ctstate NEW -m comment --comment "kubernetes load balancer firewall" -j KUBE-PROXY-FIREWALL
-A FORWARD -m comment --comment "kubernetes forwarding rules" -j KUBE-FORWARD
-A FORWARD -m conntrack --ctstate NEW -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A FORWARD -m conntrack --ctstate NEW -m comment --comment "kubernetes externally-visible service portals" -j KUBE-EXTERNAL-SERVICES
-A OUTPUT -j KUBE-FIREWALL
-A OUTPUT -m conntrack --ctstate NEW -m comment --comment "kubernetes load balancer firewall" -j KUBE-PROXY-FIREWALL
-A OUTPUT -m conntrack --ctstate NEW -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A KUBE-FORWARD -m conntrack --ctstate INVALID -j DROP
-A KUBE-FORWARD -m comment --comment "kubernetes forwarding rules" -m mark --mark 0x4000/0x4000 -j ACCEPT
-A KUBE-FORWARD -m comment --comment "kubernetes forwarding conntrack rule" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A KUBE-SERVICES -d fdc9:b01a:cafe:60::a/128 -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp has no endpoints" -m tcp --dport 53 -j REJECT --reject-with icmp6-port-unreachable
-A KUBE-SERVICES -d fdc9:b01a:cafe:60::a/128 -p tcp -m comment --comment "kube-system/kube-dns:metrics has no endpoints" -m tcp --dport 9153 -j REJECT --reject-with icmp6-port-unreachable
-A KUBE-SERVICES -d fdc9:b01a:cafe:60::a/128 -p udp -m comment --comment "kube-system/kube-dns:dns has no endpoints" -m udp --dport 53 -j REJECT --reject-with icmp6-port-unreachable
COMMIT
# Completed on Sat Mar  9 17:33:03 2024
# Generated by ip6tables-save v1.8.9 (nf_tables) on Sat Mar  9 17:33:03 2024
*nat
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
:KUBE-KUBELET-CANARY - [0:0]
:KUBE-MARK-MASQ - [0:0]
:KUBE-NODEPORTS - [0:0]
:KUBE-POSTROUTING - [0:0]
:KUBE-PROXY-CANARY - [0:0]
:KUBE-SEP-ZW3YEZJQTUKK7ANJ - [0:0]
:KUBE-SERVICES - [0:0]
:KUBE-SVC-NPX46M4PTMTKRN6Y - [0:0]
-A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A OUTPUT -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A POSTROUTING -m comment --comment "kubernetes postrouting rules" -j KUBE-POSTROUTING
-A KUBE-MARK-MASQ -j MARK --set-xmark 0x4000/0x4000
-A KUBE-POSTROUTING -m mark ! --mark 0x4000/0x4000 -j RETURN
-A KUBE-POSTROUTING -j MARK --set-xmark 0x4000/0x0
-A KUBE-POSTROUTING -m comment --comment "kubernetes service traffic requiring SNAT" -j MASQUERADE --random-fully
-A KUBE-SEP-ZW3YEZJQTUKK7ANJ -s fdc9:b01a:9d26:0:8aae:ddff:fe0a:99d8/128 -m comment --comment "default/kubernetes:https" -j KUBE-MARK-MASQ
-A KUBE-SEP-ZW3YEZJQTUKK7ANJ -p tcp -m comment --comment "default/kubernetes:https" -m tcp -j DNAT --to-destination [fdc9:b01a:9d26:0:8aae:ddff:fe0a:99d8]:6443
-A KUBE-SERVICES -d fdc9:b01a:cafe:60::1/128 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-SVC-NPX46M4PTMTKRN6Y
-A KUBE-SERVICES ! -d ::1/128 -m comment --comment "kubernetes service nodeports; NOTE: this must be the last rule in this chain" -m addrtype --dst-type LOCAL -j KUBE-NODEPORTS
-A KUBE-SVC-NPX46M4PTMTKRN6Y ! -s fdc9:b01a:cafe::/56 -d fdc9:b01a:cafe:60::1/128 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-MARK-MASQ
-A KUBE-SVC-NPX46M4PTMTKRN6Y -m comment --comment "default/kubernetes:https -> [fdc9:b01a:9d26:0:8aae:ddff:fe0a:99d8]:6443" -j KUBE-SEP-ZW3YEZJQTUKK7ANJ
COMMIT
# Completed on Sat Mar  9 17:33:03 2024
root@princess-celestia:~# iptables-save
# Generated by iptables-save v1.8.9 (nf_tables) on Sat Mar  9 17:33:35 2024
*mangle
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
:KUBE-IPTABLES-HINT - [0:0]
:KUBE-KUBELET-CANARY - [0:0]
:KUBE-PROXY-CANARY - [0:0]
COMMIT
# Completed on Sat Mar  9 17:33:35 2024
# Generated by iptables-save v1.8.9 (nf_tables) on Sat Mar  9 17:33:35 2024
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:KUBE-EXTERNAL-SERVICES - [0:0]
:KUBE-FIREWALL - [0:0]
:KUBE-FORWARD - [0:0]
:KUBE-KUBELET-CANARY - [0:0]
:KUBE-NODEPORTS - [0:0]
:KUBE-PROXY-CANARY - [0:0]
:KUBE-PROXY-FIREWALL - [0:0]
:KUBE-SERVICES - [0:0]
:WEAVE-NPC - [0:0]
:WEAVE-NPC-DEFAULT - [0:0]
:WEAVE-NPC-EGRESS - [0:0]
:WEAVE-NPC-EGRESS-ACCEPT - [0:0]
:WEAVE-NPC-EGRESS-CUSTOM - [0:0]
:WEAVE-NPC-EGRESS-DEFAULT - [0:0]
:WEAVE-NPC-INGRESS - [0:0]
-A INPUT -j KUBE-FIREWALL
-A INPUT -m conntrack --ctstate NEW -m comment --comment "kubernetes load balancer firewall" -j KUBE-PROXY-FIREWALL
-A INPUT -m comment --comment "kubernetes health check service ports" -j KUBE-NODEPORTS
-A INPUT -m conntrack --ctstate NEW -m comment --comment "kubernetes externally-visible service portals" -j KUBE-EXTERNAL-SERVICES
-A FORWARD -m conntrack --ctstate NEW -m comment --comment "kubernetes load balancer firewall" -j KUBE-PROXY-FIREWALL
-A FORWARD -m comment --comment "kubernetes forwarding rules" -j KUBE-FORWARD
-A FORWARD -m conntrack --ctstate NEW -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A FORWARD -m conntrack --ctstate NEW -m comment --comment "kubernetes externally-visible service portals" -j KUBE-EXTERNAL-SERVICES
-A OUTPUT -j KUBE-FIREWALL
-A OUTPUT -m conntrack --ctstate NEW -m comment --comment "kubernetes load balancer firewall" -j KUBE-PROXY-FIREWALL
-A OUTPUT -m conntrack --ctstate NEW -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A KUBE-FIREWALL ! -s 127.0.0.0/8 -d 127.0.0.0/8 -m comment --comment "block incoming localnet connections" -m conntrack ! --ctstate RELATED,ESTABLISHED,DNAT -j DROP
-A KUBE-FORWARD -m conntrack --ctstate INVALID -j DROP
-A KUBE-FORWARD -m comment --comment "kubernetes forwarding rules" -m mark --mark 0x4000/0x4000 -j ACCEPT
-A KUBE-FORWARD -m comment --comment "kubernetes forwarding conntrack rule" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A WEAVE-NPC -m state --state RELATED,ESTABLISHED -j ACCEPT
-A WEAVE-NPC -d 224.0.0.0/4 -j ACCEPT
-A WEAVE-NPC -m physdev --physdev-out vethwe-bridge --physdev-is-bridged -j ACCEPT
-A WEAVE-NPC -m state --state NEW -j WEAVE-NPC-DEFAULT
-A WEAVE-NPC -m state --state NEW -j WEAVE-NPC-INGRESS
-A WEAVE-NPC-EGRESS -m state --state RELATED,ESTABLISHED -j ACCEPT
-A WEAVE-NPC-EGRESS -m physdev --physdev-in vethwe-bridge --physdev-is-bridged -j RETURN
-A WEAVE-NPC-EGRESS -m addrtype --dst-type LOCAL -j RETURN
-A WEAVE-NPC-EGRESS -d 224.0.0.0/4 -j RETURN
-A WEAVE-NPC-EGRESS -m state --state NEW -j WEAVE-NPC-EGRESS-DEFAULT
-A WEAVE-NPC-EGRESS -m state --state NEW -m mark ! --mark 0x40000/0x40000 -j WEAVE-NPC-EGRESS-CUSTOM
-A WEAVE-NPC-EGRESS-ACCEPT -j MARK --set-xmark 0x40000/0x40000
COMMIT
# Completed on Sat Mar  9 17:33:35 2024
# Generated by iptables-save v1.8.9 (nf_tables) on Sat Mar  9 17:33:35 2024
*nat
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
:KUBE-KUBELET-CANARY - [0:0]
:KUBE-MARK-MASQ - [0:0]
:KUBE-NODEPORTS - [0:0]
:KUBE-POSTROUTING - [0:0]
:KUBE-PROXY-CANARY - [0:0]
:KUBE-SERVICES - [0:0]
-A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A OUTPUT -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A POSTROUTING -m comment --comment "kubernetes postrouting rules" -j KUBE-POSTROUTING
-A KUBE-MARK-MASQ -j MARK --set-xmark 0x4000/0x4000
-A KUBE-POSTROUTING -m mark ! --mark 0x4000/0x4000 -j RETURN
-A KUBE-POSTROUTING -j MARK --set-xmark 0x4000/0x0
-A KUBE-POSTROUTING -m comment --comment "kubernetes service traffic requiring SNAT" -j MASQUERADE --random-fully
-A KUBE-SERVICES -m comment --comment "kubernetes service nodeports; NOTE: this must be the last rule in this chain" -m addrtype --dst-type LOCAL -j KUBE-NODEPORTS
COMMIT
# Completed on Sat Mar  9 17:33:35 2024
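One way to see whether connections to the service IP are even hitting these chains (a suggested check, not something from the thread) is to watch the packet counters on the IPv6 nat rules while retrying the connection:

```console
# List the service chains with counters, attempt a connection to
# [fdc9:b01a:cafe:60::1]:443, then list again and compare the pkts column.
$ ip6tables -t nat -L KUBE-SERVICES -v -n --line-numbers
$ ip6tables -t nat -L KUBE-SVC-NPX46M4PTMTKRN6Y -v -n
```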

What did you expect to happen?

Once kubeadm init has completed, installation of a network add-on should proceed and complete normally; it (and other pods) should be able to access the kubernetes service.

How can we reproduce it (as minimally and precisely as possible)?

Rather than repeat the details of every command:

Take a vanilla, minimal Debian 12.5 installation, add containerd as the container runtime, and then run kubeadm init. Specifically, I use the following cluster configuration file:

---
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration

localAPIEndpoint:
  advertiseAddress: fdc9:b01a:9d26:0:8aae:ddff:fe0a:99d8

nodeRegistration:
  taints: []

---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: 1.29.2

clusterName: harmony

apiServer:
  certSANs:
    - "princess-celestia.arkane-systems.lan"
    - "172.16.0.129"
  timeoutForControlPlane: 4m0s

etcd:
  local:
    dataDir: /var/lib/etcd

networking:
  dnsDomain: cluster.local
  serviceSubnet: fdc9:b01a:cafe:60::/112,10.96.0.0/16
  podSubnet: fdc9:b01a:cafe:f4::/56,10.244.0.0/16

to set up dual-stack networking with IPv6 as the primary family, using the command kubeadm init --config ./cluster.conf, although using different subnet configurations makes no difference.
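As an aside on the configuration above, a quick way to double-check what kubeadm actually handed to kube-proxy and the node (a sketch; the resource names assume a standard kubeadm install) is:

```console
# The kube-proxy ConfigMap generated by kubeadm carries the cluster CIDRs.
$ kubectl -n kube-system get configmap kube-proxy -o yaml | grep -i clusterCIDR
# The pod CIDRs assigned to the node, and the node addresses kube-proxy
# would detect, are visible on the Node object.
$ kubectl get node princess-celestia -o jsonpath='{.spec.podCIDRs}{"\n"}'
$ kubectl get node princess-celestia -o wide
```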

Anything else we need to know?

No response

Kubernetes version

```console
$ kubectl version
Client Version: v1.29.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.2
```

Cloud provider

None.

OS version

```console
# On Linux:
$ cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

$ uname -a
Linux princess-celestia 6.1.0-18-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.76-1 (2024-02-01) x86_64 GNU/Linux
```

Install tools

None except `kubeadm`.

Container runtime (CRI) and version (if applicable)

```console
# containerd --version
containerd containerd.io 1.6.28 ae07eda36dd25f8a1b98dfbf587313b99c0190bb
```

Related plugins (CNI, CSI, ...) and versions (if applicable)

k8s-ci-robot commented 7 months ago

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
cerebrate commented 7 months ago

/sig network

cerebrate commented 7 months ago

Clarification: this does work using a default IPv4 configuration. It's only bringing IPv6 into it that seems to make it fail.

aojea commented 7 months ago

Can you validate from the nodes that plain connectivity works and that you are able to connect to the apiserver? First, try the advertised address:

curl -k -v https://[fdc9:b01a:9d26:0:8aae:ddff:fe0a:99d8]:6443

and if that works, try the service address:

curl -k -v https://[fdc9:b01a:cafe:60::1]:443
cerebrate commented 7 months ago

Connecting to the API server at the advertised address works fine; connecting via the service address does not.

aojea commented 7 months ago

hmm, one thing is weird

I0309 23:20:38.220948 1 server.go:652] "kube-proxy running in dual-stack mode" primary ipFamily="IPv4"

but it seems your ip6tables rules are being generated correctly:

-A KUBE-SERVICES -d fdc9:b01a:cafe:60::1/128 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-SVC-NPX46M4PTMTKRN6Y

goes to

-A KUBE-SVC-NPX46M4PTMTKRN6Y ! -s fdc9:b01a:cafe::/56 -d fdc9:b01a:cafe:60::1/128 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-MARK-MASQ
-A KUBE-SVC-NPX46M4PTMTKRN6Y -m comment --comment "default/kubernetes:https -> [fdc9:b01a:9d26:0:8aae:ddff:fe0a:99d8]:6443" -j KUBE-SEP-ZW3YEZJQTUKK7ANJ

and to

-A KUBE-SEP-ZW3YEZJQTUKK7ANJ -s fdc9:b01a:9d26:0:8aae:ddff:fe0a:99d8/128 -m comment --comment "default/kubernetes:https" -j KUBE-MARK-MASQ
-A KUBE-SEP-ZW3YEZJQTUKK7ANJ -p tcp -m comment --comment "default/kubernetes:https" -m tcp -j DNAT --to-destination [fdc9:b01a:9d26:0:8aae:ddff:fe0a:99d8]:6443

so it should redirect the traffic

do you have ipv6 forwarding enabled?

sysctl -w net.ipv6.conf.all.forwarding=1

We run IPv6-only CI using kubeadm with kind and it is working: https://testgrid.k8s.io/conformance-kind#kind%20(IPv6),%20master%20(dev)

cerebrate commented 7 months ago

I do:

cluster@princess-celestia:~$ cat /etc/sysctl.d/kubernetes.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
net.ipv6.conf.all.forwarding = 1

cluster@princess-celestia:~$ cat /proc/sys/net/ipv6/conf/all/forwarding
1

(Same results on all nodes.) If the iptables rules are doing the right thing (I must confess I'm not as up on iptables as I ought to be), then... well, it's a puzzler to me.

I've had dual-stack clusters with IPv6 primary working in the past (k8s 1.27, earlier versions of Debian bookworm), which only makes this more confusing to me. It's not as if I've suddenly changed my setup procedure; I've just updated the versions of the software involved.
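(A generic check to add alongside the sysctls quoted above, offered as a suggestion rather than something from the thread: the net.bridge.* keys only exist while the br_netfilter module is loaded, so it is worth confirming that it is.)

```console
# The bridge-nf-call sysctls are provided by br_netfilter; confirm the
# module is loaded and that both keys report 1.
$ lsmod | grep br_netfilter
$ sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables
```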

aojea commented 7 months ago

OK, let's start over. Can you paste the versions of the components and images that have changed between a working cluster and a failing one?

danwinship commented 7 months ago

Without having looked at this in much detail: the fact that there are weave-related rules in the ipv4 dump but not in the ipv6 dump seems suspicious. Is it possible you configured kube-proxy for dual-stack but configured your CNI plugin for single-stack?
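Two ways to check that from the node and the cluster (a sketch assuming a standard Weave deployment; the daemonset name matches the weave-net pod shown earlier, and IPALLOC_RANGE is Weave's pod-range setting, which may simply be absent if unset):

```console
# See which address families the installed CNI config actually declares.
$ cat /etc/cni/net.d/*.conflist
# For Weave specifically, check the pod address range the daemonset was
# deployed with (no output means the default was used).
$ kubectl -n kube-system get daemonset weave-net -o yaml | grep -i ipalloc
```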

danwinship commented 6 months ago

/close
no reply... if this is still a problem please reopen and add more information

k8s-ci-robot commented 6 months ago

@danwinship: Closing this issue.

In response to [this](https://github.com/kubernetes/kubernetes/issues/123837#issuecomment-2050057292):

>/close
>no reply... if this is still a problem please reopen and add more information

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.