rancher / rke2

https://docs.rke2.io/
Apache License 2.0

[Cilium/ RKE2] cluster is losing network access #2389

Closed cwayne18 closed 1 year ago

cwayne18 commented 2 years ago

Issue description: The cluster is intermittently having network issues.

They end up with the DNS pods failing to make any requests.

The underlying issue is that the Kubernetes API is not reachable.

Business impact:

The cluster is unstable and barely usable

Troubleshooting steps: Cluster nodes are able to talk to the Rancher server.

All the nodes are in the same subnet, with no firewall between them. The local node firewall is disabled as well. The nodes can reach ports 6443 and 9345.

iptables is not installed on the nodes.

They do not use kube-proxy in IPVS mode.

vadorovsky commented 2 years ago

I think the problem is the lack of iptables. We don't enable full kube-proxy replacement in Cilium by default, and the IPVS backend is not used either, so the lack of iptables results in services not being available. The Cilium logs clearly show that there is no connection to the k8s-apiserver service.

I would try to disable kube-proxy and enable full kube-proxy replacement in Cilium.

In order to do that, I would change /etc/rancher/rke2/config.yaml to disable kube-proxy. On the nodes joining the cluster:

cluster-cidr: 10.220.0.0/16
service-cidr: 10.221.0.0/16
cni: cilium
disable:
- rke2-kube-proxy
kube-apiserver-arg:
- anonymous-auth=true
kube-scheduler-arg:
- address=0.0.0.0
kube-controller-manager-arg:
- address=0.0.0.0
node-label:
- "cluster=mgt"
selinux: false
server: https://rke-mgt-01.css.ch:9345
system-default-registry: artifactory.css.ch
token: AzELaz7f2ny7pm4CfwbT8tWEVAK7T1XXXXXXXXXXXXXXXXXXXOSUMw00QaYP7kX9X1BtwH
tls-san:
- rke-mgt-01.css.ch
- rke-mgt-api.css.ch
profile: cis-1.6
audit-policy-file: /etc/rancher/rke2/audit-policy.yaml

on the first manager node:

cluster-cidr: 10.220.0.0/16
service-cidr: 10.221.0.0/16
cni: cilium
disable:
- rke2-kube-proxy
kube-apiserver-arg:
- anonymous-auth=true
kube-scheduler-arg:
- address=0.0.0.0
kube-controller-manager-arg:
- address=0.0.0.0
node-label:
- "cluster=mgt"
selinux: false
system-default-registry: artifactory.css.ch
token: AzELaz7f2ny7pm4CfwbT8tWEVAK7T1LnBZKHyXXXXXXXXXXXSUMw00QaYP7kX9X1BtwH
tls-san:
- rke-mgt-01.css.ch
- rke-mgt-api.css.ch
profile: cis-1.6

And then modify the Cilium config to enable kube-proxy replacement:

rkeConfig:
    chartValues:
      rke2-cilium:
        cilium:
          hubble:
            metrics:
              enabled:
              - dns:query;ignoreAAAA
              - drop
              - tcp
              - flow
              - icmp
              - http
            relay:
              enabled: true
              image:
                repository: cilium/hubble-relay
                tag: v1.10.4
            ui:
              backend:
                image:
                  repository: cilium/hubble-ui-backend
                  tag: v0.8.0
              enabled: true
              frontend:
                image:
                  repository: cilium/hubble-ui
                  tag: v0.8.0
              ingress:
                annotations: {}
                enabled: true
                hosts:
                - hubble-dev.css.ch
                tls:
                - hosts:
                  - hubble-dev.css.ch
                  secretName: tls-certificates-dev-hubble
              proxy:
                image:
                  repository: envoyproxy/envoy
              replicas: 1
          image:
            repository: rancher/mirrored-cilium-cilium
            tag: v1.10.4
          nodeinit:
            image:
              repository: rancher/mirrored-cilium-startup-script
              tag: 62bfbe88c17778aad7bef9fa57ff9e2d4a9ba0d8
          operator:
            image:
              repository: rancher/mirrored-cilium-operator
              tag: v1.10.4
          preflight:
            image:
              repository: rancher/mirrored-cilium-cilium
              tag: v1.10.4
          kubeProxyReplacement: "strict"
          k8sServiceHost: 10.150.85.45
          k8sServicePort: 6443

Please replace the value of k8sServiceHost with the IP address of your control plane. It's best to use a load balancer, but if there is none, I would just use the address of the first control-plane node.
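Before restarting, it can be worth confirming that the address you put into k8sServiceHost actually answers from every node, since Cilium will dial it directly instead of going through the in-cluster service IP. A hedged sketch (the IP is the one from the example above; `interpret_probe` is a hypothetical helper, not part of any tool):

```shell
# On a node you would probe the apiserver endpoint directly, e.g.:
#   curl -sk -o /dev/null -w '%{http_code}' https://10.150.85.45:6443/healthz
# curl prints 000 when it cannot connect at all. Classify that code:
interpret_probe() {
  case "$1" in
    200|401|403) echo "reachable" ;;    # the TLS endpoint answered
    000)         echo "unreachable" ;;  # timeout or connection refused
    *)           echo "unexpected: $1" ;;
  esac
}
interpret_probe 200
interpret_probe 000
```

An unreachable endpoint here means the agent will fail in exactly the way described below, regardless of the Cilium configuration.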

brandond commented 2 years ago

> I think that the problem is lack of iptables. I would try to disable kube-proxy and enable full kube-proxy replacement in Cilium.

Would installing iptables on the nodes also resolve the problem? I would have expected kubelet and kube-proxy to use the iptables that's bundled in the image.
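When checking what a node actually has, it also matters which backend the iptables userspace tool uses (classic legacy xtables vs the newer nf_tables shim), since a mismatch between host and container builds can cause rules to be invisible to one side. A small hypothetical helper that classifies the string printed by `iptables --version`:

```shell
# Hypothetical classifier for `iptables --version` output; on a node:
#   iptables_backend "$(iptables --version)"
iptables_backend() {
  case "$1" in
    *nf_tables*) echo "nft" ;;
    *legacy*)    echo "legacy" ;;
    *)           echo "unknown" ;;
  esac
}
iptables_backend "iptables v1.8.4 (nf_tables)"   # RHEL 8 builds use nf_tables
```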

PhilipSchmid commented 2 years ago

Hi there,

Any update on this one? I'm experiencing the same issue even though I have iptables installed:

$ rpm -qa | grep iptables
iptables-ebtables-1.8.4-20.el8.x86_64
iptables-1.8.4-20.el8.x86_64
iptables-libs-1.8.4-20.el8.x86_64

The Cilium DS agents show the following error:

level=info msg="Auto-disabling \"enable-bpf-clock-probe\" feature since KERNEL_HZ cannot be determined" error="Cannot probe CONFIG_HZ" subsys=daemon
level=info msg="Using autogenerated IPv4 allocation range" subsys=node v4Prefix=10.83.0.0/16
level=info msg="Initializing daemon" subsys=daemon
level=info msg="Establishing connection to apiserver" host="https://100.68.0.1:443" subsys=k8s
level=info msg="Establishing connection to apiserver" host="https://100.68.0.1:443" subsys=k8s
level=error msg="Unable to contact k8s api-server" error="Get \"https://100.68.0.1:443/api/v1/namespaces/kube-system\": dial tcp 100.68.0.1:443: i/o timeout" ipAddr="https://100.68.0.1:443" subsys=k8s
level=fatal msg="Unable to initialize Kubernetes subsystem" error="unable to create k8s client: unable to create k8s client: Get \"https://100.68.0.1:443/api/v1/namespaces/kube-system\": dial tcp 100.68.0.1:443: i/o timeout" subsys=daemon
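A timeout like this against the service VIP (100.68.0.1 here) usually means nothing is translating the service address: kube-proxy is disabled and Cilium's kube-proxy replacement never became active. Once the agent comes up far enough, `cilium status` reports a KubeProxyReplacement line you can check, e.g. via `kubectl -n kube-system exec ds/cilium -- cilium status | grep KubeProxyReplacement`. A sketch that parses that line (`kpr_state` is a hypothetical helper; the sample lines approximate the tool's output format):

```shell
# Extract the state field from a "KubeProxyReplacement:" status line.
kpr_state() {
  printf '%s\n' "$1" | awk '{print $2}'
}
kpr_state "KubeProxyReplacement:   Disabled"          # the broken state in this thread
kpr_state "KubeProxyReplacement:   Strict   [eth0]"   # the desired state
```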

Here are my configurations (the relevant parts):

/var/lib/rancher/rke2/server/manifests/rke2-cilium-config.yaml:

apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rke2-cilium
  namespace: kube-system
spec:
  valuesContent: |-
    cilium:
      kubeProxyReplacement: "strict"
      k8sServiceHost: rancher.k8s.example.com
      k8sServicePort: 6443

      ipam:
        operator:
          clusterPoolIPv4PodCIDRList:
          - "100.64.0.0/14"

/etc/rancher/rke2/config.yaml:

cluster-cidr: "100.64.0.0/14"
service-cidr: "100.68.0.0/16"
cluster-dns: "100.68.0.10"
selinux: "true"
cni: "cilium"
disable-kube-proxy: "true"
disable:
  - rke2-ingress-nginx

Thanks & regards, Philip

Edit: I just realized the helm-install-rke2-cilium job does not seem to update the kube-system/cilium-config CM properly. kube-proxy-replacement is still set to disabled...
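A direct way to check whether the helm-install job propagated the values is to read the flag straight out of the ConfigMap, e.g. `kubectl -n kube-system get cm cilium-config -o jsonpath='{.data.kube-proxy-replacement}'`. A hedged sketch that classifies the returned value (`kpr_configured` is a hypothetical helper; the mode names are Cilium 1.10's `disabled`/`probe`/`partial`/`strict`):

```shell
# Classify the kube-proxy-replacement value from the cilium-config ConfigMap.
kpr_configured() {
  case "$1" in
    strict|probe|partial) echo "enabled ($1)" ;;
    disabled|"")          echo "disabled" ;;
    *)                    echo "unknown: $1" ;;
  esac
}
kpr_configured "disabled"   # what the broken cluster showed
kpr_configured "strict"     # what the HelmChartConfig should produce
```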

brandond commented 2 years ago

As of the most recent round of releases, the chart values should no longer be nested under a cilium key.

  valuesContent: |-
    kubeProxyReplacement: "strict"
    k8sServiceHost: rancher.k8s.example.com
    k8sServicePort: 6443
    ipam:
      operator:
        clusterPoolIPv4PodCIDRList:
        - "100.64.0.0/14"
PhilipSchmid commented 2 years ago

Nice one 😅 ... That actually resolved my issue. The cilium-config CM now also has the proper flags set:

  kube-proxy-replacement: strict
  kube-proxy-replacement-healthz-bind-address: ""

BTW, I'm running RKE2 v1.22.8+rke2r1.

Thanks, @brandond!

Regards, Philip

revog commented 1 year ago

I stumbled upon this issue by chance and it is very familiar to me, especially the config snippets :-). It seems it was created by a Rancher/SUSE employee during our migration journey from SUSE CaaS to Rancher. Fortunately, the root cause could be found and fixed (it was a mismatch in the Linux netstack IPv4 configs).

So feel free to close this issue since it no longer has relevance!