rancher / rancher

Complete container management platform
http://rancher.com
Apache License 2.0

[BUG] helm-operation failure - Waiting for Kubernetes API to be available #41296

Open timothystewart6 opened 1 year ago

timothystewart6 commented 1 year ago

Rancher Server Setup

Information about the Cluster v1.25.6

MSandro commented 1 year ago

Same here. I updated to Rancher 2.7.3 yesterday. Rancher cluster: k3s v1.24.9+k3s2. Downstream cluster: rke2 v1.25.9+rke2r1. I am not able to install / update a helm app with Rancher.

gaoliuzhu commented 1 year ago

I have the same problem; my Rancher version is v2.7.4.

schipperkp commented 1 year ago

Same

schipperkp commented 1 year ago

Linked issue https://github.com/rancher/rancher/issues/41127

gaoliuzhu commented 1 year ago

> I have the same problem; my Rancher version is v2.7.4.

I retract my question. I eventually found that my cluster really did have a network problem and could not communicate with the Kubernetes API server, so Rancher does not have a bug here.

MKlimuszka commented 1 year ago

For all of you who reported this, does the upgrade work eventually? I agree that this definitely looks bad, but it's really just a look behind the curtain while these pods are waiting for other things to spin up, and they should eventually resolve. Any additional info is appreciated.

WiltonFerreira commented 1 year ago

same here

gaoliuzhu commented 1 year ago

> For all of you who reported this, does the upgrade work eventually? I agree that this definitely looks bad, but it's really just a look behind the curtain while these pods are waiting for other things to spin up, and they should eventually resolve. Any additional info is appreciated.

My upgrade is OK; it was a problem with my cluster itself. The k8s API server was unable to communicate because of a network partitioning problem on my side, so the above error was reported, but it was actually my own problem.

gaoliuzhu commented 1 year ago

for those of you experiencing this problem, check that your cluster's k8s API is communicating properly.
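A minimal sketch of what such a check can look like, run from the nodes themselves (the apiserver/load-balancer hostname is a placeholder; the kubeconfig and kubectl paths are the RKE2 defaults used later in this thread):

# From every node: can the apiserver endpoint be reached at all?
# Any HTTP response (even a 401) proves the network path works; a timeout
# or "connection refused" points at routing/firewall problems instead.
curl -vk https://<apiserver-or-lb>:6443/version

# From a server node, using the local admin kubeconfig:
KUBECONFIG=/etc/rancher/rke2/rke2.yaml /var/lib/rancher/rke2/bin/kubectl get --raw /readyz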

Alexander-Chiang commented 1 year ago

same

ca1123 commented 1 year ago

> For all of you who reported this, does the upgrade work eventually? I agree that this definitely looks bad, but it's really just a look behind the curtain while these pods are waiting for other things to spin up, and they should eventually resolve. Any additional info is appreciated.

> My upgrade is OK; it was a problem with my cluster itself. The k8s API server was unable to communicate because of a network partitioning problem on my side, so the above error was reported, but it was actually my own problem.

What do you mean exactly by "network partitioning problem"? I do know that the proxy container in the same pod causes the problem, but I'm not sure how it is supposed to work.

smolinari commented 1 year ago

> for those of you experiencing this problem, check that your cluster's k8s API is communicating properly.

How would you do this? I've run into this scenario too now and have no idea how to troubleshoot it.

Scott

ca1123 commented 1 year ago

> for those of you experiencing this problem, check that your cluster's k8s API is communicating properly.

> How would you do this? I've run into this scenario too now and have no idea how to troubleshoot it.
> Scott

Rancher is good, but it is a leaky abstraction. The assumption is that you work with standard machines from cloud providers or enterprise procurement. My problem turned out to be inside the tigera calico operator, which Rancher has made mostly opaque. https://docs.tigera.io/calico/latest/networking/ipam/ip-autodetection I have two NICs in some machines, where a secondary NIC sits on a closed local storage network in a different subnet. Unfortunately, the default autodetection method of the tigera calico operator is whichever NIC is seen first, and those secondary NICs are what it saw. Hence some links are good while others are not, depending on where the node is located. This has nothing to do with what Matt says:

> For all of you who reported this, does the upgrade work eventually? I agree that this definitely looks bad, but it's really just a look behind the curtain while these pods are waiting for other things to spin up, and they should eventually resolve. Any additional info is appreciated.

Or rather, it has everything to do with what Rancher tries to achieve: providing a mostly-default, automated K8s setup that just works.

Another resource that might help is the nicolaka/netshoot diagnostics tool, with which you can create a DaemonSet to test connections and routes. Try digging kubernetes.default in the container to test CoreDNS, or pinging across locations to test the pod VXLAN and service IPs.
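A minimal sketch of such a DaemonSet, assuming the public nicolaka/netshoot image and that one long-running pod per node is enough:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: netshoot
  namespace: default
spec:
  selector:
    matchLabels:
      app: netshoot
  template:
    metadata:
      labels:
        app: netshoot
    spec:
      tolerations:
        - operator: Exists        # also run on control-plane/etcd nodes
      containers:
        - name: netshoot
          image: nicolaka/netshoot
          command: ["sleep", "infinity"]

Once the pods are up, the checks described above look roughly like this (pod names and IPs are placeholders):

kubectl get pods -l app=netshoot -o wide
kubectl exec -it <netshoot-pod> -- dig kubernetes.default.svc.cluster.local        # CoreDNS
kubectl exec -it <netshoot-pod> -- ping -c 3 <pod-IP-on-another-node>              # pod-to-pod overlay / vxlan
kubectl exec -it <netshoot-pod> -- curl -ksm 5 https://kubernetes.default.svc/version   # service IP path (even a 401 proves reachability)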

dmitrijrub commented 1 year ago

having same issue with rancher 2.7.6, when deploying RKE2 server with CIS profile enabled

maradwan commented 1 year ago

Same issue rancher 2.7.5

gudge25 commented 1 year ago

same 2.7.6

epelaic commented 12 months ago

Hello. New to Rancher and starting with rke2 v1.25.13+rke2r1 + rancher 2.7.6, I had similar problems (Waiting for Kubernetes API to be available, DNS issues, and other random issues). It works on Debian 10 and fails after an upgrade to Debian 11; it also fails on RHEL 8.8 and 9.2. After many days of struggling with that issue, I found a workaround on RHEL 9.2 (not yet tested on RHEL 8.8 nor Debian 11): I just set the CNI to calico instead of the default canal and it seems to work. On RHEL 9.2 with canal, I must set the dnsPolicy to Default to enroll in the management rke2 cluster. With calico, no DNS issue enrolling in the cluster. I successfully deployed the cluster monitoring tools; Longhorn is not tested on that cluster. I suspect a problem between NetworkManager and CoreDNS; I did as written in the docs to exclude the cali- and flannel-managed interfaces on RHEL distributions. I upgraded the cluster to the latest v1.26.9+rke2r1 with no visible regressions.

Hope it can help.
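For reference, a sketch of that CNI switch as an RKE2 node config (set before rke2-server first starts; /etc/rancher/rke2/config.yaml is the standard location shown further down in this thread):

# /etc/rancher/rke2/config.yaml
cni: calico

For a Rancher-provisioned RKE2 cluster the same choice is normally made in the cluster's configuration (Container Network option) rather than in the node config.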

epelaic commented 12 months ago

Hello. The RHEL 9.2 cluster has selinux disabled and rke2-selinux set to false. I re-created the RHEL 9.2 cluster from scratch, starting with rke2 v1.26.9+rke2r1 and rancher 2.7.6, with selinux in permissive mode and the rke2 default selinux config. It still works. The only thing missing is the "consumed resources" value for metrics in the dashboard (on top of the events datatable); the Grafana metrics below work. On the node details UI, the consumed resources work. I re-created a Debian 11 cluster, but with calico, and it works. I do not have time to test on RHEL 8.8 now.

I don't understand why the network interfaces managed by canal have the "cali*" prefix, like with calico?

Best regards.

nicholasSUSE commented 11 months ago

Hello, I am not able to reproduce the bug. Could some of you provide me with the following information?

I found another person with the same problem at this link: https://superuser.com/questions/1793772/new-rancher-installation-waiting-for-kubernetes-api-to-be-available

In this case, the person provided all the commands; I executed all of them, but it still works fine on my machine.

Any extra information about your environment is welcome.

epelaic commented 11 months ago

Hello,

For my part, I'm testing both fresh installs and upgrades (rke2 v1.25.13+rke2r1 and v1.26.9+rke2r1, rancher 2.7.6 only). I'm working on a private cloud for test purposes only (no production workload). On this private cloud there is a CaaS service on top of Rancher RKE1 (v1.25.9 and rancher 2.7.2 + calico) and it works perfectly. All of this private cloud is powered by a VMware stack. The virtual machines that I provision are destroyed and recreated as many times as I need for my tests, but the IP addresses may not be on the same VLAN (/24) most of the time. All the virtual machines are in the same "dev" network tier and there are no firewall rules between these machines. There is a proxy + whitelist filtering for outgoing traffic (to go outside the VM infrastructure). Each node has 2 vCPUs and 8 GB of RAM, a dedicated partition for rancher and a second one for longhorn.

For each rke2 cluster I have a basic haproxy in tcp mode for ports 9345, 6443 and 443, depending on the node roles. The rancher certificate is signed by a custom "self-signed" CA.

The first step is to prepare the VMs (via a custom ansible playbook) :

On both server and agent nodes I have proxy settings in /etc/default/rke2-agent, /etc/default/rke2-server and /etc/profile.d/proxy.sh :

HTTP_PROXY=http://proxy.server.com:3128
HTTPS_PROXY=http://proxy.server.com:3128
NO_PROXY=127.0.0.0/8,10.0.0.0/8,192.168.1.0/16,*.in.server.com,localhost,.svc,.cluster.local

primary node : /etc/rancher/rke2/config.yaml

token: 2b6a46bd-af1d-49e7-a6ad-c248312a14e5
tls-san:
- rancher-dev-1.server.com
- our-dev-rch0032.server.com
- our-dev-rch0033.server.com
- our-dev-rch0034.server.com
kube-apiserver-arg: "kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname"

non-primary servers : /etc/rancher/rke2/config.yaml

server: https://rancher-dev-1.server.com:9345
token: xxxxxxxxxxxxxxxx
tls-san:
- rancher-dev-1.server.com
- our-dev-rch0032.server.com
- our-dev-rch0033.server.com
- our-dev-rch0034.server.com
kube-apiserver-arg: "kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname"

agent : /etc/rancher/rke2/config.yaml

server: https://rancher-dev-1.server.com:9345
token: xxxxxxxxxxxxxxxx
kube-apiserver-arg: "kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname"

I just add cni: calico in config.yaml for Debian 11 and RHEL 9.2 to make it work.

On RHEL, NetworkManager patch /etc/NetworkManager/conf.d/rke2-canal.conf :

[keyfile]
unmanaged-devices=interface-name:cali*;interface-name:flannel*

Add the CA on the nodes :
Debian : copy the CA file in /usr/local/share/ca-certificates/ and then exec update-ca-certificates
RHEL : copy the CA file in /etc/pki/ca-trust/source/anchors/ and then exec update-ca-trust

firewalld is disabled; on Debian, ufw is not active by default on our dev images.

step 1 : rke2 install command :

- name: "Installation RKE2"
      shell: |
        curl -sfL https://get.rke2.io | \
          INSTALL_RKE2_VERSION={{ rke2.version |default('') }} \
          INSTALL_RKE2_CHANNEL={{ rke2.channel |default('') }} \
          INSTALL_RKE2_TYPE="{{ hostvars[inventory_hostname].node_type }}" \
          sh -
      environment:
        http_proxy: "{{ http_proxy }}"
        https_proxy: "{{ http_proxy }}"

step 2 : template config.yaml
step 3 : start rke2 server or agent

    - name: "Activation et démarrage de rke2-server"
      systemd:
        name: rke2-server
        enabled: true
        state: restarted
      when: hostvars[inventory_hostname].node_type == 'server'
    - name: "Activation et démarrage de rke2-agent"
      systemd:
        name: rke2-agent
        enabled: true
        state: restarted
      when: hostvars[inventory_hostname].node_type == 'agent'

Rancher install (on the primary master):

step 1 : kubectl create namespace cattle-system
step 2 : copy certs and CA on the node
step 3 : kubectl add secret tls certs & keys ingress

    - name: 'Kubectl add secret tls certs & keys ingress'
      shell: |
        KUBECONFIG=/etc/rancher/rke2/rke2.yaml /var/lib/rancher/rke2/bin/kubectl \
          -n cattle-system create secret tls tls-rancher-ingress \
          --cert=/root/rancher/certs/{{ rancher_gui.tls_ca.certs.cert }} \
          --key=/root/rancher/certs/{{ rancher_gui.tls_ca.certs.key }}
      register: kubectl_create_secret_tls_rancher_ingress
      tags: configure

step 4 : kubectl add secret cacerts

    - name: 'Kubectl add secret cacerts'
      shell: |
        KUBECONFIG=/etc/rancher/rke2/rke2.yaml /var/lib/rancher/rke2/bin/kubectl \
          -n cattle-system create secret generic tls-ca \
          --from-file=/root/rancher/certs/{{ rancher_gui.tls_ca.certs.cacert }}
      register: kubectl_create_secret_tls_cacert
      tags: configure

step 5 : Helm Install rancher repo

    - name: 'Helm install Rancher repo'
      shell: |
        KUBECONFIG=/etc/rancher/rke2/rke2.yaml PATH=$PATH:/var/lib/rancher/rke2/bin/kubectl \
          /usr/local/bin/helm repo add rancher-{{ rancher_gui.repo }} https://releases.rancher.com/server-charts/{{ rancher_gui.repo }}
      register: helm_install_rancher_repo
      environment:
        http_proxy: "{{ http_proxy }}"
        https_proxy: "{{ http_proxy }}"
      tags: install

step 6 : Helm install Rancher GUI

    - name: 'Helm install Rancher GUI'
      shell: |
        KUBECONFIG=/etc/rancher/rke2/rke2.yaml PATH=$PATH:/var/lib/rancher/rke2/bin/kubectl \
          /usr/local/bin/helm  \
          upgrade --install rancher rancher-{{ rancher_gui.repo }}/rancher \ 
            {{ rancher_gui.devel }} \
            --namespace cattle-system \
            --version {{ rancher_gui.version }} \
            --set hostname={{ rancher_gui.hostname }} \
            --set bootstrapPassword={{ rancher_gui.bootstrapPassword }} \
            --set ingress.tls.source=secret \
            --set privateCA={{ rancher_gui.tls_ca.privateCA }} \
            --debug
      register: helm_install_rancher
      environment:
        http_proxy: "{{ http_proxy }}"
        https_proxy: "{{ http_proxy }}"
      tags: install

then wait for rancher to finish :

    - name: Helm check install rancher gui
      shell: KUBECONFIG=/etc/rancher/rke2/rke2.yaml /var/lib/rancher/rke2/bin/kubectl -n cattle-system rollout status deploy/rancher
      register: kubectl_check_rancher_install
      environment:
        http_proxy: "{{ http_proxy }}"
        https_proxy: "{{ http_proxy }}"
      tags: install

I noticed that on the other nodes joining the cluster (not on the primary node) I get "NodePasswordValidationFailed" with canal. The workaround was to do this: kubectl delete -n cattle-system MutatingWebhookConfiguration rancher.cattle.io. No problem with calico.

Do you need more information, or do you want me to run some tests?

f14stelt commented 11 months ago

Had the exact same issue with Rancher 2.7.6.

What I found out, as correctly reported by @ca1123, is that the issue was related to having 2 NICs per VM: probably tigera (or helm, when it tries to deploy an app) tries to connect to the kubernetes API through the wrong NIC.

What I did to fix it was to remove the secondary NIC and add it back. I still haven't tested a full node reboot to check that everything keeps working, but if that's the case, I think that to avoid this issue we all must add extra NICs only after the cluster gets deployed.

--- UPDATE [ 2023-10-23 ] --- As previously said, I executed a reboot of all my worker nodes (the only nodes that have a second NIC with an IP assigned on a DMZ net) and hit the same issue again. Unfortunately, I still get: "Waiting for Kubernetes API to be available"

To check this I tried to deploy an application from the Rancher app catalogs.

PS: I re-fixed this by removing and re-adding the NICs; any suggestion regarding a workaround would be appreciated.

--- UPDATE [ 2023-10-24 ] --- After much research and deep debugging I found a solution that works to avoid this type of double-NIC issue; it requires planning before the RKE2 cluster deployment.

First of all, plan your Kubernetes cluster (RKE2) according to your needs and check carefully how many NICs you'll need and which one specifically serves calico to correctly communicate with the kube API. As an example I'll show my use case.

In my specific use case I needed my kubernetes nodes to have 2 different NICs:

During the Rancher-managed cluster setup we must change the configuration inside "Add-On Config" according to our network schema. By default the parameters below will not be present, so we have to add them manually:

    nodeAddressAutodetectionV4:
      interface: ens3

After doing this, our cluster will be deployed and will use the specified NIC, as we can also see in the Tigera configuration. So now it will no longer default to "first-found".
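For a standalone RKE2 install (outside Rancher's "Add-On Config" UI), the same setting can typically be supplied as a HelmChartConfig for the bundled rke2-calico chart, dropped on a server node before startup; a sketch, with the interface name as an example:

# /var/lib/rancher/rke2/server/manifests/rke2-calico-config.yaml
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rke2-calico
  namespace: kube-system
spec:
  valuesContent: |-
    installation:
      calicoNetwork:
        nodeAddressAutodetectionV4:
          interface: ens3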

Unfortunately I still haven't found any way to change that when the cluster is already deployed.

Feel free to test it out with your own use case; I hope this helps some of you :)
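Whichever way it was set, the effective value can be inspected on a running cluster, assuming the tigera operator's usual Installation resource named default:

kubectl get installation default -o jsonpath='{.spec.calicoNetwork.nodeAddressAutodetectionV4}'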

epelaic commented 11 months ago

Hello, for my part I have only one nic per node.

f14stelt commented 11 months ago

@epelaic have you checked that you don't have any issue connecting to the kubernetes API from your master and worker nodes? It seems this error is usually related to bad network communication.

epelaic commented 11 months ago

@f14stelt do you have a procedure to validate the communication between master and worker nodes? But if it is a network issue, shouldn't it affect both CNIs, canal and calico?

diogoasouza commented 11 months ago

Hi, I did manage to reproduce this issue following @f14stelt's instructions, but it looks like it is related to bad network communication and not necessarily Rancher itself.

diogoasouza commented 10 months ago

> @f14stelt do you have a procedure to validate the communication between master and worker nodes? But if it is a network issue, shouldn't it affect both CNIs, canal and calico?

You can use the tool that @ca1123 suggested, nicolaka/netshoot. Did you notice if the pods that failed were always scheduled on the same node?
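For the node-to-node part, a minimal sketch using the standard RKE2 ports (6443 apiserver, 9345 supervisor/registration, 10250 kubelet, 8472/UDP canal VXLAN); the node IPs are placeholders:

# Run from each node against every other node's primary IP
for host in 10.0.0.11 10.0.0.12 10.0.0.13; do
  nc -zvw3  "$host" 6443     # kube-apiserver
  nc -zvw3  "$host" 9345     # rke2 supervisor (node registration)
  nc -zvw3  "$host" 10250    # kubelet
  nc -zuvw3 "$host" 8472     # canal VXLAN; UDP, so nc only detects an outright ICMP reject
done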

epelaic commented 10 months ago

@diogoasouza I will try next week, I need to reinstall a cluster with canal.

epelaic commented 10 months ago

Hello, I have rebuilt a RHEL 9 cluster with canal. I upgraded the host VMs from RHEL 9.2 to 9.3 (kernel 5.14.0-362.8.1.el9_3.x86_64 & selinux in permissive mode, one NIC per host, an haproxy in tcp mode for ports 443, 9345 and 6443).

I have deployed the last stable RKE2 v1.26.10+rke2r2.

[root@our-dev-k8s0019 ~]# ifconfig 
cali83858136281: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet6 fe80::ecee:eeff:feee:eeee  prefixlen 64  scopeid 0x20<link>
        ether ee:ee:ee:ee:ee:ee  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 7  bytes 420 (420.0 B)
        TX errors 0  dropped 5 overruns 0  carrier 0  collisions 0

cali0de3ef31751: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet6 fe80::ecee:eeff:feee:eeee  prefixlen 64  scopeid 0x20<link>
        ether ee:ee:ee:ee:ee:ee  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 7  bytes 420 (420.0 B)
        TX errors 0  dropped 5 overruns 0  carrier 0  collisions 0

cali262069506ee: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet6 fe80::ecee:eeff:feee:eeee  prefixlen 64  scopeid 0x20<link>
        ether ee:ee:ee:ee:ee:ee  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 7  bytes 420 (420.0 B)
        TX errors 0  dropped 5 overruns 0  carrier 0  collisions 0

cali2bcd1788e3c: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet6 fe80::ecee:eeff:feee:eeee  prefixlen 64  scopeid 0x20<link>
        ether ee:ee:ee:ee:ee:ee  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 7  bytes 420 (420.0 B)
        TX errors 0  dropped 5 overruns 0  carrier 0  collisions 0

cali84bef65ee54: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet6 fe80::ecee:eeff:feee:eeee  prefixlen 64  scopeid 0x20<link>
        ether ee:ee:ee:ee:ee:ee  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 7  bytes 420 (420.0 B)
        TX errors 0  dropped 5 overruns 0  carrier 0  collisions 0

calidc4e0f2df34: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet6 fe80::ecee:eeff:feee:eeee  prefixlen 64  scopeid 0x20<link>
        ether ee:ee:ee:ee:ee:ee  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 7  bytes 420 (420.0 B)
        TX errors 0  dropped 5 overruns 0  carrier 0  collisions 0

calif2b2a29df88: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet6 fe80::ecee:eeff:feee:eeee  prefixlen 64  scopeid 0x20<link>
        ether ee:ee:ee:ee:ee:ee  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 7  bytes 420 (420.0 B)
        TX errors 0  dropped 5 overruns 0  carrier 0  collisions 0

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.xxx.xxx.131  netmask 255.255.255.0  broadcast 10.xxx.xxx.255
        inet6 fe80::250:56ff:febe:80cb  prefixlen 64  scopeid 0x20<link>
        ether 00:50:56:be:80:cb  txqueuelen 1000  (Ethernet)
        RX packets 2344701  bytes 1972803873 (1.8 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 1131005  bytes 230990847 (220.2 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

flannel.1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet 10.42.0.0  netmask 255.255.255.255  broadcast 0.0.0.0
        inet6 fe80::9073:5dff:fed7:c2c5  prefixlen 64  scopeid 0x20<link>
        ether 92:73:5d:d7:c2:c5  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 7  bytes 420 (420.0 B)
        TX errors 0  dropped 5 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 1554682  bytes 700551782 (668.0 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 1554682  bytes 700551782 (668.0 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
[root@our-dev-k8s0019 ~]# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.xxx.xxx.254  0.0.0.0         UG    100    0        0 eth0
10.42.0.2       0.0.0.0         255.255.255.255 UH    0      0        0 cali262069506ee
10.42.0.3       0.0.0.0         255.255.255.255 UH    0      0        0 cali83858136281
10.42.0.9       0.0.0.0         255.255.255.255 UH    0      0        0 cali84bef65ee54
10.42.0.10      0.0.0.0         255.255.255.255 UH    0      0        0 cali2bcd1788e3c
10.42.0.11      0.0.0.0         255.255.255.255 UH    0      0        0 calif2b2a29df88
10.42.0.13      0.0.0.0         255.255.255.255 UH    0      0        0 calidc4e0f2df34
10.42.0.15      0.0.0.0         255.255.255.255 UH    0      0        0 cali0de3ef31751
10.42.1.0       10.42.1.0       255.255.255.0   UG    0      0        0 flannel.1
10.42.2.0       10.42.2.0       255.255.255.0   UG    0      0        0 flannel.1
10.xxx.xxx.0    0.0.0.0         255.255.255.0   U     100    0        0 eth0
[root@our-dev-k8s0020 ~]# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.xxx.xxx.254  0.0.0.0         UG    100    0        0 eth0
10.42.0.0       10.42.0.0       255.255.255.0   UG    0      0        0 flannel.1
10.42.1.2       0.0.0.0         255.255.255.255 UH    0      0        0 caliaf6319faae0
10.42.1.3       0.0.0.0         255.255.255.255 UH    0      0        0 calibe8f1a74102
10.42.1.6       0.0.0.0         255.255.255.255 UH    0      0        0 cali241f74f9506
10.42.2.0       10.42.2.0       255.255.255.0   UG    0      0        0 flannel.1
10.xxx.xxx.0    0.0.0.0         255.255.255.0   U     100    0        0 eth0
[root@our-dev-k8s0021 ~]# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.xxx.xxx.254  0.0.0.0         UG    100    0        0 eth0
10.42.0.0       10.42.0.0       255.255.255.0   UG    0      0        0 flannel.1
10.42.1.0       10.42.1.0       255.255.255.0   UG    0      0        0 flannel.1
10.42.2.2       0.0.0.0         255.255.255.255 UH    0      0        0 calieffd01500ab
10.42.2.4       0.0.0.0         255.255.255.255 UH    0      0        0 cali4d8e5a1a911
10.42.2.7       0.0.0.0         255.255.255.255 UH    0      0        0 cali99dfa9ba475
10.xxx.xxx.0    0.0.0.0         255.255.255.0   U     100    0        0 eth0

I noticed that master-1 (primary server) has no route 10.42.0.0 10.42.0.0 255.255.255.0 UG 0 0 0 flannel.1 but master-2 & master-3 have.

Master-1 has these routes :

10.42.1.0       10.42.1.0       255.255.255.0   UG    0      0        0 flannel.1
10.42.2.0       10.42.2.0       255.255.255.0   UG    0      0        0 flannel.1

Master-2 has these routes :

10.42.0.0       10.42.0.0       255.255.255.0   UG    0      0        0 flannel.1
10.42.2.0       10.42.2.0       255.255.255.0   UG    0      0        0 flannel.1

And master-3 has these routes :

10.42.0.0       10.42.0.0       255.255.255.0   UG    0      0        0 flannel.1
10.42.1.0       10.42.1.0       255.255.255.0   UG    0      0        0 flannel.1

Comparing with a calico cluster, the nodes share a common vxlan.calico route, either:

10.42.26.0      10.42.26.0      255.255.255.192 UG    0      0        0 vxlan.calico

or

10.42.26.0      0.0.0.0         255.255.255.192 U     0      0        0 *
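A quick way to look at the flannel overlay itself from each node (a sketch; the 10.42.x.0 addresses come from the routing tables above, and the local IP reported by ip -d link shows which NIC flannel bound to, which matters in the multi-NIC cases discussed earlier):

ip -d link show flannel.1        # VXLAN details: VNI, local IP and the underlying device
bridge fdb show dev flannel.1    # which node IPs the VXLAN forwarding entries point at
ping -c 3 10.42.1.0              # e.g. from master-1, ping master-2's flannel.1 address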

[root@our-dev-k8s0019 ~]# iptables -L
# Warning: iptables-legacy tables present, use iptables-legacy to see them
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         
cali-INPUT  all  --  anywhere             anywhere             /* cali:Cz_u1IQiXIMmKD4c */
KUBE-PROXY-FIREWALL  all  --  anywhere             anywhere             ctstate NEW /* kubernetes load balancer firewall */
KUBE-NODEPORTS  all  --  anywhere             anywhere             /* kubernetes health check service ports */
KUBE-EXTERNAL-SERVICES  all  --  anywhere             anywhere             ctstate NEW /* kubernetes externally-visible service portals */
KUBE-FIREWALL  all  --  anywhere             anywhere            

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         
cali-FORWARD  all  --  anywhere             anywhere             /* cali:wUHhoiAYhphO9Mso */
KUBE-PROXY-FIREWALL  all  --  anywhere             anywhere             ctstate NEW /* kubernetes load balancer firewall */
KUBE-FORWARD  all  --  anywhere             anywhere             /* kubernetes forwarding rules */
KUBE-SERVICES  all  --  anywhere             anywhere             ctstate NEW /* kubernetes service portals */
KUBE-EXTERNAL-SERVICES  all  --  anywhere             anywhere             ctstate NEW /* kubernetes externally-visible service portals */
FLANNEL-FWD  all  --  anywhere             anywhere             /* flanneld forward */
ACCEPT     all  --  anywhere             anywhere             /* cali:S93hcgKJrXEqnTfs */ /* Policy explicitly accepted packet. */ mark match 0x10000/0x10000
MARK       all  --  anywhere             anywhere             /* cali:mp77cMpurHhyjLrM */ MARK or 0x10000

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         
cali-OUTPUT  all  --  anywhere             anywhere             /* cali:tVnHkvAo15HuiPy0 */
KUBE-PROXY-FIREWALL  all  --  anywhere             anywhere             ctstate NEW /* kubernetes load balancer firewall */
KUBE-SERVICES  all  --  anywhere             anywhere             ctstate NEW /* kubernetes service portals */
KUBE-FIREWALL  all  --  anywhere             anywhere            

Chain FLANNEL-FWD (1 references)
target     prot opt source               destination         
ACCEPT     all  --  our-dev-k8s0019.server.fr/16  anywhere             /* flanneld forward */
ACCEPT     all  --  anywhere             our-dev-k8s0019.server.fr/16  /* flanneld forward */

Chain KUBE-EXTERNAL-SERVICES (2 references)
target     prot opt source               destination         

Chain KUBE-FIREWALL (2 references)
target     prot opt source               destination         
DROP       all  -- !127.0.0.0/8          127.0.0.0/8          /* block incoming localnet connections */ ! ctstate RELATED,ESTABLISHED,DNAT
DROP       all  --  anywhere             anywhere             /* kubernetes firewall for dropping marked packets */ mark match 0x8000/0x8000

Chain KUBE-FORWARD (1 references)
target     prot opt source               destination         
DROP       all  --  anywhere             anywhere             ctstate INVALID
ACCEPT     all  --  anywhere             anywhere             /* kubernetes forwarding rules */ mark match 0x4000/0x4000
ACCEPT     all  --  anywhere             anywhere             /* kubernetes forwarding conntrack rule */ ctstate RELATED,ESTABLISHED

Chain KUBE-KUBELET-CANARY (0 references)
target     prot opt source               destination         

Chain KUBE-NODEPORTS (1 references)
target     prot opt source               destination         

Chain KUBE-PROXY-CANARY (0 references)
target     prot opt source               destination         

Chain KUBE-PROXY-FIREWALL (3 references)
target     prot opt source               destination         

Chain KUBE-SERVICES (2 references)
target     prot opt source               destination         

Chain cali-FORWARD (1 references)
target     prot opt source               destination         
MARK       all  --  anywhere             anywhere             /* cali:vjrMJCRpqwy5oRoX */ MARK and 0xfff1ffff
cali-from-hep-forward  all  --  anywhere             anywhere             /* cali:A_sPAO0mcxbT9mOV */ mark match 0x0/0x10000
cali-from-wl-dispatch  all  --  anywhere             anywhere             /* cali:8ZoYfO5HKXWbB3pk */
cali-to-wl-dispatch  all  --  anywhere             anywhere             /* cali:jdEuaPBe14V2hutn */
cali-to-hep-forward  all  --  anywhere             anywhere             /* cali:12bc6HljsMKsmfr- */
cali-cidr-block  all  --  anywhere             anywhere             /* cali:NOSxoaGx8OIstr1z */

Chain cali-INPUT (1 references)
target     prot opt source               destination         
cali-wl-to-host  all  --  anywhere             anywhere            [goto]  /* cali:FewJpBykm9iJ-YNH */
ACCEPT     all  --  anywhere             anywhere             /* cali:hder3ARWznqqv8Va */ mark match 0x10000/0x10000
MARK       all  --  anywhere             anywhere             /* cali:xgOu2uJft6H9oDGF */ MARK and 0xfff0ffff
cali-from-host-endpoint  all  --  anywhere             anywhere             /* cali:_-d-qojMfHM6NwBo */
ACCEPT     all  --  anywhere             anywhere             /* cali:LqmE76MP94lZTGhA */ /* Host endpoint policy accepted packet. */ mark match 0x10000/0x10000

Chain cali-OUTPUT (1 references)
target     prot opt source               destination         
ACCEPT     all  --  anywhere             anywhere             /* cali:Mq1_rAdXXH3YkrzW */ mark match 0x10000/0x10000
RETURN     all  --  anywhere             anywhere             /* cali:69FkRTJDvD5Vu6Vl */
MARK       all  --  anywhere             anywhere             /* cali:Fskumj4SGQtDV6GC */ MARK and 0xfff0ffff
cali-to-host-endpoint  all  --  anywhere             anywhere             /* cali:1F4VWEsQu0QbRwKf */ ! ctstate DNAT
ACCEPT     all  --  anywhere             anywhere             /* cali:m8Eqm15x1MjD24LD */ /* Host endpoint policy accepted packet. */ mark match 0x10000/0x10000

Chain cali-cidr-block (1 references)
target     prot opt source               destination         

Chain cali-from-hep-forward (1 references)
target     prot opt source               destination         

Chain cali-from-host-endpoint (1 references)
target     prot opt source               destination         

Chain cali-from-wl-dispatch (2 references)
target     prot opt source               destination         
cali-fw-cali0de3ef31751  all  --  anywhere             anywhere            [goto]  /* cali:zwGHeHxKOGW6bwpF */
cali-from-wl-dispatch-2  all  --  anywhere             anywhere            [goto]  /* cali:_HX-BvZ8iBwnnTXj */
cali-from-wl-dispatch-8  all  --  anywhere             anywhere            [goto]  /* cali:VP30QcAwt1gJyTYe */
cali-fw-calidc4e0f2df34  all  --  anywhere             anywhere            [goto]  /* cali:tmy57ina-hdt_Sgh */
cali-fw-calif2b2a29df88  all  --  anywhere             anywhere            [goto]  /* cali:7tJICuqbj7NJTibo */
DROP       all  --  anywhere             anywhere             /* cali:_zJIo7nYRYOmclGf */ /* Unknown interface */

Chain cali-from-wl-dispatch-2 (1 references)
target     prot opt source               destination         
cali-fw-cali262069506ee  all  --  anywhere             anywhere            [goto]  /* cali:Hq29kzLfRuVPd1ST */
cali-fw-cali2bcd1788e3c  all  --  anywhere             anywhere            [goto]  /* cali:byiS-7XdISBuX_8L */
DROP       all  --  anywhere             anywhere             /* cali:EgU6CHYWNXn1ZXYp */ /* Unknown interface */

Chain cali-from-wl-dispatch-8 (1 references)
target     prot opt source               destination         
cali-fw-cali83858136281  all  --  anywhere             anywhere            [goto]  /* cali:WfsBP9SywLn0iLnH */
cali-fw-cali84bef65ee54  all  --  anywhere             anywhere            [goto]  /* cali:mZlOvFyF1EBnvh7o */
DROP       all  --  anywhere             anywhere             /* cali:sL-pEKql66lglrrC */ /* Unknown interface */

Chain cali-fw-cali0de3ef31751 (1 references)
target     prot opt source               destination         
ACCEPT     all  --  anywhere             anywhere             /* cali:xfb_ByP-bmsSdRwN */ ctstate RELATED,ESTABLISHED
DROP       all  --  anywhere             anywhere             /* cali:LfQEiX8DNwU2D1f1 */ ctstate INVALID
MARK       all  --  anywhere             anywhere             /* cali:mkZtwpaSo8op2oJh */ MARK and 0xfffeffff
DROP       udp  --  anywhere             anywhere             /* cali:AmZ3PGnMbpY_wwNA */ /* Drop VXLAN encapped packets originating in workloads */ multiport dports vxlan
DROP       ipv4 --  anywhere             anywhere             /* cali:6BdWSbZOWLq13a3o */ /* Drop IPinIP encapped packets originating in workloads */
cali-pro-kns.cattle-system  all  --  anywhere             anywhere             /* cali:86P-vVXMQsq9-URx */
RETURN     all  --  anywhere             anywhere             /* cali:-q4FaG0zwGRM1bow */ /* Return if profile accepted */ mark match 0x10000/0x10000
cali-pro-_XnQ5h_hZf854SLqzqE  all  --  anywhere             anywhere             /* cali:x5jUy2fRx4hmCqze */
RETURN     all  --  anywhere             anywhere             /* cali:Ex8RGsK5ep8qTXSX */ /* Return if profile accepted */ mark match 0x10000/0x10000
DROP       all  --  anywhere             anywhere             /* cali:D-U7zXnzntzBoq6e */ /* Drop if no profiles matched */

Chain cali-fw-cali262069506ee (1 references)
target     prot opt source               destination         
ACCEPT     all  --  anywhere             anywhere             /* cali:AkTpbM1qNfsMRsJc */ ctstate RELATED,ESTABLISHED
DROP       all  --  anywhere             anywhere             /* cali:-LFIhKIMJVIrVVlg */ ctstate INVALID
MARK       all  --  anywhere             anywhere             /* cali:76Rh9cJGiyjrLEBS */ MARK and 0xfffeffff
DROP       udp  --  anywhere             anywhere             /* cali:FiIoFf5hvsYp5wRU */ /* Drop VXLAN encapped packets originating in workloads */ multiport dports vxlan
DROP       ipv4 --  anywhere             anywhere             /* cali:Mwyt8lT4Ui9qAwsY */ /* Drop IPinIP encapped packets originating in workloads */
cali-pro-kns.kube-system  all  --  anywhere             anywhere             /* cali:y1R1ICyt1NxxO-mt */
RETURN     all  --  anywhere             anywhere             /* cali:0SseCjAoVk3I8tPw */ /* Return if profile accepted */ mark match 0x10000/0x10000
cali-pro-_7bnNHSm00P51QAo5Qe  all  --  anywhere             anywhere             /* cali:oSgcYjUn9whBYoel */
RETURN     all  --  anywhere             anywhere             /* cali:s627Zz-fyxNz55Kd */ /* Return if profile accepted */ mark match 0x10000/0x10000
DROP       all  --  anywhere             anywhere             /* cali:X4FH3A_GKRN5jVsx */ /* Drop if no profiles matched */

Chain cali-fw-cali2bcd1788e3c (1 references)
target     prot opt source               destination         
ACCEPT     all  --  anywhere             anywhere             /* cali:_mQtHsfD_x9QEUUq */ ctstate RELATED,ESTABLISHED
DROP       all  --  anywhere             anywhere             /* cali:kFH_ZGpBBST51Qmq */ ctstate INVALID
MARK       all  --  anywhere             anywhere             /* cali:Grb611q6cS03ZWbL */ MARK and 0xfffeffff
DROP       udp  --  anywhere             anywhere             /* cali:S8wGWfPZ-9CfKtye */ /* Drop VXLAN encapped packets originating in workloads */ multiport dports vxlan
DROP       ipv4 --  anywhere             anywhere             /* cali:y8CMEJt2_BgQ0ETO */ /* Drop IPinIP encapped packets originating in workloads */
cali-pro-kns.kube-system  all  --  anywhere             anywhere             /* cali:EgYwmm_ONnVoVVCc */
RETURN     all  --  anywhere             anywhere             /* cali:zZ1-1HJjp5O0oysl */ /* Return if profile accepted */ mark match 0x10000/0x10000
cali-pro-_8SDYViIwwzQDgRml2t  all  --  anywhere             anywhere             /* cali:HpQ-0RkyuyfmVOj9 */
RETURN     all  --  anywhere             anywhere             /* cali:68rfY11TIqvdWyc7 */ /* Return if profile accepted */ mark match 0x10000/0x10000
DROP       all  --  anywhere             anywhere             /* cali:CEh0yw_Uai9sK9Uw */ /* Drop if no profiles matched */

Chain cali-fw-cali83858136281 (1 references)
target     prot opt source               destination         
ACCEPT     all  --  anywhere             anywhere             /* cali:yY_K4yYhHkXBJMFz */ ctstate RELATED,ESTABLISHED
DROP       all  --  anywhere             anywhere             /* cali:RsX47bWKEZzJfey2 */ ctstate INVALID
MARK       all  --  anywhere             anywhere             /* cali:fvgi1X2WOapSX-Ug */ MARK and 0xfffeffff
DROP       udp  --  anywhere             anywhere             /* cali:9vIgRhw55ZNNFhKZ */ /* Drop VXLAN encapped packets originating in workloads */ multiport dports vxlan
DROP       ipv4 --  anywhere             anywhere             /* cali:2sTbxEQDEco8giHp */ /* Drop IPinIP encapped packets originating in workloads */
cali-pro-kns.kube-system  all  --  anywhere             anywhere             /* cali:pMqgCFh1iP3jMyiE */
RETURN     all  --  anywhere             anywhere             /* cali:P1_WGF_TZiAaJMJf */ /* Return if profile accepted */ mark match 0x10000/0x10000
cali-pro-_u2Tn2rSoAPffvE7JO6  all  --  anywhere             anywhere             /* cali:8Rq2ZPH_c5oa1M7G */
RETURN     all  --  anywhere             anywhere             /* cali:XMSMH0j0_65XNcXz */ /* Return if profile accepted */ mark match 0x10000/0x10000
DROP       all  --  anywhere             anywhere             /* cali:uIUUC4mgEfvXQxFD */ /* Drop if no profiles matched */

Chain cali-fw-cali84bef65ee54 (1 references)
target     prot opt source               destination         
ACCEPT     all  --  anywhere             anywhere             /* cali:M4lVDsA6ic-nfi6F */ ctstate RELATED,ESTABLISHED
DROP       all  --  anywhere             anywhere             /* cali:O0vaLEolEOoXquvS */ ctstate INVALID
MARK       all  --  anywhere             anywhere             /* cali:y6MoUnVa_BXvEQfS */ MARK and 0xfffeffff
DROP       udp  --  anywhere             anywhere             /* cali:pORB5zzkjoWaaSY6 */ /* Drop VXLAN encapped packets originating in workloads */ multiport dports vxlan
DROP       ipv4 --  anywhere             anywhere             /* cali:E69qbK9msEzZzu4Y */ /* Drop IPinIP encapped packets originating in workloads */
cali-pro-kns.kube-system  all  --  anywhere             anywhere             /* cali:4H2lRziTcRpTXcYC */
RETURN     all  --  anywhere             anywhere             /* cali:IONiVqiFouEArH_J */ /* Return if profile accepted */ mark match 0x10000/0x10000
cali-pro-_vOEu3o_UBpjhIsR6zZ  all  --  anywhere             anywhere             /* cali:sjSWhazu-3Xr5gE7 */
RETURN     all  --  anywhere             anywhere             /* cali:DGFOhde-CNDwARjO */ /* Return if profile accepted */ mark match 0x10000/0x10000
DROP       all  --  anywhere             anywhere             /* cali:rqudzm5hdh7adexN */ /* Drop if no profiles matched */

Chain cali-fw-calidc4e0f2df34 (1 references)
target     prot opt source               destination         
ACCEPT     all  --  anywhere             anywhere             /* cali:ibAMynRwKaAEz5to */ ctstate RELATED,ESTABLISHED
DROP       all  --  anywhere             anywhere             /* cali:L0RK6-8DsBkhrzLP */ ctstate INVALID
MARK       all  --  anywhere             anywhere             /* cali:Bwe8MAwEb7Se6mqq */ MARK and 0xfffeffff
DROP       udp  --  anywhere             anywhere             /* cali:Hm-GlC2giF5dKN9w */ /* Drop VXLAN encapped packets originating in workloads */ multiport dports vxlan
DROP       ipv4 --  anywhere             anywhere             /* cali:KRy8fMWSicpSJuB2 */ /* Drop IPinIP encapped packets originating in workloads */
cali-pro-kns.kube-system  all  --  anywhere             anywhere             /* cali:YOqtrF-H9tlyhuE2 */
RETURN     all  --  anywhere             anywhere             /* cali:hCmQh9K2yn0kH5LZ */ /* Return if profile accepted */ mark match 0x10000/0x10000
cali-pro-_GyLFhtf5u9n-v9Ckd7  all  --  anywhere             anywhere             /* cali:ENqQnSFYJ7T9OfhK */
RETURN     all  --  anywhere             anywhere             /* cali:okQARuISPEiQKYiE */ /* Return if profile accepted */ mark match 0x10000/0x10000
DROP       all  --  anywhere             anywhere             /* cali:EaDFLNswT4hB4n6R */ /* Drop if no profiles matched */

Chain cali-fw-calif2b2a29df88 (1 references)
target     prot opt source               destination         
ACCEPT     all  --  anywhere             anywhere             /* cali:mTvyK4Axq6CgnYsu */ ctstate RELATED,ESTABLISHED
DROP       all  --  anywhere             anywhere             /* cali:Jvh_nvnuDS1vFhkd */ ctstate INVALID
MARK       all  --  anywhere             anywhere             /* cali:tp2Lee-EUX0twS-d */ MARK and 0xfffeffff
DROP       udp  --  anywhere             anywhere             /* cali:uTg96XYIVSxi-5ON */ /* Drop VXLAN encapped packets originating in workloads */ multiport dports vxlan
DROP       ipv4 --  anywhere             anywhere             /* cali:Ju7a1lK-yUxIgozW */ /* Drop IPinIP encapped packets originating in workloads */
cali-pro-kns.kube-system  all  --  anywhere             anywhere             /* cali:udcy2Hv2u9Q7uKuZ */
RETURN     all  --  anywhere             anywhere             /* cali:4RNMZQKFZHNhkZhx */ /* Return if profile accepted */ mark match 0x10000/0x10000
cali-pro-_kvQu8xaXYEM2wqqPSH  all  --  anywhere             anywhere             /* cali:FdaDoEloWbHR1gng */
RETURN     all  --  anywhere             anywhere             /* cali:C0gglK56wUI1n--l */ /* Return if profile accepted */ mark match 0x10000/0x10000
DROP       all  --  anywhere             anywhere             /* cali:1Yv29udQ9BQGhGJi */ /* Drop if no profiles matched */

Chain cali-pri-_7bnNHSm00P51QAo5Qe (1 references)
target     prot opt source               destination         
           all  --  anywhere             anywhere             /* cali:ksjFpC8Po46siiFK */ /* Profile ksa.kube-system.rke2-coredns-rke2-coredns-autoscaler ingress */

Chain cali-pri-_8SDYViIwwzQDgRml2t (1 references)
target     prot opt source               destination         
           all  --  anywhere             anywhere             /* cali:QsucD7WP9lso1vFm */ /* Profile ksa.kube-system.rke2-snapshot-validation-webhook ingress */

Chain cali-pri-_GyLFhtf5u9n-v9Ckd7 (1 references)
target     prot opt source               destination         
           all  --  anywhere             anywhere             /* cali:Z9reHIOsxthYGfQR */ /* Profile ksa.kube-system.rke2-ingress-nginx ingress */

Chain cali-pri-_XnQ5h_hZf854SLqzqE (1 references)
target     prot opt source               destination         
           all  --  anywhere             anywhere             /* cali:givjwnuhCqJ4e5_N */ /* Profile ksa.cattle-system.cattle ingress */

Chain cali-pri-_kvQu8xaXYEM2wqqPSH (1 references)
target     prot opt source               destination         
           all  --  anywhere             anywhere             /* cali:9nsQKKnHwK9qmSqF */ /* Profile ksa.kube-system.rke2-metrics-server ingress */

Chain cali-pri-_u2Tn2rSoAPffvE7JO6 (1 references)
target     prot opt source               destination         
           all  --  anywhere             anywhere             /* cali:WqgznqAQ-uYV0oBx */ /* Profile ksa.kube-system.coredns ingress */

Chain cali-pri-_vOEu3o_UBpjhIsR6zZ (1 references)
target     prot opt source               destination         
           all  --  anywhere             anywhere             /* cali:y1mg23JI0vvaftK3 */ /* Profile ksa.kube-system.rke2-snapshot-controller ingress */

Chain cali-pri-kns.cattle-system (1 references)
target     prot opt source               destination         
MARK       all  --  anywhere             anywhere             /* cali:4cxptRUh_i7dOJYS */ /* Profile kns.cattle-system ingress */ MARK or 0x10000
RETURN     all  --  anywhere             anywhere             /* cali:sQTmaAADD2pZPOEN */ mark match 0x10000/0x10000

Chain cali-pri-kns.kube-system (6 references)
target     prot opt source               destination         
MARK       all  --  anywhere             anywhere             /* cali:J1TyxtHWd0qaBGK- */ /* Profile kns.kube-system ingress */ MARK or 0x10000
RETURN     all  --  anywhere             anywhere             /* cali:QIB6k7eEKdIg73Jp */ mark match 0x10000/0x10000

Chain cali-pro-_7bnNHSm00P51QAo5Qe (1 references)
target     prot opt source               destination         
           all  --  anywhere             anywhere             /* cali:Cn_XvK0BiITKuv_k */ /* Profile ksa.kube-system.rke2-coredns-rke2-coredns-autoscaler egress */

Chain cali-pro-_8SDYViIwwzQDgRml2t (1 references)
target     prot opt source               destination         
           all  --  anywhere             anywhere             /* cali:tDXYmnqgw4o5XtKy */ /* Profile ksa.kube-system.rke2-snapshot-validation-webhook egress */

Chain cali-pro-_GyLFhtf5u9n-v9Ckd7 (1 references)
target     prot opt source               destination         
           all  --  anywhere             anywhere             /* cali:H-PEsWIdyoAji9Ks */ /* Profile ksa.kube-system.rke2-ingress-nginx egress */

Chain cali-pro-_XnQ5h_hZf854SLqzqE (1 references)
target     prot opt source               destination         
           all  --  anywhere             anywhere             /* cali:SdFoKv0bBkcv1V09 */ /* Profile ksa.cattle-system.cattle egress */

Chain cali-pro-_kvQu8xaXYEM2wqqPSH (1 references)
target     prot opt source               destination         
           all  --  anywhere             anywhere             /* cali:LP1wkR0Ravtrgqj6 */ /* Profile ksa.kube-system.rke2-metrics-server egress */

Chain cali-pro-_u2Tn2rSoAPffvE7JO6 (1 references)
target     prot opt source               destination         
           all  --  anywhere             anywhere             /* cali:0-_UPh39dt5XfhmJ */ /* Profile ksa.kube-system.coredns egress */

Chain cali-pro-_vOEu3o_UBpjhIsR6zZ (1 references)
target     prot opt source               destination         
           all  --  anywhere             anywhere             /* cali:NHAM-yUwuplXkSFw */ /* Profile ksa.kube-system.rke2-snapshot-controller egress */

Chain cali-pro-kns.cattle-system (1 references)
target     prot opt source               destination         
MARK       all  --  anywhere             anywhere             /* cali:KTqm7vO7t18yVt6i */ /* Profile kns.cattle-system egress */ MARK or 0x10000
RETURN     all  --  anywhere             anywhere             /* cali:oL9hjGWtTiKRO9xm */ mark match 0x10000/0x10000

Chain cali-pro-kns.kube-system (6 references)
target     prot opt source               destination         
MARK       all  --  anywhere             anywhere             /* cali:tgOR2S8DVHZW3F1M */ /* Profile kns.kube-system egress */ MARK or 0x10000
RETURN     all  --  anywhere             anywhere             /* cali:HVEEtYPJsiGRXCIt */ mark match 0x10000/0x10000

Chain cali-to-hep-forward (1 references)
target     prot opt source               destination         

Chain cali-to-host-endpoint (1 references)
target     prot opt source               destination         

Chain cali-to-wl-dispatch (1 references)
target     prot opt source               destination         
cali-tw-cali0de3ef31751  all  --  anywhere             anywhere            [goto]  /* cali:DpJVA2C8kt09sLaR */
cali-to-wl-dispatch-2  all  --  anywhere             anywhere            [goto]  /* cali:xCGtX498cOrxSNbb */
cali-to-wl-dispatch-8  all  --  anywhere             anywhere            [goto]  /* cali:1xJ4yYwpbsXfWTe0 */
cali-tw-calidc4e0f2df34  all  --  anywhere             anywhere            [goto]  /* cali:BoxUpEXZ9AGj8aax */
cali-tw-calif2b2a29df88  all  --  anywhere             anywhere            [goto]  /* cali:LNvuJ5Z250OYC8gF */
DROP       all  --  anywhere             anywhere             /* cali:3pxqKw4hbDnaijPF */ /* Unknown interface */

Chain cali-to-wl-dispatch-2 (1 references)
target     prot opt source               destination         
cali-tw-cali262069506ee  all  --  anywhere             anywhere            [goto]  /* cali:fUZvsJ0EgyG0C-zZ */
cali-tw-cali2bcd1788e3c  all  --  anywhere             anywhere            [goto]  /* cali:VsV_J6x_fo4I7-J3 */
DROP       all  --  anywhere             anywhere             /* cali:gNebMh2J6t2TfJ6j */ /* Unknown interface */

Chain cali-to-wl-dispatch-8 (1 references)
target     prot opt source               destination         
cali-tw-cali83858136281  all  --  anywhere             anywhere            [goto]  /* cali:-ajsGBV2dVzbebEc */
cali-tw-cali84bef65ee54  all  --  anywhere             anywhere            [goto]  /* cali:k_45M2frMZ1HfnCK */
DROP       all  --  anywhere             anywhere             /* cali:8d3BCBB4AVejc2cI */ /* Unknown interface */

Chain cali-tw-cali0de3ef31751 (1 references)
target     prot opt source               destination         
ACCEPT     all  --  anywhere             anywhere             /* cali:TMioD1zpq6Qp9hnE */ ctstate RELATED,ESTABLISHED
DROP       all  --  anywhere             anywhere             /* cali:tUFzsEia54sUUhdz */ ctstate INVALID
MARK       all  --  anywhere             anywhere             /* cali:5aQai6yPxVIaeKeO */ MARK and 0xfffeffff
cali-pri-kns.cattle-system  all  --  anywhere             anywhere             /* cali:A9S5NlJpUQxIsj60 */
RETURN     all  --  anywhere             anywhere             /* cali:zFHuV-oGTCYCSUKm */ /* Return if profile accepted */ mark match 0x10000/0x10000
cali-pri-_XnQ5h_hZf854SLqzqE  all  --  anywhere             anywhere             /* cali:SwTTAv3VgZGleL_F */
RETURN     all  --  anywhere             anywhere             /* cali:wDneCur31JDrnurP */ /* Return if profile accepted */ mark match 0x10000/0x10000
DROP       all  --  anywhere             anywhere             /* cali:p1oZdBvllx7W-TcU */ /* Drop if no profiles matched */

Chain cali-tw-cali262069506ee (1 references)
target     prot opt source               destination         
ACCEPT     all  --  anywhere             anywhere             /* cali:X4InXE9yLI0dl09W */ ctstate RELATED,ESTABLISHED
DROP       all  --  anywhere             anywhere             /* cali:6pUCMzsZvhEcGSZn */ ctstate INVALID
MARK       all  --  anywhere             anywhere             /* cali:-IbBlb6MX9inqR2i */ MARK and 0xfffeffff
cali-pri-kns.kube-system  all  --  anywhere             anywhere             /* cali:2II54DBxOjLSDiw0 */
RETURN     all  --  anywhere             anywhere             /* cali:yCwa6No29ZW1razM */ /* Return if profile accepted */ mark match 0x10000/0x10000
cali-pri-_7bnNHSm00P51QAo5Qe  all  --  anywhere             anywhere             /* cali:DS1UJ9S2Gvav8744 */
RETURN     all  --  anywhere             anywhere             /* cali:ac-DYnIYmrDwOZSc */ /* Return if profile accepted */ mark match 0x10000/0x10000
DROP       all  --  anywhere             anywhere             /* cali:8Nr3HOH3DGmTiSeo */ /* Drop if no profiles matched */

Chain cali-tw-cali2bcd1788e3c (1 references)
target     prot opt source               destination         
ACCEPT     all  --  anywhere             anywhere             /* cali:ciwAeJxlgyplb2in */ ctstate RELATED,ESTABLISHED
DROP       all  --  anywhere             anywhere             /* cali:fMCDrctf5mq5CUJG */ ctstate INVALID
MARK       all  --  anywhere             anywhere             /* cali:6wozw7hPwCjXZ8Xt */ MARK and 0xfffeffff
cali-pri-kns.kube-system  all  --  anywhere             anywhere             /* cali:XNaOpVVJUTeVglPt */
RETURN     all  --  anywhere             anywhere             /* cali:slaVsa2-lINZ_gcD */ /* Return if profile accepted */ mark match 0x10000/0x10000
cali-pri-_8SDYViIwwzQDgRml2t  all  --  anywhere             anywhere             /* cali:IfyVzdSIbbq41m_2 */
RETURN     all  --  anywhere             anywhere             /* cali:CAIqLXT8DZ8Cxe0p */ /* Return if profile accepted */ mark match 0x10000/0x10000
DROP       all  --  anywhere             anywhere             /* cali:cDeFdVGAEBdMeu19 */ /* Drop if no profiles matched */

Chain cali-tw-cali83858136281 (1 references)
target     prot opt source               destination         
ACCEPT     all  --  anywhere             anywhere             /* cali:mXFXEVS0kJLwnS6L */ ctstate RELATED,ESTABLISHED
DROP       all  --  anywhere             anywhere             /* cali:s_nz5S8lwq_N70AZ */ ctstate INVALID
MARK       all  --  anywhere             anywhere             /* cali:iUTujzUUTjCYTafr */ MARK and 0xfffeffff
cali-pri-kns.kube-system  all  --  anywhere             anywhere             /* cali:yeBRRopDKyQknysu */
RETURN     all  --  anywhere             anywhere             /* cali:M2gQNe9Ux3Qg-0qj */ /* Return if profile accepted */ mark match 0x10000/0x10000
cali-pri-_u2Tn2rSoAPffvE7JO6  all  --  anywhere             anywhere             /* cali:yLRG_sFjuKVquYUy */
RETURN     all  --  anywhere             anywhere             /* cali:Ev16zzQmslxvYkZ3 */ /* Return if profile accepted */ mark match 0x10000/0x10000
DROP       all  --  anywhere             anywhere             /* cali:L9QLQNvuQSapFSt3 */ /* Drop if no profiles matched */

Chain cali-tw-cali84bef65ee54 (1 references)
target     prot opt source               destination         
ACCEPT     all  --  anywhere             anywhere             /* cali:nR7o2wc91J-uIY7V */ ctstate RELATED,ESTABLISHED
DROP       all  --  anywhere             anywhere             /* cali:ob2Db8T8eLSwtZTa */ ctstate INVALID
MARK       all  --  anywhere             anywhere             /* cali:2lgnOd8U34Pn-JQO */ MARK and 0xfffeffff
cali-pri-kns.kube-system  all  --  anywhere             anywhere             /* cali:pYr4GquZaWRLEV8I */
RETURN     all  --  anywhere             anywhere             /* cali:oGNs8gAI8uC40rTo */ /* Return if profile accepted */ mark match 0x10000/0x10000
cali-pri-_vOEu3o_UBpjhIsR6zZ  all  --  anywhere             anywhere             /* cali:V0Gr_nhP5zrSnS-v */
RETURN     all  --  anywhere             anywhere             /* cali:o7KIyTUCOqtSfB3E */ /* Return if profile accepted */ mark match 0x10000/0x10000
DROP       all  --  anywhere             anywhere             /* cali:3OP-pZn64FCbnI4s */ /* Drop if no profiles matched */

Chain cali-tw-calidc4e0f2df34 (1 references)
target     prot opt source               destination         
ACCEPT     all  --  anywhere             anywhere             /* cali:As5LiND4OJyfO0B1 */ ctstate RELATED,ESTABLISHED
DROP       all  --  anywhere             anywhere             /* cali:Cky-6JUMlGb-taPT */ ctstate INVALID
MARK       all  --  anywhere             anywhere             /* cali:oYwPolm8x1jecqxI */ MARK and 0xfffeffff
cali-pri-kns.kube-system  all  --  anywhere             anywhere             /* cali:4usV4ABsaApiM2ja */
RETURN     all  --  anywhere             anywhere             /* cali:xU3qIL45LvQJuzv7 */ /* Return if profile accepted */ mark match 0x10000/0x10000
cali-pri-_GyLFhtf5u9n-v9Ckd7  all  --  anywhere             anywhere             /* cali:dbTtClu4XjfKOPt4 */
RETURN     all  --  anywhere             anywhere             /* cali:4u6FVJT1dI_3Bpig */ /* Return if profile accepted */ mark match 0x10000/0x10000
DROP       all  --  anywhere             anywhere             /* cali:RuU9Xor0VeJwLKVO */ /* Drop if no profiles matched */

Chain cali-tw-calif2b2a29df88 (1 references)
target     prot opt source               destination         
ACCEPT     all  --  anywhere             anywhere             /* cali:2POj8G8QI1N4H5se */ ctstate RELATED,ESTABLISHED
DROP       all  --  anywhere             anywhere             /* cali:8KFhLo4wEBEpyHq7 */ ctstate INVALID
MARK       all  --  anywhere             anywhere             /* cali:aa3XVd4V2_bokSlG */ MARK and 0xfffeffff
cali-pri-kns.kube-system  all  --  anywhere             anywhere             /* cali:CQV7h7DK0HPRFRAA */
RETURN     all  --  anywhere             anywhere             /* cali:XNJo8aQLeCeIGsE2 */ /* Return if profile accepted */ mark match 0x10000/0x10000
cali-pri-_kvQu8xaXYEM2wqqPSH  all  --  anywhere             anywhere             /* cali:cIrB0-b53OUaKZhY */
RETURN     all  --  anywhere             anywhere             /* cali:Tq86GmCgHBRWVDxQ */ /* Return if profile accepted */ mark match 0x10000/0x10000
DROP       all  --  anywhere             anywhere             /* cali:xGxoQ3kq76afWgwL */ /* Drop if no profiles matched */

Chain cali-wl-to-host (1 references)
target     prot opt source               destination         
cali-from-wl-dispatch  all  --  anywhere             anywhere             /* cali:Ee9Sbo10IpVujdIY */
ACCEPT     all  --  anywhere             anywhere             /* cali:nSZbcOoG1xPONxb8 */ /* Configured DefaultEndpointToHostAction */
[root@our-dev-k8s0019 ~]# k get nodes
NAME                                                     STATUS   ROLES                       AGE   VERSION
our-dev-k8s0019.server.fr   Ready    control-plane,etcd,master   61m   v1.26.10+rke2r2
our-dev-k8s0020.server.fr   Ready    control-plane,etcd,master   57m   v1.26.10+rke2r2
our-dev-k8s0021.server.fr    Ready    control-plane,etcd,master   52m   v1.26.10+rke2r2

I reproduced the same issue as mentioned above (I needed to switch dnsPolicy from ClusterFirst to Default to import the cluster into Rancher). I then tried to install Longhorn and it did not work as expected.
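
A minimal sketch of that dnsPolicy workaround, assuming the downstream agent deployment is cattle-system/cattle-cluster-agent (Rancher may reconcile the change away on the next agent rollout):

# hypothetical example: make the agent use the node's resolv.conf instead of cluster DNS
kubectl -n cattle-system patch deployment cattle-cluster-agent \
  --type merge -p '{"spec":{"template":{"spec":{"dnsPolicy":"Default"}}}}'
kubectl -n cattle-system rollout status deployment/cattle-cluster-agent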

[root@our-dev-k8s0019 ~]# k get pods -A
NAMESPACE             NAME                                                                              READY   STATUS      RESTARTS   AGE
cattle-fleet-system   fleet-agent-7677f875bb-wkm7w                                                      1/1     Running     0          44m
cattle-system         cattle-cluster-agent-66cd7bbb56-98zfn                                             1/1     Running     0          46m
cattle-system         cattle-cluster-agent-66cd7bbb56-9fs45                                             1/1     Running     0          46m
cattle-system         helm-operation-7kzhf                                                              1/2     Error       0          46m
cattle-system         helm-operation-7pj7p                                                              0/2     Completed   0          40m
cattle-system         helm-operation-s7fb6                                                              1/2     Error       0          45m
cattle-system         rancher-webhook-74c9bd4d6-kw7bv                                                   1/1     Running     0          40m
kube-system           cloud-controller-manager-our-dev-k8s0019.server.fr   1/1     Running     0          64m
kube-system           cloud-controller-manager-our-dev-k8s0020.server.fr   1/1     Running     0          59m
kube-system           cloud-controller-manager-our-dev-k8s0021.server.fr   1/1     Running     0          55m
kube-system           etcd-our-dev-k8s0019.server.fr                      1/1     Running     0          63m
kube-system           etcd-our-dev-k8s0020.server.fr                   1/1     Running     0          59m
kube-system           etcd-our-dev-k8s0021.server.fr                      1/1     Running     0          55m
kube-system           helm-install-rke2-canal-s4jdl                                                     0/1     Completed   0          64m
kube-system           helm-install-rke2-coredns-dvbj7                                                   0/1     Completed   0          64m
kube-system           helm-install-rke2-ingress-nginx-5vnhb                                             0/1     Completed   0          64m
kube-system           helm-install-rke2-metrics-server-n56v4                                            0/1     Completed   0          64m
kube-system           helm-install-rke2-snapshot-controller-crd-6cxbk                                   0/1     Completed   0          64m
kube-system           helm-install-rke2-snapshot-controller-wclfr                                       0/1     Completed   0          64m
kube-system           helm-install-rke2-snapshot-validation-webhook-6jcgx                               0/1     Completed   0          64m
kube-system           kube-apiserver-our-dev-k8s0019.server.fr            1/1     Running     0          64m
kube-system           kube-apiserver-our-dev-k8s0020.server.fr           1/1     Running     0          59m
kube-system           kube-apiserver-our-dev-k8s0021.server.fr            1/1     Running     0          55m
kube-system           kube-controller-manager-our-dev-k8s0019.server.fr   1/1     Running     0          64m
kube-system           kube-controller-manager-our-dev-k8s0020.server.fr    1/1     Running     0          59m
kube-system           kube-controller-manager-our-dev-k8s0021.server.fr   1/1     Running     0          55m
kube-system           kube-proxy-our-dev-k8s0019.server.fr               1/1     Running     0          64m
kube-system           kube-proxy-our-dev-k8s0020.server.fr                1/1     Running     0          59m
kube-system           kube-proxy-our-dev-k8s0021.server.fr               1/1     Running     0          55m
kube-system           kube-scheduler-our-dev-k8s0019.server.fr             1/1     Running     0          64m
kube-system           kube-scheduler-our-dev-k8s0020.server.fr             1/1     Running     0          59m
kube-system           kube-scheduler-our-dev-k8s0021.server.fr            1/1     Running     0          55m
kube-system           rke2-canal-2944v                                                                  2/2     Running     0          59m
kube-system           rke2-canal-dw2x2                                                                  2/2     Running     0          55m
kube-system           rke2-canal-jbgl4                                                                  2/2     Running     0          64m
kube-system           rke2-coredns-rke2-coredns-565dfc7d75-czxtd                                        1/1     Running     0          64m
kube-system           rke2-coredns-rke2-coredns-565dfc7d75-l4kw4                                        1/1     Running     0          59m
kube-system           rke2-coredns-rke2-coredns-autoscaler-6c48c95bf9-thqxq                             1/1     Running     0          64m
kube-system           rke2-ingress-nginx-controller-szkzc                                               1/1     Running     0          55m
kube-system           rke2-ingress-nginx-controller-vv5p2                                               1/1     Running     0          63m
kube-system           rke2-ingress-nginx-controller-xtq86                                               1/1     Running     0          59m
kube-system           rke2-metrics-server-c9c78bd66-sbkz7                                               1/1     Running     0          63m
kube-system           rke2-snapshot-controller-6f7bbb497d-c5dx2                                         1/1     Running     0          63m
kube-system           rke2-snapshot-validation-webhook-65b5675d5c-m2db5                                 1/1     Running     0          63m
[root@our-dev-k8s0019 ~]# k logs -n cattle-system         helm-operation-7kzhf
Defaulted container "helm" out of: helm, proxy
Waiting for Kubernetes API to be available
Waiting for Kubernetes API to be available
Waiting for Kubernetes API to be available
Waiting for Kubernetes API to be available
Waiting for Kubernetes API to be available
Waiting for Kubernetes API to be available
Waiting for Kubernetes API to be available
Waiting for Kubernetes API to be available
Waiting for Kubernetes API to be available
Waiting for Kubernetes API to be available
Waiting for Kubernetes API to be available
Waiting for Kubernetes API to be available
Waiting for Kubernetes API to be available
Waiting for Kubernetes API to be available
Waiting for Kubernetes API to be available
Waiting for Kubernetes API to be available
Waiting for Kubernetes API to be available
Waiting for Kubernetes API to be available
Waiting for Kubernetes API to be available
Waiting for Kubernetes API to be available
[root@our-dev-k8s0019 ~]# k logs -n cattle-system         cattle-cluster-agent-66cd7bbb56-98zfn
...
E1116 08:46:23.544390      57 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
time="2023-11-16T08:46:28Z" level=error msg="Failed to read API for groups map[metrics.k8s.io/v1beta1:the server is currently unable to handle the request]"
E1116 08:46:33.555526      57 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
time="2023-11-16T08:46:39Z" level=error msg="Failed to read API for groups map[metrics.k8s.io/v1beta1:the server is currently unable to handle the request]"
E1116 08:46:43.572635      57 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
time="2023-11-16T08:46:46Z" level=error msg="Failed to install system chart rancher-webhook: pod cattle-system/helm-operation-7kzhf failed, watch closed"
E1116 08:46:48.593018      57 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E1116 08:46:48.649406      57 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E1116 08:46:48.676885      57 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E1116 08:46:48.708418      57 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
time="2023-11-16T08:46:50Z" level=error msg="Failed to read API for groups map[metrics.k8s.io/v1beta1:the server is currently unable to handle the request]"
time="2023-11-16T08:46:51Z" level=error msg="Failed to read API for groups map[metrics.k8s.io/v1beta1:the server is currently unable to handle the request]"
...

The netshoot pod is deployed:

[root@our-dev-k8s0019 ~]# kubectl run tmp-shell --rm -i --tty --image nicolaka/netshoot
If you don't see a command prompt, try pressing enter.
                    dP            dP                           dP   
                    88            88                           88   
88d888b. .d8888b. d8888P .d8888b. 88d888b. .d8888b. .d8888b. d8888P 
88'  `88 88ooood8   88   Y8ooooo. 88'  `88 88'  `88 88'  `88   88   
88    88 88.  ...   88         88 88    88 88.  .88 88.  .88   88   
dP    dP `88888P'   dP   `88888P' dP    dP `88888P' `88888P'   dP   

Welcome to Netshoot! (github.com/nicolaka/netshoot)
Version: 0.11

 tmp-shell  ~  

What kind of tests do you want me to perform? From where to where? Which pods, which ports, which services?

epelaic commented 10 months ago

I did a basic test: I started netshoot from master-1 (our-dev-k8s0019), the pod was deployed on master-3 (our-dev-k8s0021), and I tried to test UDP port 8472 on master-2 (our-dev-k8s0020).

 tmp-shell  ~  nc -vzu our-dev-k8s0020.server.fr 8472
nc: getaddrinfo for host "our-dev-k8s0020.server.fr" port 8472: Try again

From the pod to the host VM it fails (KO), but maybe that is what is expected by default (PSA).

From host VM master-1 to master-2:

[root@our-dev-k8s0019 ~]# nc -vzu our-dev-k8s0020.server.fr 8472
Ncat: Version 7.92 ( https://nmap.org/ncat )
Ncat: Connected to 10.xxx.xxx.169:8472.
Ncat: UDP packet sent successfully
Ncat: 1 bytes sent, 0 bytes received in 2.04 seconds.

So NIC-to-NIC is OK (also OK for TCP 443, 9443, and 6443).
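
Note that nc reports success for UDP as long as nothing actively rejects the packet, so it does not prove that VXLAN traffic really flows end to end. A sketch of a more direct check, assuming Canal's default VXLAN port 8472 (replace the interface and pod IP with real values):

# on the receiving node (e.g. master-2), watch the physical NIC for encapsulated traffic
tcpdump -ni <physical-nic> udp port 8472
# meanwhile, from another node, generate cross-node pod traffic
kubectl get pods -A -o wide | grep our-dev-k8s0020    # pick a pod IP hosted on master-2
ping -c 3 <pod-ip-on-master-2>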

epelaic commented 10 months ago

Second test:

 tmp-shell  ~  nslookup www.google.fr                        
;; communications error to 10.43.0.10#53: timed out
;; communications error to 10.43.0.10#53: timed out
;; communications error to 10.43.0.10#53: timed out
;; no servers could be reached

 tmp-shell  ~  nslookup cattle-cluster-agent-66cd7bbb56-9fs45
;; communications error to 10.43.0.10#53: timed out
;; communications error to 10.43.0.10#53: timed out
;; communications error to 10.43.0.10#53: timed out
;; no servers could be reached
[root@our-dev-k8s0019 ~]# kubectl get nodes -o jsonpath='{.items[*].spec.podCIDR}'
10.42.0.0/24 10.42.1.0/24 10.42.2.0/24

[root@our-dev-k8s0019 ~]# kubectl cluster-info dump | grep -m 1 cluster-cidr
                            "--cluster-cidr=10.42.0.0/16",

[root@our-dev-k8s0019 ~]# kubectl cluster-info dump | grep -m 1 service-cluster-ip-range
                            "--service-cluster-ip-range=10.43.0.0/16",

[root@our-dev-k8s0019 ~]# kubectl cluster-info dump | grep -m 1 cidr
                            "--allocate-node-cidrs=true",

[root@our-dev-k8s0019 ~]# kubectl cluster-info dump | grep -m 1 10.43
                "clusterIP": "10.43.0.10",
[root@our-dev-k8s0019 ~]# kubectl cluster-info dump |grep "10.43.0.10"
                "clusterIP": "10.43.0.10",
                    "10.43.0.10"
                            "global.clusterDNS=10.43.0.10",
                            "global.clusterDNS=10.43.0.10",
                            "global.clusterDNS=10.43.0.10",
                            "global.clusterDNS=10.43.0.10",
                            "global.clusterDNS=10.43.0.10",
                            "global.clusterDNS=10.43.0.10",
                            "global.clusterDNS=10.43.0.10",
+ helm_update install --set-string global.clusterCIDR=10.42.0.0/16 --set-string global.clusterCIDRv4=10.42.0.0/16 --set-string global.clusterDNS=10.43.0.10 --set-string global.clusterDomain=cluster.local --set-string global.rke2DataDir=/var/lib/rancher/rke2 --set-string global.serviceCIDR=10.43.0.0/16
+ helm_v3 install --set-string global.clusterCIDR=10.42.0.0/16 --set-string global.clusterCIDRv4=10.42.0.0/16 --set-string global.clusterDNS=10.43.0.10 --set-string global.clusterDomain=cluster.local --set-string global.rke2DataDir=/var/lib/rancher/rke2 --set-string global.serviceCIDR=10.43.0.0/16 rke2-canal /tmp/rke2-canal.tgz
+ helm_update install --set-string global.clusterCIDR=10.42.0.0/16 --set-string global.clusterCIDRv4=10.42.0.0/16 --set-string global.clusterDNS=10.43.0.10 --set-string global.clusterDomain=cluster.local --set-string global.rke2DataDir=/var/lib/rancher/rke2 --set-string global.serviceCIDR=10.43.0.0/16
+ helm_v3 install --set-string global.clusterCIDR=10.42.0.0/16 --set-string global.clusterCIDRv4=10.42.0.0/16 --set-string global.clusterDNS=10.43.0.10 --set-string global.clusterDomain=cluster.local --set-string global.rke2DataDir=/var/lib/rancher/rke2 --set-string global.serviceCIDR=10.43.0.0/16 rke2-coredns /tmp/rke2-coredns.tgz
+ helm_update install --set-string global.clusterCIDR=10.42.0.0/16 --set-string global.clusterCIDRv4=10.42.0.0/16 --set-string global.clusterDNS=10.43.0.10 --set-string global.clusterDomain=cluster.local --set-string global.rke2DataDir=/var/lib/rancher/rke2 --set-string global.serviceCIDR=10.43.0.0/16
+ helm_v3 install --set-string global.clusterCIDR=10.42.0.0/16 --set-string global.clusterCIDRv4=10.42.0.0/16 --set-string global.clusterDNS=10.43.0.10 --set-string global.clusterDomain=cluster.local --set-string global.rke2DataDir=/var/lib/rancher/rke2 --set-string global.serviceCIDR=10.43.0.0/16 rke2-ingress-nginx /tmp/rke2-ingress-nginx.tgz
+ helm_update install --set-string global.clusterCIDR=10.42.0.0/16 --set-string global.clusterCIDRv4=10.42.0.0/16 --set-string global.clusterDNS=10.43.0.10 --set-string global.clusterDomain=cluster.local --set-string global.rke2DataDir=/var/lib/rancher/rke2 --set-string global.serviceCIDR=10.43.0.0/16
+ helm_v3 install --set-string global.clusterCIDR=10.42.0.0/16 --set-string global.clusterCIDRv4=10.42.0.0/16 --set-string global.clusterDNS=10.43.0.10 --set-string global.clusterDomain=cluster.local --set-string global.rke2DataDir=/var/lib/rancher/rke2 --set-string global.serviceCIDR=10.43.0.0/16 rke2-metrics-server /tmp/rke2-metrics-server.tgz
+ helm_update install --set-string global.clusterCIDR=10.42.0.0/16 --set-string global.clusterCIDRv4=10.42.0.0/16 --set-string global.clusterDNS=10.43.0.10 --set-string global.clusterDomain=cluster.local --set-string global.rke2DataDir=/var/lib/rancher/rke2 --set-string global.serviceCIDR=10.43.0.0/16
+ helm_v3 install --set-string global.clusterCIDR=10.42.0.0/16 --set-string global.clusterCIDRv4=10.42.0.0/16 --set-string global.clusterDNS=10.43.0.10 --set-string global.clusterDomain=cluster.local --set-string global.rke2DataDir=/var/lib/rancher/rke2 --set-string global.serviceCIDR=10.43.0.0/16 rke2-snapshot-controller-crd /tmp/rke2-snapshot-controller-crd.tgz
+ helm_update install --set-string global.clusterCIDR=10.42.0.0/16 --set-string global.clusterCIDRv4=10.42.0.0/16 --set-string global.clusterDNS=10.43.0.10 --set-string global.clusterDomain=cluster.local --set-string global.rke2DataDir=/var/lib/rancher/rke2 --set-string global.serviceCIDR=10.43.0.0/16
+ helm_v3 install --set-string global.clusterCIDR=10.42.0.0/16 --set-string global.clusterCIDRv4=10.42.0.0/16 --set-string global.clusterDNS=10.43.0.10 --set-string global.clusterDomain=cluster.local --set-string global.rke2DataDir=/var/lib/rancher/rke2 --set-string global.serviceCIDR=10.43.0.0/16 rke2-snapshot-controller /tmp/rke2-snapshot-controller.tgz
+ helm_update install --set-string global.clusterCIDR=10.42.0.0/16 --set-string global.clusterCIDRv4=10.42.0.0/16 --set-string global.clusterDNS=10.43.0.10 --set-string global.clusterDomain=cluster.local --set-string global.rke2DataDir=/var/lib/rancher/rke2 --set-string global.serviceCIDR=10.43.0.0/16
+ helm_v3 install --set-string global.clusterCIDR=10.42.0.0/16 --set-string global.clusterCIDRv4=10.42.0.0/16 --set-string global.clusterDNS=10.43.0.10 --set-string global.clusterDomain=cluster.local --set-string global.rke2DataDir=/var/lib/rancher/rke2 --set-string global.serviceCIDR=10.43.0.0/16 rke2-snapshot-validation-webhook /tmp/rke2-snapshot-validation-webhook.tgz
I1116 08:27:58.229621       1 alloc.go:327] "allocated clusterIPs" service="kube-system/rke2-coredns-rke2-coredns" clusterIPs=map[IPv4:10.43.0.10]
Trace[658982462]: [2m10.434051039s] [2m10.434051039s] END
2023-11-16 08:32:37.714 [INFO][47] felix/int_dataplane.go 1836: Received *proto.ServiceUpdate update from calculation graph msg=name:"rke2-coredns-rke2-coredns" namespace:"kube-system" type:"ClusterIP" cluster_ip:"10.43.0.10" ports:<Protocol:"UDP" Port:53 > ports:<Protocol:"TCP" Port:53 > 
2023-11-16 08:36:30.444 [INFO][48] felix/int_dataplane.go 1836: Received *proto.ServiceUpdate update from calculation graph msg=name:"rke2-coredns-rke2-coredns" namespace:"kube-system" type:"ClusterIP" cluster_ip:"10.43.0.10" ports:<Protocol:"UDP" Port:53 > ports:<Protocol:"TCP" Port:53 > 
[root@our-dev-k8s0019 ~]# k get pods -A -o wide |grep dns 
kube-system           helm-install-rke2-coredns-dvbj7                                                   0/1     Completed   0          4h34m   10.xxx.xxx.131   our-dev-k8s0019.server.fr   <none>           <none>
kube-system           rke2-coredns-rke2-coredns-565dfc7d75-czxtd                                        1/1     Running     0          4h34m   10.42.0.3        our-dev-k8s0019.server.fr   <none>           <none>
kube-system           rke2-coredns-rke2-coredns-565dfc7d75-l4kw4                                        1/1     Running     0          4h30m   10.42.1.2        our-dev-k8s0020.server.fr   <none>           <none>
kube-system           rke2-coredns-rke2-coredns-autoscaler-6c48c95bf9-thqxq                             1/1     Running     0          4h34m   10.42.0.2        our-dev-k8s0019.server.fr   <none>           <none>
[root@our-dev-k8s0019 ~]# nslookup www.google.fr
Server:     172.xxx.xxx.241
Address:    172.xxx.xxx.241#53

Non-authoritative answer:
Name:   www.google.fr
Address: 142.250.13.94
Name:   www.google.fr
Address: 2a00:1450:400c:c03::5e
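
One way to narrow this down (a sketch, using the coredns pod IPs from the listing above) is to bypass the 10.43.0.10 service VIP and query each coredns pod directly from a netshoot pod on each node; whichever combinations time out show which node-to-node paths are broken, rather than the DNS service itself:

 tmp-shell  ~  nslookup www.google.fr 10.42.0.3    # coredns pod on our-dev-k8s0019
 tmp-shell  ~  nslookup www.google.fr 10.42.1.2    # coredns pod on our-dev-k8s0020
 tmp-shell  ~  dig @10.42.1.2 www.google.fr +time=2 +tries=1    # dig makes the per-query timeout explicit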
epelaic commented 10 months ago

Additional test: run netshoot from master-1 (the pod was deployed on master-3).

[root@our-dev-k8s0019 ~]# kubectl run tmp-shell --rm -i --tty --image nicolaka/netshoot
If you don't see a command prompt, try pressing enter.
                    dP            dP                           dP   
                    88            88                           88   
88d888b. .d8888b. d8888P .d8888b. 88d888b. .d8888b. .d8888b. d8888P 
88'  `88 88ooood8   88   Y8ooooo. 88'  `88 88'  `88 88'  `88   88   
88    88 88.  ...   88         88 88    88 88.  .88 88.  .88   88   
dP    dP `88888P'   dP   `88888P' dP    dP `88888P' `88888P'   dP   

Welcome to Netshoot! (github.com/nicolaka/netshoot)
Version: 0.11

 tmp-shell  ~  nc -zvu 10.43.0.10 53
Connection to 10.43.0.10 53 port [udp/domain] succeeded!

From master-2 deployed on master-2

[root@our-dev-k8s0020 ~]# kubectl run tmp-shell2 --rm -i --tty --image nicolaka/netshoot
E1116 14:59:58.800225  389330 memcache.go:287] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E1116 14:59:58.803044  389330 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
If you don't see a command prompt, try pressing enter.
                    dP            dP                           dP   
                    88            88                           88   
88d888b. .d8888b. d8888P .d8888b. 88d888b. .d8888b. .d8888b. d8888P 
88'  `88 88ooood8   88   Y8ooooo. 88'  `88 88'  `88 88'  `88   88   
88    88 88.  ...   88         88 88    88 88.  .88 88.  .88   88   
dP    dP `88888P'   dP   `88888P' dP    dP `88888P' `88888P'   dP   

Welcome to Netshoot! (github.com/nicolaka/netshoot)
Version: 0.11

 tmp-shell2  ~  nc -zvu 10.43.0.10 53
Connection to 10.43.0.10 53 port [udp/domain] succeeded!

From master-3 deployed on master-3

[root@our-dev-k8s0021 ~]# kubectl run tmp-shell3 --rm -i --tty --image nicolaka/netshoot
E1116 15:01:27.794321  386708 memcache.go:287] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E1116 15:01:27.807681  386708 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
If you don't see a command prompt, try pressing enter.
                    dP            dP                           dP   
                    88            88                           88   
88d888b. .d8888b. d8888P .d8888b. 88d888b. .d8888b. .d8888b. d8888P 
88'  `88 88ooood8   88   Y8ooooo. 88'  `88 88'  `88 88'  `88   88   
88    88 88.  ...   88         88 88    88 88.  .88 88.  .88   88   
dP    dP `88888P'   dP   `88888P' dP    dP `88888P' `88888P'   dP   

Welcome to Netshoot! (github.com/nicolaka/netshoot)
Version: 0.11

 tmp-shell3  ~  nc -zvu 10.43.0.10 53
Connection to 10.43.0.10 53 port [udp/domain] succeeded!
epelaic commented 10 months ago

Here is the content of /etc/resolv.conf in the netshoot pod:

tmp-shell  ~  cat /etc/resolv.conf 
search default.svc.cluster.local svc.cluster.local cluster.local server.fr
nameserver 10.43.0.10
options ndots:5
epelaic commented 10 months ago

Here is the network config of the netshoot pod:

 tmp-shell  ~  route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         169.254.1.1     0.0.0.0         UG    0      0        0 eth0
169.254.1.1     0.0.0.0         255.255.255.255 UH    0      0        0 eth0

 tmp-shell  ~  ifconfig 
eth0      Link encap:Ethernet  HWaddr 1A:94:98:ED:9F:61  
          inet addr:10.42.2.16  Bcast:0.0.0.0  Mask:255.255.255.255
          inet6 addr: fe80::1894:98ff:feed:9f61/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1450  Metric:1
          RX packets:7 errors:0 dropped:0 overruns:0 frame:0
          TX packets:20 errors:0 dropped:1 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:530 (530.0 B)  TX bytes:1654 (1.6 KiB)

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
epelaic commented 10 months ago

Maybe a bad network config or a missing route somewhere?
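
A few read-only checks that make a missing route or VXLAN neighbour entry visible (a sketch; flannel.1 is Canal's default VXLAN interface):

[root@our-dev-k8s0019 ~]# ip -d link show flannel.1        # VXLAN id, UDP port and local VTEP address
[root@our-dev-k8s0019 ~]# ip route show | grep flannel.1   # expect one route per remote node's pod subnet
[root@our-dev-k8s0019 ~]# bridge fdb show dev flannel.1    # expect one entry per remote node's VTEP
[root@our-dev-k8s0019 ~]# ip neigh show dev flannel.1      # neighbour entries for the remote VTEP IPs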

epelaic commented 10 months ago

I checked iptables and found that the nftables service was disabled. I enabled it and restarted rke2-server on all nodes, but it didn't change anything. Should I re-install rke2-server?

epelaic commented 10 months ago

Here is the result of nft list ruleset:

[root@our-dev-k8s0019 ~]# nft list ruleset
table ip mangle {
    chain PREROUTING {
        type filter hook prerouting priority mangle; policy accept;
         counter packets 267450 bytes 619013764 jump cali-PREROUTING
    }

    chain INPUT {
        type filter hook input priority mangle; policy accept;
    }

    chain FORWARD {
        type filter hook forward priority mangle; policy accept;
    }

    chain OUTPUT {
        type route hook output priority mangle; policy accept;
    }

    chain POSTROUTING {
        type filter hook postrouting priority mangle; policy accept;
         counter packets 255553 bytes 129941617 jump cali-POSTROUTING
    }

    chain KUBE-IPTABLES-HINT {
    }

    chain KUBE-KUBELET-CANARY {
    }

    chain KUBE-PROXY-CANARY {
    }

    chain cali-from-host-endpoint {
    }

    chain cali-to-host-endpoint {
    }

    chain cali-PREROUTING {
         ct state related,established counter packets 261014 bytes 618354654 accept
         mark and 0x10000 == 0x10000 counter packets 0 bytes 0 accept
         counter packets 6436 bytes 659110 jump cali-from-host-endpoint
          mark and 0x10000 == 0x10000 counter packets 0 bytes 0 accept
    }

    chain cali-POSTROUTING {
         mark and 0x10000 == 0x10000 counter packets 3 bytes 205 return
         counter packets 255550 bytes 129941412 meta mark set mark and 0xfff0ffff 
         ct status dnat counter packets 10114 bytes 20420474 jump cali-to-host-endpoint
          mark and 0x10000 == 0x10000 counter packets 0 bytes 0 return
    }
}
table ip raw {
    chain PREROUTING {
        type filter hook prerouting priority raw; policy accept;
         counter packets 267450 bytes 619013764 jump cali-PREROUTING
    }

    chain OUTPUT {
        type filter hook output priority raw; policy accept;
         counter packets 255247 bytes 129836136 jump cali-OUTPUT
    }

    chain cali-to-host-endpoint {
    }

    chain cali-PREROUTING {
         counter packets 267450 bytes 619013764 meta mark set mark and 0xfff0ffff 
        iifname "cali*"  counter packets 12804 bytes 4325187 meta mark set mark or 0x40000 
         mark and 0x40000 == 0x40000 counter packets 12804 bytes 4325187 jump cali-rpf-skip
         mark and 0x40000 == 0x40000 fib saddr . mark . iif oif 0 counter packets 0 bytes 0 drop
         mark and 0x40000 == 0x0 counter packets 254646 bytes 614688577 jump cali-from-host-endpoint
         mark and 0x10000 == 0x10000 counter packets 0 bytes 0 accept
    }

    chain cali-OUTPUT {
         counter packets 255247 bytes 129836136 meta mark set mark and 0xfff0ffff 
         counter packets 255247 bytes 129836136 jump cali-to-host-endpoint
         mark and 0x10000 == 0x10000 counter packets 0 bytes 0 accept
    }

    chain cali-rpf-skip {
    }

    chain cali-from-host-endpoint {
    }
}
table ip filter {
    chain INPUT {
        type filter hook input priority filter; policy accept;
         counter packets 267143 bytes 618908231 jump cali-INPUT
        ct state new  counter packets 996 bytes 59760 jump KUBE-PROXY-FIREWALL
         counter packets 277141 bytes 872036194 jump KUBE-NODEPORTS
        ct state new  counter packets 996 bytes 59760 jump KUBE-EXTERNAL-SERVICES
        counter packets 291351 bytes 884820348 jump KUBE-FIREWALL
    }

    chain FORWARD {
        type filter hook forward priority filter; policy accept;
         counter packets 306 bytes 105481 jump cali-FORWARD
        ct state new  counter packets 3 bytes 205 jump KUBE-PROXY-FIREWALL
         counter packets 3 bytes 205 jump KUBE-FORWARD
        ct state new  counter packets 3 bytes 205 jump KUBE-SERVICES
        ct state new  counter packets 3 bytes 205 jump KUBE-EXTERNAL-SERVICES
         counter packets 3 bytes 205 jump FLANNEL-FWD
          mark and 0x10000 == 0x10000 counter packets 0 bytes 0 accept
         counter packets 0 bytes 0 meta mark set mark or 0x10000 
    }

    chain OUTPUT {
        type filter hook output priority filter; policy accept;
         counter packets 255246 bytes 129836084 jump cali-OUTPUT
        ct state new  counter packets 2407 bytes 152469 jump KUBE-PROXY-FIREWALL
        ct state new  counter packets 2407 bytes 152469 jump KUBE-SERVICES
        counter packets 288603 bytes 161190559 jump KUBE-FIREWALL
    }

    chain FLANNEL-FWD {
        ip saddr 10.42.0.0/16  counter packets 3 bytes 205 accept
        ip daddr 10.42.0.0/16  counter packets 0 bytes 0 accept
    }

    chain KUBE-FIREWALL {
        ip saddr != 127.0.0.0/8 ip daddr 127.0.0.0/8  ct status dnat counter packets 0 bytes 0 drop
         meta mark & 0x00008000 == 0x00008000 counter packets 0 bytes 0 drop
    }

    chain KUBE-KUBELET-CANARY {
    }

    chain KUBE-PROXY-CANARY {
    }

    chain KUBE-EXTERNAL-SERVICES {
    }

    chain KUBE-NODEPORTS {
    }

    chain KUBE-SERVICES {
    }

    chain KUBE-FORWARD {
        ct state invalid counter packets 0 bytes 0 drop
         meta mark & 0x00004000 == 0x00004000 counter packets 0 bytes 0 accept
         ct state related,established counter packets 0 bytes 0 accept
    }

    chain KUBE-PROXY-FIREWALL {
    }

    chain cali-from-hep-forward {
    }

    chain cali-to-wl-dispatch {
        oifname "cali0bf77a928c5"  counter packets 0 bytes 0 goto cali-tw-cali0bf77a928c5
        oifname "cali1*"  counter packets 0 bytes 0 goto cali-to-wl-dispatch-1
        oifname "cali294d4cb839a"  counter packets 0 bytes 0 goto cali-tw-cali294d4cb839a
        oifname "cali924e7edeb5e"  counter packets 0 bytes 0 goto cali-tw-cali924e7edeb5e
        oifname "calic5b69cbae87"  counter packets 156 bytes 92053 goto cali-tw-calic5b69cbae87
          counter packets 0 bytes 0 drop
    }

    chain cali-to-hep-forward {
    }

    chain cali-wl-to-host {
         counter packets 12655 bytes 4311919 jump cali-from-wl-dispatch
          counter packets 27 bytes 1620 accept
    }

    chain cali-from-host-endpoint {
    }

    chain cali-FORWARD {
         counter packets 306 bytes 105481 meta mark set mark and 0xfff1ffff 
         mark and 0x10000 == 0x0 counter packets 306 bytes 105481 jump cali-from-hep-forward
        iifname "cali*"  counter packets 149 bytes 13268 jump cali-from-wl-dispatch
        oifname "cali*"  counter packets 157 bytes 92213 jump cali-to-wl-dispatch
         counter packets 3 bytes 205 jump cali-to-hep-forward
         counter packets 3 bytes 205 jump cali-cidr-block
    }

    chain cali-OUTPUT {
         mark and 0x10000 == 0x10000 counter packets 0 bytes 0 accept
        oifname "cali*"  counter packets 14149 bytes 18591489 return
         counter packets 241097 bytes 111244595 meta mark set mark and 0xfff0ffff 
         ct status dnat counter packets 238246 bytes 108718081 jump cali-to-host-endpoint
          mark and 0x10000 == 0x10000 counter packets 0 bytes 0 accept
    }

    chain cali-from-wl-dispatch {
        iifname "cali0bf77a928c5"  counter packets 485 bytes 35098 goto cali-fw-cali0bf77a928c5
        iifname "cali1*"  counter packets 3341 bytes 432948 goto cali-from-wl-dispatch-1
        iifname "cali294d4cb839a"  counter packets 682 bytes 56667 goto cali-fw-cali294d4cb839a
        iifname "cali924e7edeb5e"  counter packets 1035 bytes 291828 goto cali-fw-cali924e7edeb5e
        iifname "calic5b69cbae87"  counter packets 3962 bytes 1033259 goto cali-fw-calic5b69cbae87
          counter packets 0 bytes 0 drop
    }

    chain cali-cidr-block {
    }

    chain cali-to-host-endpoint {
    }

    chain cali-INPUT {
        iifname "cali*"  counter packets 12655 bytes 4311919 goto cali-wl-to-host
         mark and 0x10000 == 0x10000 counter packets 0 bytes 0 accept
         counter packets 254488 bytes 614596312 meta mark set mark and 0xfff0ffff 
         counter packets 254488 bytes 614596312 jump cali-from-host-endpoint
          mark and 0x10000 == 0x10000 counter packets 0 bytes 0 accept
    }

    chain cali-pri-kns.kube-system {
          counter packets 0 bytes 0 meta mark set mark or 0x10000 
         mark and 0x10000 == 0x10000 counter packets 0 bytes 0 return
    }

    chain cali-pro-kns.kube-system {
          counter packets 28 bytes 1705 meta mark set mark or 0x10000 
         mark and 0x10000 == 0x10000 counter packets 28 bytes 1705 return
    }

    chain cali-fw-cali294d4cb839a {
         ct state related,established counter packets 709 bytes 60270 accept
         ct state invalid counter packets 0 bytes 0 drop
         counter packets 2 bytes 145 meta mark set mark and 0xfffeffff 
        meta l4proto udp   udp dport 4789 counter packets 0 bytes 0 drop
        meta l4proto ipv4   counter packets 0 bytes 0 drop
         counter packets 2 bytes 145 jump cali-pro-kns.kube-system
          mark and 0x10000 == 0x10000 counter packets 2 bytes 145 return
         counter packets 0 bytes 0 jump cali-pro-_u2Tn2rSoAPffvE7JO6
          mark and 0x10000 == 0x10000 counter packets 0 bytes 0 return
          counter packets 0 bytes 0 drop
    }

    chain cali-tw-cali294d4cb839a {
         ct state related,established counter packets 1 bytes 160 accept
         ct state invalid counter packets 0 bytes 0 drop
         counter packets 0 bytes 0 meta mark set mark and 0xfffeffff 
         counter packets 0 bytes 0 jump cali-pri-kns.kube-system
          mark and 0x10000 == 0x10000 counter packets 0 bytes 0 return
         counter packets 0 bytes 0 jump cali-pri-_u2Tn2rSoAPffvE7JO6
          mark and 0x10000 == 0x10000 counter packets 0 bytes 0 return
          counter packets 0 bytes 0 drop
    }

    chain cali-pri-_u2Tn2rSoAPffvE7JO6 {
          counter packets 0 bytes 0
    }

    chain cali-pro-_u2Tn2rSoAPffvE7JO6 {
          counter packets 0 bytes 0
    }

    chain cali-tw-cali0bf77a928c5 {
         ct state related,established counter packets 0 bytes 0 accept
         ct state invalid counter packets 0 bytes 0 drop
         counter packets 0 bytes 0 meta mark set mark and 0xfffeffff 
         counter packets 0 bytes 0 jump cali-pri-kns.kube-system
          mark and 0x10000 == 0x10000 counter packets 0 bytes 0 return
         counter packets 0 bytes 0 jump cali-pri-_7bnNHSm00P51QAo5Qe
          mark and 0x10000 == 0x10000 counter packets 0 bytes 0 return
          counter packets 0 bytes 0 drop
    }

    chain cali-pri-_7bnNHSm00P51QAo5Qe {
          counter packets 0 bytes 0
    }

    chain cali-pro-_7bnNHSm00P51QAo5Qe {
          counter packets 0 bytes 0
    }

    chain cali-fw-cali0bf77a928c5 {
         ct state related,established counter packets 508 bytes 38275 accept
         ct state invalid counter packets 0 bytes 0 drop
         counter packets 1 bytes 60 meta mark set mark and 0xfffeffff 
        meta l4proto udp   udp dport 4789 counter packets 0 bytes 0 drop
        meta l4proto ipv4   counter packets 0 bytes 0 drop
         counter packets 1 bytes 60 jump cali-pro-kns.kube-system
          mark and 0x10000 == 0x10000 counter packets 1 bytes 60 return
         counter packets 0 bytes 0 jump cali-pro-_7bnNHSm00P51QAo5Qe
          mark and 0x10000 == 0x10000 counter packets 0 bytes 0 return
          counter packets 0 bytes 0 drop
    }

    chain cali-pri-_8SDYViIwwzQDgRml2t {
          counter packets 0 bytes 0
    }

    chain cali-pro-_8SDYViIwwzQDgRml2t {
          counter packets 0 bytes 0
    }

    chain cali-fw-cali924e7edeb5e {
         ct state related,established counter packets 1072 bytes 300300 accept
         ct state invalid counter packets 0 bytes 0 drop
         counter packets 1 bytes 60 meta mark set mark and 0xfffeffff 
        meta l4proto udp   udp dport 4789 counter packets 0 bytes 0 drop
        meta l4proto ipv4   counter packets 0 bytes 0 drop
         counter packets 1 bytes 60 jump cali-pro-kns.kube-system
          mark and 0x10000 == 0x10000 counter packets 1 bytes 60 return
         counter packets 0 bytes 0 jump cali-pro-_8SDYViIwwzQDgRml2t
          mark and 0x10000 == 0x10000 counter packets 0 bytes 0 return
          counter packets 0 bytes 0 drop
    }

    chain cali-tw-cali924e7edeb5e {
         ct state related,established counter packets 0 bytes 0 accept
         ct state invalid counter packets 0 bytes 0 drop
         counter packets 0 bytes 0 meta mark set mark and 0xfffeffff 
         counter packets 0 bytes 0 jump cali-pri-kns.kube-system
          mark and 0x10000 == 0x10000 counter packets 0 bytes 0 return
         counter packets 0 bytes 0 jump cali-pri-_8SDYViIwwzQDgRml2t
          mark and 0x10000 == 0x10000 counter packets 0 bytes 0 return
          counter packets 0 bytes 0 drop
    }

    chain cali-pri-_kvQu8xaXYEM2wqqPSH {
          counter packets 0 bytes 0
    }

    chain cali-pro-_kvQu8xaXYEM2wqqPSH {
          counter packets 0 bytes 0
    }

    chain cali-fw-calic5b69cbae87 {
         ct state related,established counter packets 4029 bytes 1042100 accept
         ct state invalid counter packets 0 bytes 0 drop
         counter packets 4 bytes 240 meta mark set mark and 0xfffeffff 
        meta l4proto udp   udp dport 4789 counter packets 0 bytes 0 drop
        meta l4proto ipv4   counter packets 0 bytes 0 drop
         counter packets 4 bytes 240 jump cali-pro-kns.kube-system
          mark and 0x10000 == 0x10000 counter packets 4 bytes 240 return
         counter packets 0 bytes 0 jump cali-pro-_kvQu8xaXYEM2wqqPSH
          mark and 0x10000 == 0x10000 counter packets 0 bytes 0 return
          counter packets 0 bytes 0 drop
    }

    chain cali-tw-calic5b69cbae87 {
         ct state related,established counter packets 156 bytes 92053 accept
         ct state invalid counter packets 0 bytes 0 drop
         counter packets 0 bytes 0 meta mark set mark and 0xfffeffff 
         counter packets 0 bytes 0 jump cali-pri-kns.kube-system
          mark and 0x10000 == 0x10000 counter packets 0 bytes 0 return
         counter packets 0 bytes 0 jump cali-pri-_kvQu8xaXYEM2wqqPSH
          mark and 0x10000 == 0x10000 counter packets 0 bytes 0 return
          counter packets 0 bytes 0 drop
    }

    chain cali-fw-cali1daca5107f5 {
         ct state related,established counter packets 797 bytes 172688 accept
         ct state invalid counter packets 0 bytes 0 drop
         counter packets 1 bytes 60 meta mark set mark and 0xfffeffff 
        meta l4proto udp   udp dport 4789 counter packets 0 bytes 0 drop
        meta l4proto ipv4   counter packets 0 bytes 0 drop
         counter packets 1 bytes 60 jump cali-pro-kns.kube-system
          mark and 0x10000 == 0x10000 counter packets 1 bytes 60 return
         counter packets 0 bytes 0 jump cali-pro-_vOEu3o_UBpjhIsR6zZ
          mark and 0x10000 == 0x10000 counter packets 0 bytes 0 return
          counter packets 0 bytes 0 drop
    }

    chain cali-tw-cali1daca5107f5 {
         ct state related,established counter packets 0 bytes 0 accept
         ct state invalid counter packets 0 bytes 0 drop
         counter packets 0 bytes 0 meta mark set mark and 0xfffeffff 
         counter packets 0 bytes 0 jump cali-pri-kns.kube-system
          mark and 0x10000 == 0x10000 counter packets 0 bytes 0 return
         counter packets 0 bytes 0 jump cali-pri-_vOEu3o_UBpjhIsR6zZ
          mark and 0x10000 == 0x10000 counter packets 0 bytes 0 return
          counter packets 0 bytes 0 drop
    }

    chain cali-pri-_vOEu3o_UBpjhIsR6zZ {
          counter packets 0 bytes 0
    }

    chain cali-pro-_vOEu3o_UBpjhIsR6zZ {
          counter packets 0 bytes 0
    }

    chain cali-from-wl-dispatch-1 {
        iifname "cali11c4a0384f9"  counter packets 2585 bytes 268113 goto cali-fw-cali11c4a0384f9
        iifname "cali1daca5107f5"  counter packets 762 bytes 166003 goto cali-fw-cali1daca5107f5
          counter packets 0 bytes 0 drop
    }

    chain cali-tw-cali11c4a0384f9 {
         ct state related,established counter packets 0 bytes 0 accept
         ct state invalid counter packets 0 bytes 0 drop
         counter packets 0 bytes 0 meta mark set mark and 0xfffeffff 
         counter packets 0 bytes 0 jump cali-pri-kns.kube-system
          mark and 0x10000 == 0x10000 counter packets 0 bytes 0 return
         counter packets 0 bytes 0 jump cali-pri-_GyLFhtf5u9n-v9Ckd7
          mark and 0x10000 == 0x10000 counter packets 0 bytes 0 return
          counter packets 0 bytes 0 drop
    }

    chain cali-to-wl-dispatch-1 {
        oifname "cali11c4a0384f9"  counter packets 0 bytes 0 goto cali-tw-cali11c4a0384f9
        oifname "cali1daca5107f5"  counter packets 0 bytes 0 goto cali-tw-cali1daca5107f5
          counter packets 0 bytes 0 drop
    }

    chain cali-pri-_GyLFhtf5u9n-v9Ckd7 {
          counter packets 0 bytes 0
    }

    chain cali-pro-_GyLFhtf5u9n-v9Ckd7 {
          counter packets 0 bytes 0
    }

    chain cali-fw-cali11c4a0384f9 {
         ct state related,established counter packets 2584 bytes 268053 accept
         ct state invalid counter packets 0 bytes 0 drop
         counter packets 1 bytes 60 meta mark set mark and 0xfffeffff 
        meta l4proto udp   udp dport 4789 counter packets 0 bytes 0 drop
        meta l4proto ipv4   counter packets 0 bytes 0 drop
         counter packets 1 bytes 60 jump cali-pro-kns.kube-system
          mark and 0x10000 == 0x10000 counter packets 1 bytes 60 return
         counter packets 0 bytes 0 jump cali-pro-_GyLFhtf5u9n-v9Ckd7
          mark and 0x10000 == 0x10000 counter packets 0 bytes 0 return
          counter packets 0 bytes 0 drop
    }
}
table ip nat {
    chain PREROUTING {
        type nat hook prerouting priority dstnat; policy accept;
         counter packets 286 bytes 17185 jump cali-PREROUTING
         counter packets 286 bytes 17185 jump KUBE-SERVICES
        fib daddr type local counter packets 256 bytes 15360 jump CNI-HOSTPORT-DNAT
    }

    chain INPUT {
        type nat hook input priority 100; policy accept;
    }

    chain OUTPUT {
        type nat hook output priority -100; policy accept;
         counter packets 2261 bytes 141328 jump cali-OUTPUT
         counter packets 2386 bytes 150664 jump KUBE-SERVICES
        fib daddr type local counter packets 629 bytes 37740 jump CNI-HOSTPORT-DNAT
    }

    chain POSTROUTING {
        type nat hook postrouting priority srcnat; policy accept;
         counter packets 2004 bytes 121228 jump cali-POSTROUTING
         counter packets 2048 bytes 124804 jump CNI-HOSTPORT-MASQ
         counter packets 2659 bytes 171044 jump KUBE-POSTROUTING
         counter packets 2218 bytes 138413 jump FLANNEL-POSTRTG
    }

    chain FLANNEL-POSTRTG {
        meta mark & 0x00004000 == 0x00004000  counter packets 0 bytes 0 return
        ip saddr 10.42.0.0/24 ip daddr 10.42.0.0/16  counter packets 0 bytes 0 return
        ip saddr 10.42.0.0/16 ip daddr 10.42.0.0/24  counter packets 0 bytes 0 return
        ip saddr != 10.42.0.0/16 ip daddr 10.42.0.0/24  counter packets 610 bytes 36600 return
        ip saddr 10.42.0.0/16 ip daddr != 224.0.0.0/4  counter packets 3 bytes 205 masquerade 
        ip saddr != 10.42.0.0/16 ip daddr 10.42.0.0/16  counter packets 0 bytes 0 masquerade 
    }

    chain KUBE-MARK-DROP {
        counter packets 0 bytes 0 meta mark set mark or 0x8000 
    }

    chain KUBE-MARK-MASQ {
        counter packets 0 bytes 0 meta mark set mark or 0x4000 
    }

    chain KUBE-POSTROUTING {
        meta mark & 0x00004000 != 0x00004000 counter packets 604 bytes 36996 return
        counter packets 0 bytes 0 meta mark set mark xor 0x4000 
         counter packets 0 bytes 0 masquerade 
    }

    chain KUBE-KUBELET-CANARY {
    }

    chain KUBE-PROXY-CANARY {
    }

    chain KUBE-SERVICES {
        meta l4proto tcp ip daddr 10.43.0.10  tcp dport 53 counter packets 0 bytes 0 jump KUBE-SVC-PUNXDRXNIM3ELMDM
        meta l4proto udp ip daddr 10.43.0.10  udp dport 53 counter packets 0 bytes 0 jump KUBE-SVC-YFPH5LFNKP7E3G4L
        meta l4proto tcp ip daddr 10.43.26.80  tcp dport 443 counter packets 0 bytes 0 jump KUBE-SVC-J3KDMBYV4FOFPYZL
        meta l4proto tcp ip daddr 10.43.166.19  tcp dport 443 counter packets 0 bytes 0 jump KUBE-SVC-JOQCPDX2ATLAF3TL
        meta l4proto tcp ip daddr 10.43.115.162  tcp dport 443 counter packets 0 bytes 0 jump KUBE-SVC-46CB5Z6CZH2WFWVP
        meta l4proto tcp ip daddr 10.43.0.1  tcp dport 443 counter packets 0 bytes 0 jump KUBE-SVC-NPX46M4PTMTKRN6Y
         fib daddr type local counter packets 352 bytes 21120 jump KUBE-NODEPORTS
    }

    chain KUBE-NODEPORTS {
    }

    chain KUBE-SVC-NPX46M4PTMTKRN6Y {
        meta l4proto tcp ip saddr != 10.42.0.0/16 ip daddr 10.43.0.1  tcp dport 443 counter packets 0 bytes 0 jump KUBE-MARK-MASQ
          counter packets 0 bytes 0 jump KUBE-SEP-J6KX6PLZHXNWEWDW
          counter packets 0 bytes 0 jump KUBE-SEP-QLWODAQP3TM5I5YB
         counter packets 0 bytes 0 jump KUBE-SEP-AHH5GLIAUZ7S423N
    }

    chain KUBE-SEP-AHH5GLIAUZ7S423N {
        ip saddr 10.xxx.xxx.131  counter packets 0 bytes 0 jump KUBE-MARK-MASQ
        meta l4proto tcp   counter packets 0 bytes 0 dnat to 10.xxx.xxx.131:6443
    }

    chain cali-fip-dnat {
    }

    chain cali-fip-snat {
    }

    chain cali-nat-outgoing {
         # match-set cali40masq-ipam-pools src # ! match-set cali40all-ipam-pools dst counter packets 0 bytes 0 masquerade 
    }

    chain cali-PREROUTING {
         counter packets 286 bytes 17185 jump cali-fip-dnat
    }

    chain cali-POSTROUTING {
         counter packets 2264 bytes 141533 jump cali-fip-snat
         counter packets 2264 bytes 141533 jump cali-nat-outgoing
    }

    chain cali-OUTPUT {
         counter packets 2261 bytes 141328 jump cali-fip-dnat
    }

    chain KUBE-SVC-J3KDMBYV4FOFPYZL {
        meta l4proto tcp ip saddr != 10.42.0.0/16 ip daddr 10.43.26.80  tcp dport 443 counter packets 0 bytes 0 jump KUBE-MARK-MASQ
         counter packets 0 bytes 0 jump KUBE-SEP-F7LS52CWPCWIZW7C
    }

    chain KUBE-SEP-F7LS52CWPCWIZW7C {
        ip saddr 10.42.0.10  counter packets 0 bytes 0 jump KUBE-MARK-MASQ
        meta l4proto tcp   counter packets 0 bytes 0 dnat to 10.42.0.10:8443
    }

    chain CNI-HOSTPORT-SETMARK {
         counter packets 0 bytes 0 meta mark set mark or 0x2000 
    }

    chain CNI-HOSTPORT-MASQ {
        meta mark & 0x00002000 == 0x00002000 counter packets 0 bytes 0 masquerade 
    }

    chain CNI-HOSTPORT-DNAT {
        meta l4proto tcp  tcp dport { 80,443} counter packets 0 bytes 0 jump CNI-DN-00ba02ab4ebf15e0c3bc9
    }

    chain CNI-DN-00ba02ab4ebf15e0c3bc9 {
        ip saddr 10.42.0.13 tcp dport 80 counter packets 0 bytes 0 jump CNI-HOSTPORT-SETMARK
        ip saddr 127.0.0.1 tcp dport 80 counter packets 0 bytes 0 jump CNI-HOSTPORT-SETMARK
        tcp dport 80 counter packets 0 bytes 0 dnat to 10.42.0.13:80
        ip saddr 10.42.0.13 tcp dport 443 counter packets 0 bytes 0 jump CNI-HOSTPORT-SETMARK
        ip saddr 127.0.0.1 tcp dport 443 counter packets 0 bytes 0 jump CNI-HOSTPORT-SETMARK
        tcp dport 443 counter packets 0 bytes 0 dnat to 10.42.0.13:443
    }

    chain KUBE-SVC-JOQCPDX2ATLAF3TL {
        meta l4proto tcp ip saddr != 10.42.0.0/16 ip daddr 10.43.166.19  tcp dport 443 counter packets 0 bytes 0 jump KUBE-MARK-MASQ
         counter packets 0 bytes 0 jump KUBE-SEP-Q2XWK2Q2MZUQGQIN
    }

    chain KUBE-SEP-Q2XWK2Q2MZUQGQIN {
        ip saddr 10.42.0.11  counter packets 0 bytes 0 jump KUBE-MARK-MASQ
        meta l4proto tcp   counter packets 0 bytes 0 dnat to 10.42.0.11:10250
    }

    chain KUBE-SVC-46CB5Z6CZH2WFWVP {
        meta l4proto tcp ip saddr != 10.42.0.0/16 ip daddr 10.43.115.162  tcp dport 443 counter packets 0 bytes 0 jump KUBE-MARK-MASQ
          counter packets 0 bytes 0 jump KUBE-SEP-V76QII4E7SINJISZ
          counter packets 0 bytes 0 jump KUBE-SEP-AZB7IJ6T247MIQ2V
         counter packets 0 bytes 0 jump KUBE-SEP-WOHVTEFPQIOIKJUP
    }

    chain KUBE-SEP-V76QII4E7SINJISZ {
        ip saddr 10.42.0.13  counter packets 0 bytes 0 jump KUBE-MARK-MASQ
        meta l4proto tcp   counter packets 0 bytes 0 dnat to 10.42.0.13:8443
    }

    chain KUBE-SVC-PUNXDRXNIM3ELMDM {
        meta l4proto tcp ip saddr != 10.42.0.0/16 ip daddr 10.43.0.10  tcp dport 53 counter packets 0 bytes 0 jump KUBE-MARK-MASQ
          counter packets 0 bytes 0 jump KUBE-SEP-6RB46LILR7QF6VFL
         counter packets 0 bytes 0 jump KUBE-SEP-FA2Y7VD4ZTC3KM6D
    }

    chain KUBE-SEP-6RB46LILR7QF6VFL {
        ip saddr 10.42.0.6  counter packets 0 bytes 0 jump KUBE-MARK-MASQ
        meta l4proto tcp   counter packets 0 bytes 0 dnat to 10.42.0.6:53
    }

    chain KUBE-SVC-YFPH5LFNKP7E3G4L {
        meta l4proto udp ip saddr != 10.42.0.0/16 ip daddr 10.43.0.10  udp dport 53 counter packets 0 bytes 0 jump KUBE-MARK-MASQ
          counter packets 0 bytes 0 jump KUBE-SEP-ILU5TVOOCLKZWLAM
         counter packets 0 bytes 0 jump KUBE-SEP-2BVGIP4CX3U6KN7B
    }

    chain KUBE-SEP-ILU5TVOOCLKZWLAM {
        ip saddr 10.42.0.6  counter packets 0 bytes 0 jump KUBE-MARK-MASQ
        meta l4proto udp   counter packets 0 bytes 0 dnat to 10.42.0.6:53
    }

    chain KUBE-SEP-QLWODAQP3TM5I5YB {
        ip saddr 10.xxx.xxx.169  counter packets 0 bytes 0 jump KUBE-MARK-MASQ
        meta l4proto tcp   counter packets 0 bytes 0 dnat to 10.xxx.xxx.169:6443
    }

    chain KUBE-SEP-AZB7IJ6T247MIQ2V {
        ip saddr 10.42.1.3  counter packets 0 bytes 0 jump KUBE-MARK-MASQ
        meta l4proto tcp   counter packets 0 bytes 0 dnat to 10.42.1.3:8443
    }

    chain KUBE-SEP-2BVGIP4CX3U6KN7B {
        ip saddr 10.42.1.2  counter packets 0 bytes 0 jump KUBE-MARK-MASQ
        meta l4proto udp   counter packets 0 bytes 0 dnat to 10.42.1.2:53
    }

    chain KUBE-SEP-FA2Y7VD4ZTC3KM6D {
        ip saddr 10.42.1.2  counter packets 0 bytes 0 jump KUBE-MARK-MASQ
        meta l4proto tcp   counter packets 0 bytes 0 dnat to 10.42.1.2:53
    }

    chain KUBE-SEP-J6KX6PLZHXNWEWDW {
        ip saddr 10.xxx.xxx.111  counter packets 0 bytes 0 jump KUBE-MARK-MASQ
        meta l4proto tcp   counter packets 0 bytes 0 dnat to 10.xxx.xxx.111:6443
    }

    chain KUBE-SEP-WOHVTEFPQIOIKJUP {
        ip saddr 10.42.2.2  counter packets 0 bytes 0 jump KUBE-MARK-MASQ
        meta l4proto tcp   counter packets 0 bytes 0 dnat to 10.42.2.2:8443
    }
}
table ip6 mangle {
    chain PREROUTING {
        type filter hook prerouting priority mangle; policy accept;
    }

    chain INPUT {
        type filter hook input priority mangle; policy accept;
    }

    chain FORWARD {
        type filter hook forward priority mangle; policy accept;
    }

    chain OUTPUT {
        type route hook output priority mangle; policy accept;
    }

    chain POSTROUTING {
        type filter hook postrouting priority mangle; policy accept;
    }

    chain KUBE-IPTABLES-HINT {
    }

    chain KUBE-KUBELET-CANARY {
    }

    chain KUBE-PROXY-CANARY {
    }
}
table ip6 filter {
    chain INPUT {
        type filter hook input priority filter; policy accept;
        ct state new  counter packets 787 bytes 62960 jump KUBE-PROXY-FIREWALL
         counter packets 14604 bytes 8857905 jump KUBE-NODEPORTS
        ct state new  counter packets 787 bytes 62960 jump KUBE-EXTERNAL-SERVICES
    }

    chain FORWARD {
        type filter hook forward priority filter; policy accept;
        ct state new  counter packets 0 bytes 0 jump KUBE-PROXY-FIREWALL
         counter packets 0 bytes 0 jump KUBE-FORWARD
        ct state new  counter packets 0 bytes 0 jump KUBE-SERVICES
        ct state new  counter packets 0 bytes 0 jump KUBE-EXTERNAL-SERVICES
    }

    chain OUTPUT {
        type filter hook output priority filter; policy accept;
        ct state new  counter packets 787 bytes 62960 jump KUBE-PROXY-FIREWALL
        ct state new  counter packets 787 bytes 62960 jump KUBE-SERVICES
    }

    chain KUBE-FIREWALL {
         meta mark & 0x00008000 == 0x00008000 counter packets 0 bytes 0 drop
    }

    chain KUBE-KUBELET-CANARY {
    }

    chain KUBE-PROXY-CANARY {
    }

    chain KUBE-EXTERNAL-SERVICES {
    }

    chain KUBE-NODEPORTS {
    }

    chain KUBE-SERVICES {
    }

    chain KUBE-FORWARD {
        ct state invalid counter packets 0 bytes 0 drop
         meta mark & 0x00004000 == 0x00004000 counter packets 0 bytes 0 accept
         ct state related,established counter packets 0 bytes 0 accept
    }

    chain KUBE-PROXY-FIREWALL {
    }
}
table ip6 nat {
    chain PREROUTING {
        type nat hook prerouting priority dstnat; policy accept;
         counter packets 0 bytes 0 jump KUBE-SERVICES
    }

    chain INPUT {
        type nat hook input priority 100; policy accept;
    }

    chain OUTPUT {
        type nat hook output priority -100; policy accept;
         counter packets 787 bytes 62960 jump KUBE-SERVICES
    }

    chain POSTROUTING {
        type nat hook postrouting priority srcnat; policy accept;
         counter packets 820 bytes 65600 jump KUBE-POSTROUTING
    }

    chain KUBE-MARK-DROP {
        counter packets 0 bytes 0 meta mark set mark or 0x8000 
    }

    chain KUBE-MARK-MASQ {
        counter packets 0 bytes 0 meta mark set mark or 0x4000 
    }

    chain KUBE-POSTROUTING {
        meta mark & 0x00004000 != 0x00004000 counter packets 787 bytes 62960 return
        counter packets 0 bytes 0 meta mark set mark xor 0x4000 
         counter packets 0 bytes 0 masquerade  fully-random 
    }

    chain KUBE-KUBELET-CANARY {
    }

    chain KUBE-PROXY-CANARY {
    }

    chain KUBE-SERVICES {
        ip6 daddr != ::1  fib daddr type local counter packets 0 bytes 0 jump KUBE-NODEPORTS
    }

    chain KUBE-NODEPORTS {
    }
}
epelaic commented 10 months ago

 tmp-shell  ~  ip a               
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
3: eth0@if32: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default qlen 1000
    link/ether c6:d4:5f:88:2c:cc brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.42.2.4/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::c4d4:5fff:fe88:2ccc/64 scope link 
       valid_lft forever preferred_lft forever

 tmp-shell  ~  ip r get 10.43.0.10
10.43.0.10 via 169.254.1.1 dev eth0 src 10.42.2.4 uid 0 
    cache 
voliveira-tmx commented 10 months ago

I'm having the same problem. My Rancher version is v2.7.8 and the Kubernetes version is v1.26.8+rke2r1. I'm getting the error when trying to install the Monitoring cluster tool, version 102.0.2+up40.1.2.

epelaic commented 10 months ago

I noticed that master-1 (primary server) has no route 10.42.0.0 10.42.0.0 255.255.255.0 UG 0 0 0 flannel.1, but master-2 and master-3 do.

Answering myself: I understand that a node only has routes to the other nodes' pod CIDRs and doesn't need a route entry for itself.

epelaic commented 10 months ago

Additional tests: I tried to deploy Longhorn on the 3 master nodes and it failed; from what I understand, the helm operation installer cannot communicate with the other pods/operators on the other nodes.

I tried to do the same on a single-node cluster and it works (same for Grafana). Strangely, I did not have a DNS issue there (no need to set dnsPolicy to Default) to import the cluster into the Rancher admin cluster.

So my assumption is: on a single node, local request dispatching/routing works (no need to go through the VXLAN overlay?). With multiple nodes, dispatching/routing works for pods scheduled on the same node that handles the deployment (again, no VXLAN needed?). But inter-node dispatching/routing does not work for pods on the nodes that did not handle the deployment (which definitely goes through the VXLAN overlay).

With a coworker, we tried to ping a pod CIDR IP from one node to another node and the ping responds. So the network encapsulation seems to work: traffic from a pod on one node traverses the real network NIC, reaches the target node's NIC and is decapsulated back to the pod CIDR layer of the target IP.

But we failed to ping a service IP like CoreDNS. So the problem is not in the direction service -> CIDR layer -> NIC, but maybe somewhere between the CIDR layer and the service's pods, or in the kubelet at the service layer?

From netshoot on a single-node cluster I can ping the fleet-agent on IP 10.42.0.19, but I can't ping the rke2-metrics-server pod on IP 10.43.198.4.
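
A side note, not from the report itself: 10.43.x.x addresses are ClusterIP (service) addresses, which kube-proxy translates only for the service's declared ports, so in the default iptables mode they usually do not answer ICMP ping at all, even when everything is healthy. A more telling test from the netshoot pod would be something like:

# Real DNS query against the service ClusterIP (only port 53 is translated, ping is not)
dig @10.43.0.10 kubernetes.default.svc.cluster.local +time=2 +tries=1

# Plain TCP reachability check of the same service port
nc -zv -w 2 10.43.0.10 53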

epelaic commented 10 months ago

Hello, I installed a new cluster (RHEL 9.3, rke2 v1.26.10+rke2r2 + Rancher UI 2.7.6, 4 vCPU, 8 GB RAM) in a different datacenter with a different network config (all 3 masters are in the same subnet, unlike the other setup).

NodePasswordValidationFailed for masters 2 and 3:

Deferred node password secret validation failed: Internal error occurred: failed calling webhook "rancher.cattle.io.secrets": failed to call webhook: Post "https://rancher-webhook.cattle-system.svc:443/v1/webhook/mutation/secrets?timeout=15s": context deadline exceeded

Same problems. Longhorn install:

[root@p-ac-devops-k8s-master-1 ~]# k logs -n cattle-system         helm-operation-w2cf2
Defaulted container "helm" out of: helm, proxy
helm upgrade --install=true --namespace=longhorn-system --timeout=10m0s --values=/home/shell/helm/values-longhorn-crd-102.3.0-up1.5.1.yaml --version=102.3.0+up1.5.1 --wait=true longhorn-crd /home/shell/helm/longhorn-crd-102.3.0-up1.5.1.tgz
Release "longhorn-crd" does not exist. Installing it now.
E1130 12:34:58.114549      23 memcache.go:255] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
...
...
E1130 12:35:00.336824      23 memcache.go:255] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E1130 12:35:00.348131      23 memcache.go:106] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
Error: create: failed to create: Internal error occurred: failed calling webhook "rancher.cattle.io.secrets": failed to call webhook: Post "https://rancher-webhook.cattle-system.svc:443/v1/webhook/mutation/secrets?timeout=15s": context deadline exceeded

Grafana:

Waiting for Kubernetes API to be available
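
A quick check, assuming the throwaway image and the bare "/" path below (any HTTP response within the 5 second budget shows the network path to the webhook works, while a hang reproduces the context deadline exceeded above):

# Which pod IP actually backs the webhook service?
kubectl -n cattle-system get endpoints rancher-webhook

# Try to reach it over the cluster network from a temporary pod
kubectl run tmp-curl --rm -it --restart=Never --image=curlimages/curl -- \
  curl -vk -m 5 https://rancher-webhook.cattle-system.svc:443/
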
epelaic commented 9 months ago

@diogoasouza I destroyed my third cluster in the primary datacenter (initially RHEL 9.3) and rebuilt it on Debian 11 with Canal. I compared it with my working Debian 10 cluster with Canal (kernel netfilter modules, sysctl and many other things) and I don't see anything strange.

When I rebuild the cluster on Debian 11 with Canal: on a single node, nslookup google.com is OK every time; with 2 or 3 nodes, nslookup google.com fails (timeout).

But I retried the nslookup many times from the same netshoot pod (on a node without a CoreDNS instance) and it appears to work only randomly...
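
For reference, one way to reproduce this loop while pinning the test pod to a chosen node (NODE2 is a placeholder for the node without a CoreDNS instance; the netshoot image is the same one used above):

# Throwaway netshoot pod pinned to a specific node, looping DNS lookups
kubectl run tmp-shell --rm -it --restart=Never --image=nicolaka/netshoot \
  --overrides='{"apiVersion":"v1","spec":{"nodeName":"NODE2"}}' -- \
  sh -c 'while true; do nslookup google.com; sleep 1; done'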

 tmp-shell3  ~  nslookup google.com
Server:     10.43.0.10
Address:    10.43.0.10#53

Non-authoritative answer:
Name:   google.com
Address: 142.250.179.78
Name:   google.com
Address: 2a00:1450:4007:813::200e

 tmp-shell3  ~  nslookup google.com
;; communications error to 10.43.0.10#53: timed out
;; communications error to 10.43.0.10#53: timed out
;; communications error to 10.43.0.10#53: timed out
;; no servers could be reached

 tmp-shell3  ~  nslookup google.com
;; communications error to 10.43.0.10#53: timed out
;; communications error to 10.43.0.10#53: timed out
;; communications error to 10.43.0.10#53: timed out
;; no servers could be reached

 tmp-shell3  ~  nslookup google.com
;; communications error to 10.43.0.10#53: timed out
;; communications error to 10.43.0.10#53: timed out
;; communications error to 10.43.0.10#53: timed out
;; no servers could be reached

 tmp-shell3  ~  nslookup google.com
Server:     10.43.0.10
Address:    10.43.0.10#53

Non-authoritative answer:
Name:   google.com
Address: 142.250.179.78
;; communications error to 10.43.0.10#53: timed out
;; communications error to 10.43.0.10#53: timed out
;; communications error to 10.43.0.10#53: timed out
;; no servers could be reached

This last one is strange: it begins with a successful DNS resolution and finishes with errors...

I changed the session affinity of the CoreDNS service from None to ClientIP and it changes the random behaviour of nslookup from the pods...

I tried scaling up to one CoreDNS per node, and also a single one for the whole cluster, and it is always random.

So maybe the problem is between the CoreDNS service ClusterIP and the CoreDNS pods?
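
One way to narrow that down is to bypass the ClusterIP and query each CoreDNS pod IP directly from the netshoot pod; the two backend IPs below are the ones that show up behind 10.43.0.10 in the ipvsadm output later in the thread:

# List the CoreDNS pod IPs behind the 10.43.0.10 ClusterIP
kubectl -n kube-system get pods -o wide | grep coredns

# Query each backend directly, skipping the service / kube-proxy layer
dig @10.42.0.11 google.com +time=2 +tries=1
dig @10.42.3.2 google.com +time=2 +tries=1

If the query to the pod on the remote node also times out, the overlay (VXLAN) path is the suspect rather than kube-proxy itself.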

epelaic commented 9 months ago

Hello, running more tests: I manually put all the nodes plus the HAProxy load balancers into /etc/hosts.

I installed the cluster with one node and then added a second node, looping nslookup google.com during the second node's deployment. No problems until the second rke2-coredns-rke2-coredns-xxxxxxxxx pod goes to the Running state. If I kill the CoreDNS pod on the second node, the nslookup works.

I disabled the CoreDNS autoscaler and ran 2 server nodes with 1 CoreDNS instance. The nslookup only works if the netshoot pod is on the node where CoreDNS is deployed (I managed to move the CoreDNS instance from one node to the other).

During the deployment of the CoreDNS pod, while no backend is available I get a normal ;; communications error to 10.43.0.10#53: connection refused that responds instantly. As soon as a CoreDNS pod is running, it switches to "timed out" and we can see the time it takes.
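
As background: kube-proxy rejects traffic to a service that has no ready endpoints, which is why the error is an instant connection refused; once an endpoint is registered the traffic is DNAT'ed to a pod, and a timeout means the forwarded packets or their replies are lost on the way. Whether the service actually picked up the new pod can be seen with:

kubectl -n kube-system get endpoints rke2-coredns-rke2-coredns -o wide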

epelaic commented 9 months ago

I tried to switch kube-proxy from iptables to IPVS, same problems. config.yaml + modprobe of the ip_vs*** modules:

kube-proxy-arg:
  - proxy-mode=ipvs
  - ipvs-strict-arp=true
kube-proxy-extra-mount:
- "/lib/modules:/lib/modules:ro"

I tried switching the CNI from Canal to Cilium, same problems.

epelaic commented 9 months ago

I will try the Antrea CNI; it's our target CNI for production.

rohitsakala commented 9 months ago

@voliveira-tmx Can you please provide some more information on how to reproduce this issue? It would be nice if you could provide as many details as possible, such as which cloud, which OS, the number of nodes, how you installed Rancher, what your environment is, etc.

epelaic commented 9 months ago

Hello, tested on Debian 12 bookworm (6.1.0-13-amd64), same problems. Tested the rke2 Ansible role https://github.com/lablabs/ansible-role-rke2, same problems. I will check with a co-worker whether there are any LAN conflicts somewhere in the enterprise network (10.42.x.x and 10.43.x.x).

epelaic commented 9 months ago

I switched kube-proxy to IPVS mode again:

root@our-dev-k8s0022:~# k get svc -A
NAMESPACE       NAME                                      TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
cattle-system   cattle-cluster-agent                      ClusterIP   10.43.27.43     <none>        80/TCP,443/TCP   56m
cattle-system   rancher-webhook                           ClusterIP   10.43.183.57    <none>        443/TCP          54m
default         kubernetes                                ClusterIP   10.43.0.1       <none>        443/TCP          58m
kube-system     rke2-coredns-rke2-coredns                 ClusterIP   10.43.0.10      <none>        53/UDP,53/TCP    57m
kube-system     rke2-ingress-nginx-controller-admission   ClusterIP   10.43.132.127   <none>        443/TCP          57m
kube-system     rke2-metrics-server                       ClusterIP   10.43.4.204     <none>        443/TCP          57m
kube-system     rke2-snapshot-validation-webhook          ClusterIP   10.43.90.91     <none>        443/TCP          57m
root@our-dev-k8s0022:~# ipvsadm -ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  10.43.0.1:443 rr
  -> 10.xxx.xxx.34:6443           Masq    1      1          0         
  -> 10.xxx.xxx.18:6443           Masq    1      42         0         
TCP  10.43.0.10:53 rr
  -> 10.42.0.11:53                Masq    1      0          0         
  -> 10.42.3.2:53                 Masq    1      0          0         
TCP  10.43.4.204:443 rr
  -> 10.42.0.7:10250              Masq    1      0          0         
TCP  10.43.27.43:80 rr
  -> 10.42.0.21:80                Masq    1      0          0         
  -> 10.42.0.24:80                Masq    1      0          0         
TCP  10.43.27.43:443 rr
  -> 10.42.0.21:444               Masq    1      0          0         
  -> 10.42.0.24:444               Masq    1      0          0         
TCP  10.43.90.91:443 rr
  -> 10.42.0.6:8443               Masq    1      0          0         
TCP  10.43.132.127:443 rr
  -> 10.42.0.13:8443              Masq    1      0          0         
  -> 10.42.3.3:8443               Masq    1      0          0         
TCP  10.43.183.57:443 rr
  -> 10.42.0.20:9443              Masq    1      0          0         
UDP  10.43.0.10:53 rr
  -> 10.42.0.11:53                Masq    1      0          41        
  -> 10.42.3.2:53                 Masq    1      0          40
root@our-dev-k8s0023:~# ipvsadm -lc
IPVS connection entries
pro expire state       source             virtual            destination
UDP 00:58  UDP         10.42.3.5:55305    10.43.0.10:domain  10.42.0.11:domain
UDP 01:03  UDP         10.42.3.5:58023    10.43.0.10:domain  10.42.3.2:domain
UDP 00:53  UDP         10.42.3.5:59055    10.43.0.10:domain  10.42.0.11:domain
TCP 14:45  ESTABLISHED 10.42.3.2:43954    10.43.0.1:https    our-dev-k8s0022.server.fr:6443
TCP 14:41  ESTABLISHED 10.43.0.1:54404    10.43.0.1:https    our-dev-k8s0022.server.fr:6443
TCP 14:59  ESTABLISHED 10.42.3.3:45210    10.43.0.1:https    our-dev-k8s0023.server.fr:6443
TCP 14:30  ESTABLISHED 10.43.0.1:54416    10.43.0.1:https    our-dev-k8s0023.server.fr:6443
UDP 00:53  UDP         10.42.3.5:33169    10.43.0.10:domain  10.42.3.2:domain
UDP 01:03  UDP         10.42.3.5:55276    10.43.0.10:domain  10.42.0.11:domain
UDP 00:58  UDP         10.42.3.5:36071    10.43.0.10:domain  10.42.3.2:domain
TCP 14:41  ESTABLISHED 10.43.0.1:42286    10.43.0.1:https    our-dev-k8s0023.server.fr:6443
 tmp-shell2  ~  traceroute 10.43.0.10 (from node 2)
traceroute to 10.43.0.10 (10.43.0.10), 30 hops max, 46 byte packets
 1  rke2-coredns-rke2-coredns.kube-system.svc.cluster.local (10.43.0.10)  0.015 ms  0.006 ms  0.005 ms

It looks like the IPVS load balancing (round robin) works for DNS (tested on 2 nodes), but I still get DNS connection timeouts.

Here is what I get during the Longhorn installation:

root@our-dev-k8s0023:~# ipvsadm -lc
IPVS connection entries
pro expire state       source             virtual            destination
UDP 03:49  UDP         10.42.3.9:43212    10.43.0.10:domain  10.42.0.11:domain
TCP 179:57 NONE        10.42.3.9:0        10.43.23.115:9500  10.42.0.25:9500
TCP 179:22 NONE        10.42.3.9:0        10.43.23.115:65535 10.42.0.25:65535
TCP 14:59  ESTABLISHED 10.42.3.7:42152    10.43.0.1:https    our-dev-k8s0022.server.fr:6443
TCP 14:59  ESTABLISHED 10.42.3.7:42210    10.43.0.1:https    our-dev-k8s0023.server.fr:6443
UDP 04:01  UDP         10.42.3.9:40868    10.43.0.10:domain  10.42.0.11:domain
UDP 03:52  UDP         10.42.3.9:35525    10.43.0.10:domain  10.42.3.2:domain
TCP 00:22  SYN_RECV    10.42.3.9:53308    10.43.23.115:9500  10.42.0.25:9500
TCP 00:55  TIME_WAIT   10.42.3.7:37048    10.43.0.1:https    our-dev-k8s0022.server.fr:6443
TCP 14:59  ESTABLISHED 10.42.3.6:39290    10.43.0.1:https    our-dev-k8s0022.server.fr:6443
UDP 04:48  UDP         10.42.3.9:33994    10.43.0.10:domain  10.42.0.11:domain
UDP 03:31  UDP         10.42.3.6:43921    10.43.0.10:domain  10.42.3.2:domain
UDP 04:04  UDP         10.42.3.9:37151    10.43.0.10:domain  10.42.3.2:domain
UDP 04:15  UDP         10.42.3.7:47247    10.43.0.10:domain  10.42.0.11:domain
UDP 04:11  UDP         10.42.3.8:39862    10.43.0.10:domain  10.42.0.11:domain
TCP 00:09  CLOSE       10.42.3.7:41976    10.43.0.1:https    our-dev-k8s0023.server.fr:6443
UDP 01:34  UDP         10.42.3.5:48923    10.43.0.10:domain  10.42.3.2:domain
TCP 14:59  ESTABLISHED 10.42.3.7:42070    10.43.0.1:https    our-dev-k8s0022.server.fr:6443
TCP 14:59  ESTABLISHED 10.42.3.7:42268    10.43.0.1:https    our-dev-k8s0022.server.fr:6443
TCP 01:15  TIME_WAIT   10.42.3.7:51642    10.43.0.1:https    our-dev-k8s0023.server.fr:6443
UDP 04:33  UDP         10.42.3.9:41995    10.43.0.10:domain  10.42.0.11:domain
UDP 04:22  UDP         10.42.3.9:56056    10.43.0.10:domain  10.42.3.2:domain
UDP 04:28  UDP         10.42.3.9:33086    10.43.0.10:domain  10.42.3.2:domain
TCP 00:09  CLOSE       10.42.3.7:41958    10.43.0.1:https    our-dev-k8s0022.server.fr:6443
UDP 03:46  UDP         10.42.3.9:40994    10.43.0.10:domain  10.42.3.2:domain
TCP 14:59  ESTABLISHED 10.42.3.7:42156    10.43.0.1:https    our-dev-k8s0022.server.fr:6443
TCP 01:12  TIME_WAIT   10.42.3.7:51596    10.43.0.1:https    our-dev-k8s0022.server.fr:6443
TCP 15:00  ESTABLISHED 10.42.3.7:42062    10.43.0.1:https    our-dev-k8s0023.server.fr:6443
TCP 00:09  CLOSE       10.42.3.7:41912    10.43.0.1:https    our-dev-k8s0023.server.fr:6443
TCP 14:59  ESTABLISHED 10.42.3.7:42032    10.43.0.1:https    our-dev-k8s0023.server.fr:6443
UDP 03:57  UDP         10.42.3.9:49013    10.43.0.10:domain  10.42.3.2:domain
TCP 14:59  ESTABLISHED 10.42.3.7:42090    10.43.0.1:https    our-dev-k8s0023.server.fr:6443
UDP 04:19  UDP         10.42.3.9:51566    10.43.0.10:domain  10.42.3.2:domain
TCP 00:09  CLOSE       10.42.3.7:41980    10.43.0.1:https    our-dev-k8s0022.server.fr:6443
TCP 00:09  CLOSE       10.42.3.7:42258    10.43.0.1:https    our-dev-k8s0023.server.fr:6443
TCP 00:09  CLOSE       10.42.3.7:41934    10.43.0.1:https    our-dev-k8s0023.server.fr:6443
TCP 01:59  TIME_WAIT   10.42.3.7:42242    10.43.0.1:https    our-dev-k8s0022.server.fr:6443
TCP 14:59  ESTABLISHED 10.42.3.7:42126    10.43.0.1:https    our-dev-k8s0022.server.fr:6443
UDP 03:20  UDP         10.42.3.6:40327    10.43.0.10:domain  10.42.0.11:domain
TCP 14:59  ESTABLISHED 10.42.3.7:42218    10.43.0.1:https    our-dev-k8s0022.server.fr:6443
UDP 03:55  UDP         10.42.3.8:60908    10.43.0.10:domain  10.42.3.2:domain
TCP 14:59  ESTABLISHED 10.42.3.7:42112    10.43.0.1:https    our-dev-k8s0023.server.fr:6443
UDP 03:25  UDP         10.42.3.6:54522    10.43.0.10:domain  10.42.3.2:domain
TCP 14:59  ESTABLISHED 10.42.3.7:41890    10.43.0.1:https    our-dev-k8s0023.server.fr:6443
UDP 04:10  UDP         10.42.3.9:40568    10.43.0.10:domain  10.42.0.11:domain
TCP 00:19  SYN_RECV    10.42.3.9:54950    10.43.23.115:9500  10.42.0.25:9500
UDP 03:15  UDP         10.42.3.6:50539    10.43.0.10:domain  10.42.3.2:domain
TCP 14:54  ESTABLISHED 10.42.3.2:43954    10.43.0.1:https    our-dev-k8s0022.server.fr:6443
UDP 04:20  UDP         10.42.3.7:50083    10.43.0.10:domain  10.42.0.11:domain
UDP 03:15  UDP         10.42.3.6:35776    10.43.0.10:domain  10.42.0.11:domain
TCP 00:09  CLOSE       10.42.3.7:42000    10.43.0.1:https    our-dev-k8s0022.server.fr:6443
TCP 00:09  CLOSE       10.42.3.7:42246    10.43.0.1:https    our-dev-k8s0023.server.fr:6443
TCP 00:09  CLOSE       10.42.3.7:41956    10.43.0.1:https    our-dev-k8s0023.server.fr:6443
TCP 00:09  CLOSE       10.42.3.7:42012    10.43.0.1:https    our-dev-k8s0023.server.fr:6443
UDP 04:07  UDP         10.42.3.9:60808    10.43.0.10:domain  10.42.3.2:domain
UDP 04:30  UDP         10.42.3.9:39984    10.43.0.10:domain  10.42.0.11:domain
TCP 00:13  SYN_RECV    10.42.3.9:54942    10.43.23.115:9500  10.42.0.25:9500
TCP 14:56  ESTABLISHED 10.43.0.1:54404    10.43.0.1:https    our-dev-k8s0022.server.fr:6443
UDP 03:54  UDP         10.42.3.9:56752    10.43.0.10:domain  10.42.0.11:domain
UDP 04:51  UDP         10.42.3.9:50686    10.43.0.10:domain  10.42.3.2:domain
TCP 00:09  CLOSE       10.42.3.7:42160    10.43.0.1:https    our-dev-k8s0023.server.fr:6443
TCP 00:09  CLOSE       10.42.3.7:42106    10.43.0.1:https    our-dev-k8s0022.server.fr:6443
UDP 04:15  UDP         10.42.3.7:47300    10.43.0.10:domain  10.42.3.2:domain
TCP 00:09  CLOSE       10.42.3.7:42182    10.43.0.1:https    our-dev-k8s0023.server.fr:6443
UDP 04:57  UDP         10.42.3.9:60462    10.43.0.10:domain  10.42.3.2:domain
UDP 03:25  UDP         10.42.3.6:34125    10.43.0.10:domain  10.42.0.11:domain
UDP 04:54  UDP         10.42.3.9:39004    10.43.0.10:domain  10.42.0.11:domain
TCP 00:09  CLOSE       10.42.3.7:41974    10.43.0.1:https    our-dev-k8s0022.server.fr:6443
TCP 00:09  CLOSE       10.42.3.7:42170    10.43.0.1:https    our-dev-k8s0022.server.fr:6443
UDP 04:32  UDP         10.42.3.8:42497    10.43.0.10:domain  10.42.3.2:domain
TCP 00:09  CLOSE       10.42.3.7:41928    10.43.0.1:https    our-dev-k8s0022.server.fr:6443
TCP 00:06  SYN_RECV    10.42.3.9:51394    10.43.23.115:9500  10.42.0.25:9500
TCP 00:09  CLOSE       10.42.3.7:42046    10.43.0.1:https    our-dev-k8s0023.server.fr:6443
TCP 00:09  CLOSE       10.42.3.7:41964    10.43.0.1:https    our-dev-k8s0023.server.fr:6443
TCP 15:00  ESTABLISHED 10.42.3.3:45210    10.43.0.1:https    our-dev-k8s0023.server.fr:6443
TCP 00:09  CLOSE       10.42.3.7:42088    10.43.0.1:https    our-dev-k8s0022.server.fr:6443
TCP 01:59  TIME_WAIT   10.42.3.7:42250    10.43.0.1:https    our-dev-k8s0022.server.fr:6443
TCP 14:59  ESTABLISHED 10.42.3.7:42024    10.43.0.1:https    our-dev-k8s0022.server.fr:6443
TCP 14:55  ESTABLISHED 10.43.0.1:54416    10.43.0.1:https    our-dev-k8s0023.server.fr:6443
TCP 00:09  CLOSE       10.42.3.7:42080    10.43.0.1:https    our-dev-k8s0023.server.fr:6443
UDP 04:41  UDP         10.42.3.9:49324    10.43.0.10:domain  10.42.3.2:domain
TCP 00:09  CLOSE       10.42.3.7:42198    10.43.0.1:https    our-dev-k8s0022.server.fr:6443
TCP 00:51  SYN_RECV    10.42.3.9:52326    10.43.23.115:9500  10.42.0.25:9500
TCP 00:09  CLOSE       10.42.3.7:41948    10.43.0.1:https    our-dev-k8s0022.server.fr:6443
TCP 00:55  TIME_WAIT   10.42.3.7:37066    10.43.0.1:https    our-dev-k8s0022.server.fr:6443
UDP 04:43  UDP         10.42.3.9:40192    10.43.0.10:domain  10.42.0.11:domain
UDP 03:50  UDP         10.42.3.8:42345    10.43.0.10:domain  10.42.0.11:domain
TCP 00:09  CLOSE       10.42.3.7:42154    10.43.0.1:https    our-dev-k8s0023.server.fr:6443
UDP 04:25  UDP         10.42.3.9:34929    10.43.0.10:domain  10.42.0.11:domain
UDP 04:13  UDP         10.42.3.9:59108    10.43.0.10:domain  10.42.3.2:domain
TCP 00:09  CLOSE       10.42.3.7:42138    10.43.0.1:https    our-dev-k8s0023.server.fr:6443
UDP 04:36  UDP         10.42.3.9:48029    10.43.0.10:domain  10.42.3.2:domain
TCP 00:09  CLOSE       10.42.3.7:41990    10.43.0.1:https    our-dev-k8s0023.server.fr:6443
UDP 04:25  UDP         10.42.3.7:46966    10.43.0.10:domain  10.42.0.11:domain
TCP 00:09  CLOSE       10.42.3.7:42050    10.43.0.1:https    our-dev-k8s0022.server.fr:6443
UDP 04:01  UDP         10.42.3.8:35780    10.43.0.10:domain  10.42.0.11:domain
UDP 04:46  UDP         10.42.3.9:49958    10.43.0.10:domain  10.42.3.2:domain
UDP 04:16  UDP         10.42.3.9:40396    10.43.0.10:domain  10.42.0.11:domain
TCP 00:57  SYN_RECV    10.42.3.9:52336    10.43.23.115:9500  10.42.0.25:9500
TCP 00:09  CLOSE       10.42.3.7:41898    10.43.0.1:https    our-dev-k8s0022.server.fr:6443
TCP 00:09  CLOSE       10.42.3.7:42230    10.43.0.1:https    our-dev-k8s0023.server.fr:6443
TCP 00:09  CLOSE       10.42.3.7:42034    10.43.0.1:https    our-dev-k8s0022.server.fr:6443
TCP 00:03  SYN_RECV    10.42.3.9:51378    10.43.23.115:9500  10.42.0.25:9500
UDP 04:38  UDP         10.42.3.9:45882    10.43.0.10:domain  10.42.0.11:domain
UDP 04:25  UDP         10.42.3.7:57408    10.43.0.10:domain  10.42.3.2:domain
TCP 14:56  ESTABLISHED 10.43.0.1:42286    10.43.0.1:https    our-dev-k8s0023.server.fr:6443
root@our-dev-k8s0022:~# k get svc -A
NAMESPACE         NAME                                      TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
cattle-system     cattle-cluster-agent                      ClusterIP   10.43.27.43     <none>        80/TCP,443/TCP   70m
cattle-system     rancher-webhook                           ClusterIP   10.43.183.57    <none>        443/TCP          69m
default           kubernetes                                ClusterIP   10.43.0.1       <none>        443/TCP          72m
kube-system       rke2-coredns-rke2-coredns                 ClusterIP   10.43.0.10      <none>        53/UDP,53/TCP    72m
kube-system       rke2-ingress-nginx-controller-admission   ClusterIP   10.43.132.127   <none>        443/TCP          71m
kube-system       rke2-metrics-server                       ClusterIP   10.43.4.204     <none>        443/TCP          71m
kube-system       rke2-snapshot-validation-webhook          ClusterIP   10.43.90.91     <none>        443/TCP          72m
longhorn-system   longhorn-admission-webhook                ClusterIP   10.43.94.87     <none>        9502/TCP         5m46s
longhorn-system   longhorn-backend                          ClusterIP   10.43.23.115    <none>        9500/TCP         5m46s
longhorn-system   longhorn-conversion-webhook               ClusterIP   10.43.255.143   <none>        9501/TCP         5m46s
longhorn-system   longhorn-engine-manager                   ClusterIP   None            <none>        <none>           5m46s
longhorn-system   longhorn-frontend                         ClusterIP   10.43.8.117     <none>        80/TCP           5m46s
longhorn-system   longhorn-recovery-backend                 ClusterIP   10.43.103.244   <none>        9503/TCP         5m46s
longhorn-system   longhorn-replica-manager                  ClusterIP   None            <none>        <none>           5m46s

Lots of : Deployment is not ready: longhorn-system/longhorn-driver-deployer. 0 out of 1 expected pods are ready

So, how can I test whether IPVS can actually reach the targeted backend server?
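
One common check, assuming nc is available and reusing addresses from the ipvsadm output above: connect to a real-server entry directly, then to its virtual address, and compare.

# Backend pod of longhorn-backend, taken straight from the ipvsadm table
nc -zv -w 2 10.42.0.25 9500

# The corresponding virtual (service) address
nc -zv -w 2 10.43.23.115 9500

Run both from the node hosting the pod and from another node; if the direct connection only succeeds from the hosting node, the problem sits on the inter-node overlay path rather than in IPVS itself.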

epelaic commented 9 months ago

@diogoasouza Hello, I tried installing older versions of rke2 (v1.20, v1.23, v1.24), same problem. After discussing with my coworkers, we may have found the problem (though not the cause)!

Checking the network interfaces on the v1.20 / Debian 12 cluster: on the primary node, flannel.1 has both RX and TX at zero; on node 2, TX is sending but RX stays at zero.

Debian 12, rke2v1.20, Node 1 :

flannel.1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet 10.42.0.0  netmask 255.255.255.255  broadcast 10.42.0.0
        ether 16:b2:0f:4b:79:d6  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

Debian 12, rke2v1.20, Node 2 :

flannel.1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet 10.42.1.0  netmask 255.255.255.255  broadcast 10.42.1.0
        ether fe:6e:94:e8:38:f8  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 688  bytes 41324 (40.3 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

It looks like a node can only communicate with itself when not going through the cluster load balancer.

On my working rke2 Debian 10 cluster: Node 1

flannel.1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet 10.42.0.0  netmask 255.255.255.255  broadcast 0.0.0.0
        ether 0e:9a:29:3b:4e:b2  txqueuelen 0  (Ethernet)
        RX packets 174857163  bytes 140701078322 (131.0 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 155944853  bytes 57025240870 (53.1 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

Node 2

flannel.1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet 10.42.1.0  netmask 255.255.255.255  broadcast 0.0.0.0
        ether b6:f3:ff:55:8e:29  txqueuelen 0  (Ethernet)
        RX packets 152291986  bytes 83486627197 (77.7 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 140362349  bytes 49987147315 (46.5 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

The strange thing is the broadcast address: Node 1: inet 10.42.0.0 netmask 255.255.255.255 broadcast 10.42.0.0; Node 2: inet 10.42.1.0 netmask 255.255.255.255 broadcast 10.42.1.0.

Compared to my working cluster: Node 1: inet 10.42.0.0 netmask 255.255.255.255 broadcast 0.0.0.0; Node 2: inet 10.42.1.0 netmask 255.255.255.255 broadcast 0.0.0.0.

Is the netmask correct when set to 255.255.255.255?
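
With Canal/flannel in VXLAN mode, inter-node pod traffic is encapsulated in UDP port 8472, so a capture on the physical uplink of both nodes while pinging a pod IP hosted on the other node shows whether the encapsulated packets leave one side and ever arrive on the other (the interface name below is an assumption; use the node's real NIC):

# Run on both nodes while pinging a remote pod IP
tcpdump -ni eth0 udp port 8472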