k0sproject / k0s

k0s - The Zero Friction Kubernetes
https://docs.k0sproject.io

Cannot reach any public service on node after joined to cluster #4648

Open vojbarzz opened 2 weeks ago

vojbarzz commented 2 weeks ago


Platform

Linux 6.8.0-35-generic #35-Ubuntu SMP PREEMPT_DYNAMIC Mon May 20 15:51:52 UTC 2024 x86_64 GNU/Linux

PRETTY_NAME="Ubuntu 24.04 LTS"
NAME="Ubuntu"
VERSION_ID="24.04"
VERSION="24.04 LTS (Noble Numbat)"
VERSION_CODENAME=noble
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=noble
LOGO=ubuntu-logo

Version

v1.30.1+k0s.0

Sysinfo

`k0s sysinfo`
Total memory: 62.7 GiB (pass)
Disk space available for /var/lib/k0s: 822.8 GiB (pass)
Name resolution: localhost: [127.0.0.1] (pass)
Operating system: Linux (pass)
  Linux kernel release: 6.8.0-35-generic (pass)
  Max. file descriptors per process: current: 1048576 / max: 1048576 (pass)
  AppArmor: active (pass)
  Executable in PATH: modprobe: /usr/sbin/modprobe (pass)
  Executable in PATH: mount: /usr/bin/mount (pass)
  Executable in PATH: umount: /usr/bin/umount (pass)
  /proc file system: mounted (0x9fa0) (pass)
  Control Groups: version 2 (pass)
    cgroup controller "cpu": available (is a listed root controller) (pass)
    cgroup controller "cpuacct": available (via cpu in version 2) (pass)
    cgroup controller "cpuset": available (is a listed root controller) (pass)
    cgroup controller "memory": available (is a listed root controller) (pass)
    cgroup controller "devices": available (device filters attachable) (pass)
    cgroup controller "freezer": available (cgroup.freeze exists) (pass)
    cgroup controller "pids": available (is a listed root controller) (pass)
    cgroup controller "hugetlb": available (is a listed root controller) (pass)
    cgroup controller "blkio": available (via io in version 2) (pass)
  CONFIG_CGROUPS: Control Group support: built-in (pass)
    CONFIG_CGROUP_FREEZER: Freezer cgroup subsystem: built-in (pass)
    CONFIG_CGROUP_PIDS: PIDs cgroup subsystem: built-in (pass)
    CONFIG_CGROUP_DEVICE: Device controller for cgroups: built-in (pass)
    CONFIG_CPUSETS: Cpuset support: built-in (pass)
    CONFIG_CGROUP_CPUACCT: Simple CPU accounting cgroup subsystem: built-in (pass)
    CONFIG_MEMCG: Memory Resource Controller for Control Groups: built-in (pass)
    CONFIG_CGROUP_HUGETLB: HugeTLB Resource Controller for Control Groups: built-in (pass)
    CONFIG_CGROUP_SCHED: Group CPU scheduler: built-in (pass)
      CONFIG_FAIR_GROUP_SCHED: Group scheduling for SCHED_OTHER: built-in (pass)
        CONFIG_CFS_BANDWIDTH: CPU bandwidth provisioning for FAIR_GROUP_SCHED: built-in (pass)
    CONFIG_BLK_CGROUP: Block IO controller: built-in (pass)
  CONFIG_NAMESPACES: Namespaces support: built-in (pass)
    CONFIG_UTS_NS: UTS namespace: built-in (pass)
    CONFIG_IPC_NS: IPC namespace: built-in (pass)
    CONFIG_PID_NS: PID namespace: built-in (pass)
    CONFIG_NET_NS: Network namespace: built-in (pass)
  CONFIG_NET: Networking support: built-in (pass)
    CONFIG_INET: TCP/IP networking: built-in (pass)
      CONFIG_IPV6: The IPv6 protocol: built-in (pass)
    CONFIG_NETFILTER: Network packet filtering framework (Netfilter): built-in (pass)
      CONFIG_NETFILTER_ADVANCED: Advanced netfilter configuration: built-in (pass)
      CONFIG_NF_CONNTRACK: Netfilter connection tracking support: module (pass)
      CONFIG_NETFILTER_XTABLES: Netfilter Xtables support: module (pass)
        CONFIG_NETFILTER_XT_TARGET_REDIRECT: REDIRECT target support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_COMMENT: "comment" match support: module (pass)
        CONFIG_NETFILTER_XT_MARK: nfmark target and match support: module (pass)
        CONFIG_NETFILTER_XT_SET: set target and match support: module (pass)
        CONFIG_NETFILTER_XT_TARGET_MASQUERADE: MASQUERADE target support: module (pass)
        CONFIG_NETFILTER_XT_NAT: "SNAT and DNAT" targets support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_ADDRTYPE: "addrtype" address type match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_CONNTRACK: "conntrack" connection tracking match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_MULTIPORT: "multiport" Multiple port match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_RECENT: "recent" match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_STATISTIC: "statistic" match support: module (pass)
      CONFIG_NETFILTER_NETLINK: module (pass)
      CONFIG_NF_NAT: module (pass)
      CONFIG_IP_SET: IP set support: module (pass)
        CONFIG_IP_SET_HASH_IP: hash:ip set support: module (pass)
        CONFIG_IP_SET_HASH_NET: hash:net set support: module (pass)
      CONFIG_IP_VS: IP virtual server support: module (pass)
        CONFIG_IP_VS_NFCT: Netfilter connection tracking: built-in (pass)
        CONFIG_IP_VS_SH: Source hashing scheduling: module (pass)
        CONFIG_IP_VS_RR: Round-robin scheduling: module (pass)
        CONFIG_IP_VS_WRR: Weighted round-robin scheduling: module (pass)
      CONFIG_NF_CONNTRACK_IPV4: IPv4 connetion tracking support (required for NAT): unknown (warning)
      CONFIG_NF_REJECT_IPV4: IPv4 packet rejection: module (pass)
      CONFIG_NF_NAT_IPV4: IPv4 NAT: unknown (warning)
      CONFIG_IP_NF_IPTABLES: IP tables support: module (pass)
        CONFIG_IP_NF_FILTER: Packet filtering: module (pass)
          CONFIG_IP_NF_TARGET_REJECT: REJECT target support: module (pass)
        CONFIG_IP_NF_NAT: iptables NAT support: module (pass)
        CONFIG_IP_NF_MANGLE: Packet mangling: module (pass)
      CONFIG_NF_DEFRAG_IPV4: module (pass)
      CONFIG_NF_CONNTRACK_IPV6: IPv6 connetion tracking support (required for NAT): unknown (warning)
      CONFIG_NF_NAT_IPV6: IPv6 NAT: unknown (warning)
      CONFIG_IP6_NF_IPTABLES: IP6 tables support: module (pass)
        CONFIG_IP6_NF_FILTER: Packet filtering: module (pass)
        CONFIG_IP6_NF_MANGLE: Packet mangling: module (pass)
        CONFIG_IP6_NF_NAT: ip6tables NAT support: module (pass)
      CONFIG_NF_DEFRAG_IPV6: module (pass)
    CONFIG_BRIDGE: 802.1d Ethernet Bridging: module (pass)
      CONFIG_LLC: module (pass)
      CONFIG_STP: module (pass)
  CONFIG_EXT4_FS: The Extended 4 (ext4) filesystem: built-in (pass)
  CONFIG_PROC_FS: /proc file system support: built-in (pass)

What happened?

I'm not able to reach any public service (ping, SSH, ...) on a host once it has been added to the cluster as a worker.

Steps to reproduce

  1. Exec into any pod on an existing node and try to ping/SSH to a host that has not yet been added to the cluster (it works).
  2. Add the host as a worker node using k0sctl (see the command sketch below).
  3. Try to ping or SSH from the same pod to the newly added node (it fails).
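
A rough sketch of the reproduction from the pod's point of view (a sketch only; the pod name network-debugger and the target host gra1.my-devbox.cloud are taken from later in this thread, and names on another cluster will differ):

```
# 1. From a pod on an existing node, confirm the not-yet-joined host is reachable
kubectl exec -it network-debugger -- ping -c 3 gra1.my-devbox.cloud

# 2. Join the host as a worker using the k0sctl.yaml below
k0sctl apply --config k0sctl.yaml

# 3. Repeat the same check from the same pod; it now times out
kubectl exec -it network-debugger -- ping -c 3 gra1.my-devbox.cloud
```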
k0sctl.yaml
apiVersion: k0sctl.k0sproject.io/v1beta1
kind: Cluster
metadata:
  name: k0s-ovh
spec:
  hosts:
    - ssh:
        address: 1.2.3.4
        user: xxxx
        port: 22
        keyPath: xxxxx
      role: controller
    - ssh:
        address: 1.2.3.5
        user: xxxx
        port: 22
        keyPath: xxxx
      role: worker
    - ssh:
        address: 1.2.3.6
        user: xxxx
        port: 22
        keyPath: xxxx
      role: worker
  k0s:
    version: v1.30.1+k0s.0
    dynamicConfig: false
    config:
      apiVersion: k0s.k0sproject.io/v1beta1
      kind: Cluster
      metadata:
        name: k0s-ovh
      spec:
        api:
          k0sApiPort: 9443
          port: 6443
        installConfig:
          users:
            etcdUser: etcd
            kineUser: kube-apiserver
            konnectivityUser: konnectivity-server
            kubeAPIserverUser: kube-apiserver
            kubeSchedulerUser: kube-scheduler
        konnectivity:
          adminPort: 8133
          agentPort: 8132
        network:
          kubeProxy:
            disabled: false
            mode: iptables
          kuberouter:
            autoMTU: true
            mtu: 0
            peerRouterASNs: ""
            peerRouterIPs: ""
          podCIDR: 10.244.0.0/16
          provider: kuberouter
          serviceCIDR: 10.96.0.0/12
        podSecurityPolicy:
          defaultPolicy: 00-k0s-privileged
        storage:
          type: etcd
        telemetry:
          enabled: true
        extensions:
          storage:
            type: openebs_local_storage
Expected behavior

From any pod on any node, I can reach the public services of any other node.

Actual behavior

I'm not able to reach public services on a server once it has been added to the cluster as a worker.

Screenshots and logs

No response

Additional context

No response
twz123 commented 2 weeks ago

Are you connecting via DNS names or IP addresses? Can you maybe post the output of ssh -vvv ... from before and after, to see what might be the problem? I can only speculate that kube-proxy, kube-router & friends on the worker are configuring something™ that disrupts the connectivity. Does pod-to-pod traffic across nodes work? /cc @juanluisvaladas, as he's the networking guru :upside_down_face:
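
For completeness, a hedged sketch of how the requested before/after output could be captured from inside a pod (the pod name network-debugger appears later in this thread; user@gra1.my-devbox.cloud is a placeholder target):

```
# Before joining the node:
kubectl exec network-debugger -- ssh -vvv user@gra1.my-devbox.cloud 2>&1 | tee ssh-before.log

# Join the node with k0sctl, then repeat from the same pod:
kubectl exec network-debugger -- ssh -vvv user@gra1.my-devbox.cloud 2>&1 | tee ssh-after.log
```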

vojbarzz commented 2 weeks ago

DNS resolution works fine. Everything also works fine until I add the host to the cluster as a node:

Output:

```
network-debugger:~# ping gra1.my-devbox.cloud
PING gra1.my-devbox.cloud (54.36.127.120) 56(84) bytes of data.
^C
--- gra1.my-devbox.cloud ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1056ms
network-debugger:~# ssh -vvv gra1.my-devbox.cloud
OpenSSH_9.7p1, OpenSSL 3.3.0 9 Apr 2024
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 22: include /etc/ssh/ssh_config.d/*.conf matched no files
debug3: expanded UserKnownHostsFile '~/.ssh/known_hosts' -> '/root/.ssh/known_hosts'
debug3: expanded UserKnownHostsFile '~/.ssh/known_hosts2' -> '/root/.ssh/known_hosts2'
debug2: resolving "gra1.my-devbox.cloud" port 22
debug3: resolve_host: lookup gra1.my-devbox.cloud:22
debug3: channel_clear_timeouts: clearing
debug3: ssh_connect_direct: entering
debug1: Connecting to gra1.my-devbox.cloud [54.36.127.120] port 22.
debug3: set_sock_tos: set socket 3 IP_TOS 0x48
debug1: connect to address 54.36.127.120 port 22: Operation timed out
ssh: connect to host gra1.my-devbox.cloud port 22: Operation timed out
```
juanluisvaladas commented 1 week ago

Hi @vojbarzz, I'm guessing this is probably something that happens specifically on OVH, because I've never seen this before.

If it's possible, I'd like the following information:

1- I understand you can connect to other hosts in the same subnet because DNS is fine. Correct me if this isn't true.
2- If you can connect to other hosts in the same subnet, can you please gather the output of a traceroute to the public host? traceroute 54.36.127.120 (If you don't have traceroute, an equivalent like tracepath, mtr, etc. works too.)
3- I'd like to see the output of an iptables trace. To do so, run iptables -t raw -A PREROUTING -d 54.36.127.120 -j TRACE and try to ping the host. Then acquire the trace output with dmesg (just the iptables traces from a couple of packets should be enough, maybe the last 20 or 30 lines) along with the output of iptables-save -c.
4- The output of ip route.
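
Taken together, these requests correspond roughly to the following commands on the affected worker (a sketch only; the output file name iptables-save.txt is arbitrary, and the TRACE rule should be removed again afterwards):

```
# 2. Path to the public host
traceroute 54.36.127.120            # or tracepath / mtr if traceroute is not installed

# 3. iptables trace: add a TRACE rule, generate traffic, collect the trace and the rule set
iptables -t raw -A PREROUTING -d 54.36.127.120 -j TRACE
ping -c 3 54.36.127.120
dmesg | tail -n 30                  # last iptables trace lines (legacy iptables backend)
iptables-save -c > iptables-save.txt
iptables -t raw -D PREROUTING -d 54.36.127.120 -j TRACE   # clean up the trace rule

# 4. Routing table
ip route
```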

vojbarzz commented 1 week ago

1/ Unfortunately I only have two worker hosts, and they are in different zones/datacenters.

2/

traceroute:

root@fra2 ~# traceroute gra1.my-devbox.cloud
traceroute to gra1.my-devbox.cloud (54.36.127.120), 30 hops max, 60 byte packets
 1  135.125.188.252 (135.125.188.252)  0.559 ms  0.555 ms  0.643 ms
 2  10.17.245.82 (10.17.245.82)  0.547 ms  0.521 ms  10.17.245.80 (10.17.245.80)  0.479 ms
 3  10.73.40.110 (10.73.40.110)  0.153 ms  0.209 ms  10.73.40.68 (10.73.40.68)  0.325 ms
 4  10.73.40.195 (10.73.40.195)  0.170 ms  10.73.40.29 (10.73.40.29)  0.181 ms  10.73.40.97 (10.73.40.97)  0.149 ms
 5  * * *
 6  10.73.1.192 (10.73.1.192)  13.337 ms  10.95.34.34 (10.95.34.34)  13.464 ms  10.95.34.16 (10.95.34.16)  13.486 ms
 7  10.73.1.21 (10.73.1.21)  13.579 ms  10.73.2.175 (10.73.2.175)  13.680 ms  10.73.0.41 (10.73.0.41)  11.654 ms
 8  10.17.155.49 (10.17.155.49)  14.409 ms  10.17.145.9 (10.17.145.9)  14.458 ms  10.17.155.53 (10.17.155.53)  14.231 ms
 9  * * *
10  * * *
11  * * *
12  * * *
13  * * *
14  * * *
15  * * *
16  * * *
17  * * *
18  * * *
19  * * *
20  * * *
21  * * *
22  * * *
23  * * *
24  * * *
25  * * *
26  * * *
27  * * *
28  * * *
29  * * *
30  * * *
The iptables trace is not producing any output into dmesg (tried before adding a host into the cluster):

root@fra2 ~# dmesg | tail
[27547.598600] kube-bridge: port 8(veth9617b46b) entered disabled state
[27547.599123] veth9617b46b (unregistering): left allmulticast mode
[27547.599127] veth9617b46b (unregistering): left promiscuous mode
[27547.599130] kube-bridge: port 8(veth9617b46b) entered disabled state
[28550.244750] kube-bridge: port 8(veth51a7fa2b) entered blocking state
[28550.244757] kube-bridge: port 8(veth51a7fa2b) entered disabled state
[28550.244767] veth51a7fa2b: entered allmulticast mode
[28550.244818] veth51a7fa2b: entered promiscuous mode
[28550.247125] kube-bridge: port 8(veth51a7fa2b) entered blocking state
[28550.247130] kube-bridge: port 8(veth51a7fa2b) entered forwarding state
root@fra2 ~# ping -c 3 54.36.127.120
PING 54.36.127.120 (54.36.127.120) 56(84) bytes of data.
64 bytes from 54.36.127.120: icmp_seq=1 ttl=56 time=13.4 ms
64 bytes from 54.36.127.120: icmp_seq=2 ttl=56 time=13.4 ms
64 bytes from 54.36.127.120: icmp_seq=3 ttl=56 time=13.4 ms
--- 54.36.127.120 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2002ms
rtt min/avg/max/mdev = 13.352/13.379/13.403/0.021 ms
root@fra2 ~# dmesg | tail
[27547.598600] kube-bridge: port 8(veth9617b46b) entered disabled state
[27547.599123] veth9617b46b (unregistering): left allmulticast mode
[27547.599127] veth9617b46b (unregistering): left promiscuous mode
[27547.599130] kube-bridge: port 8(veth9617b46b) entered disabled state
[28550.244750] kube-bridge: port 8(veth51a7fa2b) entered blocking state
[28550.244757] kube-bridge: port 8(veth51a7fa2b) entered disabled state
[28550.244767] veth51a7fa2b: entered allmulticast mode
[28550.244818] veth51a7fa2b: entered promiscuous mode
[28550.247125] kube-bridge: port 8(veth51a7fa2b) entered blocking state
[28550.247130] kube-bridge: port 8(veth51a7fa2b) entered forwarding state
root@fra2 ~# iptables -t raw -L -v -n
Chain PREROUTING (policy ACCEPT 647K packets, 190M bytes)
 pkts bytes target  prot opt in  out  source     destination
    0     0 TRACE   0    --  *   *    0.0.0.0/0  54.36.127.120

Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target  prot opt in  out  source     destination

3/ Here is the output: iptables.txt

4/

ip route output:

default via 135.125.188.254 dev enp1s0f0 proto dhcp src 135.125.188.239 metric 100
10.244.0.0/24 dev kube-bridge proto kernel scope link src 10.244.0.1
10.244.2.0/24 dev tun-8a72a84b43c proto 17 src 135.125.188.239
135.125.188.0/24 dev enp1s0f0 proto kernel scope link src 135.125.188.239 metric 100
135.125.188.254 dev enp1s0f0 proto dhcp scope link src 135.125.188.239 metric 100
213.186.33.99 via 135.125.188.254 dev enp1s0f0 proto dhcp src 135.125.188.239 metric 100

Just to be sure the issue is clear: if I add node 54.36.127.120, I can still ping it from another node, either over SSH or via kubectl debug node/fra2 -it --image nicolaka/netshoot -- bash. But I'm not able to ping it from a regular pod directly (kubectl exec -it network-debugger -- bash). Before the node 54.36.127.120 was added, I was able to ping it from the network-debugger pod. If I remove the node with kubectl delete node gra1, I'm able to ping it again from the network-debugger pod.
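
To make the distinction concrete, a minimal sketch of the two checks described above, using the names from this report (the debug pod shares the node's network namespace, while the regular pod goes through the pod network set up by kube-router):

```
# Works even after gra1 joins: debug pod in the host network namespace of node fra2
kubectl debug node/fra2 -it --image nicolaka/netshoot -- ping -c 3 54.36.127.120

# Fails after gra1 joins (and works again after `kubectl delete node gra1`):
# a regular pod on the pod network
kubectl exec -it network-debugger -- ping -c 3 54.36.127.120
```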

juanluisvaladas commented 1 week ago

Hi @vojbarzz, we discussed this in today's call and I don't think we understand exactly what the issue is.

Let's say you have two nodes in your network:

- Can node A reach node B?
- Can node A reach a pod running on node B?
- Can a pod running on node A reach a pod on node B?
- Can a pod running on node A reach node B?
- Can node A reach an external IP address like github.com? (This was confirmed in the last answer.)
- Can a pod running on node A reach this external IP?
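
A rough way to walk through this matrix, assuming node A is fra2 and node B is gra1 (54.36.127.120) as in this report; `<podB-IP>` is a placeholder for the pod IP of any pod scheduled on node B (e.g. from kubectl get pods -o wide):

```
# From node A itself:
ping -c 3 54.36.127.120                                        # node A -> node B
ping -c 3 <podB-IP>                                            # node A -> pod on node B
ping -c 3 github.com                                           # node A -> external

# From a pod running on node A:
kubectl exec -it network-debugger -- ping -c 3 <podB-IP>       # pod on A -> pod on B
kubectl exec -it network-debugger -- ping -c 3 54.36.127.120   # pod on A -> node B
kubectl exec -it network-debugger -- ping -c 3 github.com      # pod on A -> external
```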

> iptables trace is not producing any output into dmesg (trying before adding a host into the cluster)

This happens because you're using nf_tables. If, once we understand the problem, we determine that we need this information, it's available by running xtables-monitor -t.
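
For reference, a minimal sketch of capturing the trace with the nf_tables backend (same target IP as above; both commands run on the affected worker):

```
# With the nft backend, a -j TRACE rule marks matching packets for nftables tracing
iptables -t raw -A PREROUTING -d 54.36.127.120 -j TRACE

# Trace events are delivered over netlink rather than dmesg; watch them with:
xtables-monitor -t

# In another shell, generate some traffic:
ping -c 3 54.36.127.120

# Remove the rule when done:
iptables -t raw -D PREROUTING -d 54.36.127.120 -j TRACE
```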