orange-cloudfoundry / k3s-boshrelease

k3s bosh release (deprecated, only used for new packages requests)
Apache License 2.0

single nic k3s-123.10 network failure - flannel version #102

Closed poblin-orange closed 1 year ago

poblin-orange commented 1 year ago

Failure upgrading from k3s version 123.2 (v1.23.9) to 123.4 (v1.23.10+k3s1)

Tested with bionic stemcell bosh-openstack-kvm-ubuntu-bionic-go_agent/1.97

traces (metrics server on single nic master node):

E0915 12:13:01.926527       1 scraper.go:139] "Failed to scrape node" err="Get \"https://192.168.99.182:10250/stats/summary?only_cpu_and_memory=true\": context deadline exceeded" node="agents-1"
I0915 12:13:03.168199       1 server.go:188] "Failed probe" probe="metric-storage-ready" err="not metrics to serve"
E0915 12:13:03.471884       1 scraper.go:139] "Failed to scrape node" err="Get \"https://192.168.99.179:10250/stats/summary?only_cpu_and_memory=true\": dial tcp 192.168.99.179:10250: connect: connection refused" node="server-0"
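The two scrape errors above differ in kind: one node times out ("context deadline exceeded") and the other refuses the connection. A minimal sketch for telling these cases apart from the machine running the metrics server is shown below. The helper name `probe_tcp` and the 3-second timeout are assumptions for illustration, not part of k3s or metrics-server.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt a TCP connect and classify the outcome.

    Returns "open", "refused", "timeout", or "error: <detail>".
    """
    try:
        # create_connection performs the TCP handshake; the socket is
        # closed as soon as the with-block exits.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on that port.
        return "refused"
    except socket.timeout:
        # No answer within the timeout, e.g. packets dropped on the path.
        return "timeout"
    except OSError as exc:
        # Anything else (no route to host, network unreachable, ...).
        return f"error: {exc}"
```

Calling `probe_tcp("192.168.99.179", 10250)` and `probe_tcp("192.168.99.182", 10250)` from the affected node would show whether the kubelet port is refusing connections or silently unreachable. A "refused" result points at the kubelet or local firewall rules, while a "timeout" points at the network path.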

check config:

server/5cfbf108-c5bd-4bff-ae81-5cb0fcc2f1e8:/var/vcap/jobs/k3s-server/bin# /var/vcap/packages/k3s/k3s check-config

Verifying binaries in /var/lib/rancher/k3s/data/85856543788efc88d736552b36bb10dfbc79f6f4fda7f09f30ca8e8f06b21baa/bin:
- sha256sum: good
- links: good

System:
- /sbin iptables v1.6.1: older than v1.8
- swap: disabled
- routes: default CIDRs 10.42.0.0/16 or 10.43.0.0/16 already routed

Limits:
- /proc/sys/kernel/keys/root_maxkeys: 1000000

modprobe: FATAL: Module configs not found in directory /lib/modules/5.4.0-124-generic
info: reading kernel config from /boot/config-5.4.0-124-generic ...

Generally Necessary:
- cgroup hierarchy: cgroups Hybrid mounted, cpuset|memory controllers status: good
- /sbin/apparmor_parser
apparmor: enabled and tools installed
- CONFIG_NAMESPACES: enabled
- CONFIG_NET_NS: enabled
- CONFIG_PID_NS: enabled
- CONFIG_IPC_NS: enabled
- CONFIG_UTS_NS: enabled
- CONFIG_CGROUPS: enabled
- CONFIG_CGROUP_CPUACCT: enabled
- CONFIG_CGROUP_DEVICE: enabled
- CONFIG_CGROUP_FREEZER: enabled
- CONFIG_CGROUP_SCHED: enabled
- CONFIG_CPUSETS: enabled
- CONFIG_MEMCG: enabled
- CONFIG_KEYS: enabled
- CONFIG_VETH: enabled (as module)
- CONFIG_BRIDGE: enabled (as module)
- CONFIG_BRIDGE_NETFILTER: enabled (as module)
- CONFIG_IP_NF_FILTER: enabled (as module)
- CONFIG_IP_NF_TARGET_MASQUERADE: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_ADDRTYPE: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_CONNTRACK: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_IPVS: enabled (as module)
- CONFIG_IP_NF_NAT: enabled (as module)
- CONFIG_NF_NAT: enabled (as module)
- CONFIG_POSIX_MQUEUE: enabled

Optional Features:
- CONFIG_USER_NS: enabled
- CONFIG_SECCOMP: enabled
- CONFIG_CGROUP_PIDS: enabled
- CONFIG_BLK_CGROUP: enabled
- CONFIG_BLK_DEV_THROTTLING: enabled
- CONFIG_CGROUP_PERF: enabled
- CONFIG_CGROUP_HUGETLB: enabled
- CONFIG_NET_CLS_CGROUP: enabled (as module)
- CONFIG_CGROUP_NET_PRIO: enabled
- CONFIG_CFS_BANDWIDTH: enabled
- CONFIG_FAIR_GROUP_SCHED: enabled
- CONFIG_RT_GROUP_SCHED: missing
- CONFIG_IP_NF_TARGET_REDIRECT: enabled (as module)
- CONFIG_IP_SET: enabled (as module)
- CONFIG_IP_VS: enabled (as module)
- CONFIG_IP_VS_NFCT: enabled
- CONFIG_IP_VS_PROTO_TCP: enabled
- CONFIG_IP_VS_PROTO_UDP: enabled
- CONFIG_IP_VS_RR: enabled (as module)
- CONFIG_EXT4_FS: enabled
- CONFIG_EXT4_FS_POSIX_ACL: enabled
- CONFIG_EXT4_FS_SECURITY: enabled
- Network Drivers:
  - "overlay":
    - CONFIG_VXLAN: enabled (as module)
      Optional (for encrypted networks):
      - CONFIG_CRYPTO: enabled
      - CONFIG_CRYPTO_AEAD: enabled
      - CONFIG_CRYPTO_GCM: enabled
      - CONFIG_CRYPTO_SEQIV: enabled
      - CONFIG_CRYPTO_GHASH: enabled
      - CONFIG_XFRM: enabled
      - CONFIG_XFRM_USER: enabled (as module)
      - CONFIG_XFRM_ALGO: enabled (as module)
      - CONFIG_INET_ESP: enabled (as module)
      - CONFIG_INET_XFRM_MODE_TRANSPORT: missing
- Storage Drivers:
  - "overlay":
    - CONFIG_OVERLAY_FS: enabled (as module)

STATUS: pass
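
The report ends with "STATUS: pass" even though it contains entries such as "older than v1.8" and "missing". A small sketch for pulling those lines out of a saved report is shown below. The function name `flag_lines` and the set of markers are assumptions chosen for this thread, not something `k3s check-config` provides.

```python
import re
from typing import List

# Markers treated as worth a second look; this selection is an assumption.
FLAG_PATTERN = re.compile(r"\b(missing|older than|FATAL)\b")


def flag_lines(report: str) -> List[str]:
    """Return the stripped report lines that contain one of the markers."""
    return [
        line.strip()
        for line in report.splitlines()
        if FLAG_PATTERN.search(line)
    ]


# Excerpt copied from the check-config output above.
SAMPLE = """\
System:
- /sbin iptables v1.6.1: older than v1.8
- swap: disabled
modprobe: FATAL: Module configs not found in directory /lib/modules/5.4.0-124-generic
- CONFIG_RT_GROUP_SCHED: missing
- CONFIG_INET_XFRM_MODE_TRANSPORT: missing
STATUS: pass
"""

if __name__ == "__main__":
    for line in flag_lines(SAMPLE):
        print(line)
```

On the excerpt, this lists the iptables v1.6.1 line, the modprobe message, and the two missing kernel options, which makes the old iptables version easier to spot next to the passing status.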

Failing flannel logs:

time="2022-09-23T17:21:22Z" level=info msg="Module overlay was already loaded"
time="2022-09-23T17:21:22Z" level=info msg="Set sysctl 'net/ipv4/conf/default/forwarding' to 1"
time="2022-09-23T17:21:22Z" level=info msg="Set sysctl 'net/netfilter/nf_conntrack_max' to 131072"
time="2022-09-23T17:21:22Z" level=info msg="Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_established' to 86400"
time="2022-09-23T17:21:22Z" level=info msg="Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_close_wait' to 3600"
time="2022-09-23T17:21:22Z" level=info msg="Set sysctl 'net/ipv4/conf/all/forwarding' to 1"
time="2022-09-23T17:21:22Z" level=info msg="Using private registry config file at /var/vcap/jobs/k3s-agent/config/registries.yaml"
time="2022-09-23T17:21:22Z" level=info msg="Logging containerd to /var/vcap/store/k3s-agent/agent/containerd/containerd.log"
time="2022-09-23T17:21:22Z" level=info msg="Running containerd -c /var/vcap/store/k3s-agent/agent/etc/containerd/config.toml -a /run/k3s/containerd/containerd.sock --state /run/k3s/containerd --root /var/vcap/store/k3s-agent/agent/containerd"
time="2022-09-23T17:21:23Z" level=info msg="Containerd is now running"

E0923 17:21:39.046938    7238 nodelease.go:49] "Failed to get node when trying to set owner ref to the node lease" err="nodes \"services-agents-scaling-test-49\" not found" node="services-agents-scaling-test-49"
E0923 17:21:39.082333    7238 kubelet.go:2466] "Error getting node" err="node \"services-agents-scaling-test-49\" not found"
I0923 17:21:39.101408    7238 kubelet_network_linux.go:57] "Initialized protocol iptables rules." protocol=IPv4
E0923 17:21:39.111583    7238 kubelet_network_linux.go:80] "Failed to ensure that nat chain exists KUBE-MARK-DROP chain" err="error creating chain \"KUBE-MARK-DROP\": exit status 3: ip6tables v1.6.1: can't initialize ip6tables table `nat': Address family not supported by protocol\nPerhaps ip6tables or your kernel needs to be upgraded.\n"
I0923 17:21:39.111613    7238 kubelet_network_linux.go:65] "Failed to initialize protocol iptables rules; some functionality may be missing." protocol=IPv6
I0923 17:21:39.111624    7238 status_manager.go:161] "Starting to sync pod status with apiserver"
I0923 17:21:39.111640    7238 kubelet.go:2028] "Starting kubelet main sync loop"
E0923 17:21:39.111680    7238 kubelet.go:2052] "Skipping pod synchronization" err="[container runtime status check may not have completed yet, PLEG is not healthy: pleg has yet to be successful]"
I0923 17:21:39.144613    7238 kubelet_node_status.go:70] "Attempting to register node" node="services-agents-scaling-test-49"

time="2022-09-23T17:21:40Z" level=info msg="Running flannel backend."
I0923 17:21:40.988759    7238 vxlan_network.go:61] watching for new subnet leases
E0923 17:21:41.008759    7238 iptables.go:192] Failed to setup IPTables. iptables-restore binary was not found: no iptables-restore version found in string:
E0923 17:21:41.011610    7238 iptables.go:192] Failed to setup IPTables. iptables-restore binary was not found: no iptables-restore version found in string:
I0923 17:21:41.224789    7238 topology_manager.go:200] "Topology Admit Handler"
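
The flannel error reports that no version was found "in string:" and nothing follows the colon, which suggests the version text it tried to parse was empty. The sketch below illustrates that failure mode with a generic version extraction. It is not flannel's actual code, and the pattern, the function name `parse_version`, and the sample string are assumptions.

```python
import re
from typing import Optional, Tuple

# Matches a "vX.Y.Z" token anywhere in the text (assumed format).
VERSION_RE = re.compile(r"v(\d+)\.(\d+)\.(\d+)")


def parse_version(output: str) -> Optional[Tuple[int, int, int]]:
    """Extract (major, minor, patch) from version text, or None if absent."""
    match = VERSION_RE.search(output)
    if match is None:
        # An empty or unexpected string yields no version, which a caller
        # would have to report as an error.
        return None
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


if __name__ == "__main__":
    # Hypothetical output of a tool that prints its version.
    print(parse_version("iptables-restore v1.8.4 (legacy)"))
    # Empty output, as the log message above implies.
    print(parse_version(""))
```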

poblin-orange commented 1 year ago

references:

Same issue reported upstream: https://github.com/k3s-io/k3s/issues/6047. Fix for k3s 1.23.11: https://github.com/k3s-io/k3s/issues/6089
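
As a rough aid for deciding whether a deployed k3s build predates that fix, the sketch below compares a version string such as `v1.23.10+k3s1` against the 1.23.11 patch level mentioned above. The function name `includes_fix` is hypothetical, and the thread only states the fix version for the 1.23 line, so other minor versions are reported as unknown.

```python
import re
from typing import Optional

# From this thread: the fix is announced for k3s 1.23.11.
FIXED_PATCH_1_23 = 11


def includes_fix(k3s_version: str) -> Optional[bool]:
    """Return True/False for 1.23.x versions, None for other release lines."""
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", k3s_version)
    if match is None:
        raise ValueError(f"unrecognized k3s version: {k3s_version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    if (major, minor) != (1, 23):
        # The thread gives no fix version for other release lines.
        return None
    return patch >= FIXED_PATCH_1_23


if __name__ == "__main__":
    for version in ("v1.23.9", "v1.23.10+k3s1", "v1.23.11+k3s1"):
        print(version, includes_fix(version))
```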

poblin-orange commented 1 year ago

This should be resolved by a version bump. Scheduling it for the k3s 1.24 bosh release.