k3s-io / k3s

Lightweight Kubernetes
https://k3s.io
Apache License 2.0

missing 'flanneld masq' rules #6047

Closed pogossian closed 1 year ago

pogossian commented 2 years ago

Environmental Info:

k3s version v1.22.13+k3s1 (3daf4ed4)
go version go1.16.10

Node(s) CPU architecture, OS, and Version:

Linux ip-172-31-15-30 5.4.0-1078-aws #84~18.04.1-Ubuntu SMP Fri Jun 3 12:59:49 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration: 1 server

Describe the bug: After upgrading k3s from v1.22.12 to v1.22.13 on my Ubuntu 18.04 server, I noticed that my pods couldn't reach the internet. After debugging, it turned out that the default masquerade rules are missing on v1.22.13. On Ubuntu 20.04 and 22.04 the rules are there and everything works as usual.

Steps To Reproduce: on a freshly installed Ubuntu 18.04 host, install k3s from the v1.22 channel (the curl command quoted later in this thread), then inspect the POSTROUTING chain of the nat table. The first listing below is missing the 'flanneld masq' rules; the second shows them as expected:

    0     0 MASQUERADE  all  --  *      *       0.0.0.0/0            0.0.0.0/0            mark match 0x2000/0x2000
    5   220 MASQUERADE  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* kubernetes service traffic requiring SNAT */
    2   141 MASQUERADE  all  --  *      *       10.42.0.0/16        !224.0.0.0/4          /* flanneld masq */
    0     0 MASQUERADE  all  --  *      *      !10.42.0.0/16         10.42.0.0/16         /* flanneld masq */
    0     0 MASQUERADE  all  --  *      *       0.0.0.0/0            0.0.0.0/0            mark match 0x2000/0x2000
    0     0 MASQUERADE  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* kubernetes service traffic requiring SNAT */
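For reference, a saved POSTROUTING listing can be checked for these rules with a small helper (a hypothetical wrapper around grep; on a live node the listing comes from sudo iptables -t nat -vnL POSTROUTING, as shown later in the thread):

```shell
# Hypothetical helper: succeed (exit 0) only when a saved nat-table
# POSTROUTING listing contains the 'flanneld masq' rules.
has_flanneld_masq() {
  grep -q 'flanneld masq' "$1"
}
```

On an affected v1.22.13 node the grep finds nothing, matching the first listing above.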
brandond commented 2 years ago

@manuelbuil any ideas? I suspect that perhaps some of the rules have changed between versions?

manuelbuil commented 2 years ago

> @manuelbuil any ideas? I suspect that perhaps some of the rules have changed between versions?

It seems so, but I don't think we have changed anything in that area. We need to investigate.

rbrtbnfgl commented 2 years ago

Are you using a specific config on /etc/rancher/k3s/config.yaml?

pogossian commented 2 years ago

> Are you using a specific config on /etc/rancher/k3s/config.yaml?

No, I'm doing exactly what I wrote in 'Steps To Reproduce': on a freshly installed Ubuntu 18.04, executing this command:

curl -sfL https://get.k3s.io | INSTALL_K3S_CHANNEL=v1.22 sh -
rbrtbnfgl commented 2 years ago

Can you give me the K3s logs?

pogossian commented 2 years ago

Sure thing!

k3s.log

Something scary here:

Aug 30 13:48:11 ip-172-31-1-153 k3s[2242]: E0830 13:48:11.660256    2242 iptables.go:192] Failed to setup IPTables. iptables-restore binary was not found: no iptables-restore version found in string:
brandond commented 2 years ago

What do you get from running:

pogossian commented 2 years ago

iptables-restore on Ubuntu 18.04 doesn't have the --version argument:

root@ip-172-31-7-31:~# k3s check-config

Verifying binaries in /var/lib/rancher/k3s/data/8b526847b7c1f339a0878b18029f581027f82097c5128778df15dd816a058bda/bin:
- sha256sum: good
- links: good

System:
- /sbin iptables v1.6.1: older than v1.8
- swap: disabled
- routes: ok

Limits:
- /proc/sys/kernel/keys/root_maxkeys: 1000000

modprobe: FATAL: Module configs not found in directory /lib/modules/5.4.0-1078-aws
info: reading kernel config from /boot/config-5.4.0-1078-aws ...

Generally Necessary:
- cgroup hierarchy: cgroups Hybrid mounted, cpuset|memory controllers status: good
- /sbin/apparmor_parser
apparmor: enabled and tools installed
- CONFIG_NAMESPACES: enabled
- CONFIG_NET_NS: enabled
- CONFIG_PID_NS: enabled
- CONFIG_IPC_NS: enabled
- CONFIG_UTS_NS: enabled
- CONFIG_CGROUPS: enabled
- CONFIG_CGROUP_CPUACCT: enabled
- CONFIG_CGROUP_DEVICE: enabled
- CONFIG_CGROUP_FREEZER: enabled
- CONFIG_CGROUP_SCHED: enabled
- CONFIG_CPUSETS: enabled
- CONFIG_MEMCG: enabled
- CONFIG_KEYS: enabled
- CONFIG_VETH: enabled (as module)
- CONFIG_BRIDGE: enabled (as module)
- CONFIG_BRIDGE_NETFILTER: enabled (as module)
- CONFIG_IP_NF_FILTER: enabled (as module)
- CONFIG_IP_NF_TARGET_MASQUERADE: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_ADDRTYPE: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_CONNTRACK: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_IPVS: enabled (as module)
- CONFIG_IP_NF_NAT: enabled (as module)
- CONFIG_NF_NAT: enabled (as module)
- CONFIG_POSIX_MQUEUE: enabled

Optional Features:
- CONFIG_USER_NS: enabled
- CONFIG_SECCOMP: enabled
- CONFIG_CGROUP_PIDS: enabled
- CONFIG_BLK_CGROUP: enabled
- CONFIG_BLK_DEV_THROTTLING: enabled
- CONFIG_CGROUP_PERF: enabled
- CONFIG_CGROUP_HUGETLB: enabled
- CONFIG_NET_CLS_CGROUP: enabled (as module)
- CONFIG_CGROUP_NET_PRIO: enabled
- CONFIG_CFS_BANDWIDTH: enabled
- CONFIG_FAIR_GROUP_SCHED: enabled
- CONFIG_RT_GROUP_SCHED: missing
- CONFIG_IP_NF_TARGET_REDIRECT: enabled (as module)
- CONFIG_IP_SET: enabled (as module)
- CONFIG_IP_VS: enabled (as module)
- CONFIG_IP_VS_NFCT: enabled
- CONFIG_IP_VS_PROTO_TCP: enabled
- CONFIG_IP_VS_PROTO_UDP: enabled
- CONFIG_IP_VS_RR: enabled (as module)
- CONFIG_EXT4_FS: enabled
- CONFIG_EXT4_FS_POSIX_ACL: enabled
- CONFIG_EXT4_FS_SECURITY: enabled
- Network Drivers:
  - "overlay":
    - CONFIG_VXLAN: enabled (as module)
      Optional (for encrypted networks):
      - CONFIG_CRYPTO: enabled
      - CONFIG_CRYPTO_AEAD: enabled
      - CONFIG_CRYPTO_GCM: enabled
      - CONFIG_CRYPTO_SEQIV: enabled
      - CONFIG_CRYPTO_GHASH: enabled
      - CONFIG_XFRM: enabled
      - CONFIG_XFRM_USER: enabled (as module)
      - CONFIG_XFRM_ALGO: enabled (as module)
      - CONFIG_INET_ESP: enabled (as module)
      - CONFIG_INET_XFRM_MODE_TRANSPORT: missing
- Storage Drivers:
  - "overlay":
    - CONFIG_OVERLAY_FS: enabled (as module)

STATUS: pass
root@ip-172-31-7-31:~# iptables-restore --version
iptables-restore: unrecognized option '--version'
^C
root@ip-172-31-7-31:~# iptables-restore -V
iptables-restore: invalid option -- 'V'
^C
root@ip-172-31-7-31:~# man iptables-restore | tail -1
iptables 1.6.1                                                                                                                                        IPTABLES-RESTORE(8)
root@ip-172-31-7-31:~#
brandond commented 2 years ago
> root@ip-172-31-7-31:~# iptables-restore --version
> iptables-restore: unrecognized option '--version'

That is broken in a way that is incompatible with Kubernetes, since it runs iptables-restore --version to check the installed version. Reference: https://github.com/kubernetes/kubernetes/blob/release-1.22/pkg/util/iptables/iptables.go#L708

Where is this broken version of iptables-restore from?
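Both Kubernetes and Flannel scrape a dotted version number out of whatever iptables-restore --version prints. A minimal sketch of that probe (hypothetical helper name, and a looser regex than the upstream one):

```shell
# Hypothetical sketch of the version probe: pull the first dotted
# version number out of the `iptables-restore --version` output.
parse_restore_version() {
  printf '%s\n' "$1" | grep -oE 'v?[0-9]+(\.[0-9]+)+' | head -n 1
}

# 1.6.2 and newer print a usable banner:
parse_restore_version "iptables-restore v1.8.4 (legacy)"   # prints v1.8.4

# 1.6.1 rejects the flag, so there is no version number to find, and
# the probe fails -- hence the "no iptables-restore version found in
# string" error in the k3s log:
parse_restore_version "iptables-restore: unrecognized option '--version'"
```

Once the probe returns nothing, the caller has no way to tell which features the binary supports, which is why it surfaces as a hard failure.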

pogossian commented 2 years ago

The most interesting thing is why it works on 1.22.12 but not on 1.22.13.

root@ip-172-31-1-244:~# apt show iptables
Package: iptables
Version: 1.6.1-2ubuntu2
Priority: standard
Section: net
Origin: Ubuntu
Maintainer: Ubuntu Developers <ubuntu-devel-discuss@lists.ubuntu.com>
Original-Maintainer: Debian Netfilter Packaging Team <pkg-netfilter-team@lists.alioth.debian.org>
Bugs: https://bugs.launchpad.net/ubuntu/+filebug
Installed-Size: 1635 kB
Depends: libip4tc0 (= 1.6.1-2ubuntu2), libip6tc0 (= 1.6.1-2ubuntu2), libiptc0 (= 1.6.1-2ubuntu2), libxtables12 (= 1.6.1-2ubuntu2), libc6 (>= 2.14), libnetfilter-conntrack3, libnfnetlink0
Suggests: kmod
Homepage: http://www.netfilter.org/
Task: standard, ubuntu-core
Supported: 5y
Download-Size: 269 kB
APT-Manual-Installed: no
APT-Sources: http://us-west-1.ec2.archive.ubuntu.com/ubuntu bionic/main amd64 Packages
Description: administration tools for packet filtering and NAT
 iptables is the userspace command line program used to configure
 the Linux packet filtering ruleset. It is targeted towards system
 administrators. Since Network Address Translation is also configured
 from the packet filter ruleset, iptables is used for this, too. The
 iptables package also includes ip6tables. ip6tables is used for
 configuring the IPv6 packet filter
rbrtbnfgl commented 2 years ago

The latest version of Flannel, which is in the latest K3s, checks the iptables-restore version. The previous version of Flannel didn't perform that check, which is why it worked for you. I suggest you check the installed iptables version and probably update it, because 1.8 is the minimum required version; I don't know when they introduced the --version flag.

pogossian commented 2 years ago

For me, staying on 1.22.12 on my old Ubuntu 18.04 servers is okay.
I opened the issue to understand what's happening, because I understood that k3s should work on any Linux without issues.
Also, I suppose it may be necessary to add a preparation step somewhere here for Ubuntu 18.04 if you're not going to fix it.

rbrtbnfgl commented 2 years ago

OK, I checked the Flannel code. The issue was introduced by the check of the iptables-restore version, added to verify whether it supports the --wait flag. The --wait flag was introduced in version 1.6.2 together with the --version flag, so the check as written is useless: if --version fails, --wait is unsupported anyway. I'll fix it in Flannel.
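Since --wait and --version arrived together in iptables 1.6.2, a failed --version probe already answers the question. A sketch of that fallback logic (hypothetical helper, not the actual Flannel patch):

```shell
# Hypothetical helper: decide whether to pass --wait to a given
# iptables-restore binary. If the --version probe fails, the binary
# predates 1.6.2 and cannot support --wait either, so fall back to
# running without the flag instead of treating it as a fatal error.
restore_wait_flag() {
  if "$1" --version >/dev/null 2>&1; then
    echo "--wait"
  fi
}

# On a node this might be used as:
#   iptables-restore $(restore_wait_flag iptables-restore) < rules.dump
```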

ShylajaDevadiga commented 2 years ago

Created a cluster using k3s v1.23.10+k3s1 on Ubuntu 18.04 and found that pods are in a CrashLoopBackOff state.

$ kubectl get nodes
NAME              STATUS   ROLES                  AGE   VERSION
ip-172-31-7-228   Ready    control-plane,master   51m   v1.23.10+k3s1
ip-172-31-0-57    Ready    <none>                 42m   v1.23.10+k3s1

$ kubectl get pods -A
NAMESPACE     NAME                                      READY   STATUS             RESTARTS        AGE
kube-system   coredns-d76bd69b-tr9q6                    0/1     Running            0               45m
kube-system   helm-install-traefik-8qbnv                1/1     Running            8 (8m10s ago)   45m
kube-system   local-path-provisioner-6c79684f77-vqnt7   0/1     CrashLoopBackOff   12 (106s ago)   45m
kube-system   metrics-server-7cd5fcb6b7-ksbt4           0/1     CrashLoopBackOff   12 (80s ago)    45m
kube-system   helm-install-traefik-crd-5zb6g            0/1     CrashLoopBackOff   8 (13s ago)     45m

Logs from coredns and traefik

$ kubectl logs -n kube-system coredns-d76bd69b-tr9q6
[WARNING] No files matching import glob pattern: /etc/coredns/custom/*.server
[INFO] plugin/ready: Still waiting on: "kubernetes"
(previous line repeated 32 times)
[WARNING] plugin/kubernetes: Kubernetes API connection failure: Get "https://10.43.0.1:443/version": dial tcp 10.43.0.1:443: i/o timeout
[INFO] plugin/ready: Still waiting on: "kubernetes"
$ kubectl logs -n kube-system helm-install-traefik-crd-5zb6g
if [[ ${KUBERNETES_SERVICE_HOST} =~ .*:.* ]]; then
    echo "KUBERNETES_SERVICE_HOST is using IPv6"
    CHART="${CHART//%\{KUBERNETES_API\}%/[${KUBERNETES_SERVICE_HOST}]:${KUBERNETES_SERVICE_PORT}}"
else
    CHART="${CHART//%\{KUBERNETES_API\}%/${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT}}"
fi

set +v -x
+ [[ '' != \t\r\u\e ]]
+ export HELM_HOST=127.0.0.1:44134
+ HELM_HOST=127.0.0.1:44134
+ helm_v2 init --skip-refresh --client-only --stable-repo-url https://charts.helm.sh/stable/
+ tiller --listen=127.0.0.1:44134 --storage=secret
Creating /home/klipper-helm/.helm 
Creating /home/klipper-helm/.helm/repository 
Creating /home/klipper-helm/.helm/repository/cache 
Creating /home/klipper-helm/.helm/repository/local 
Creating /home/klipper-helm/.helm/plugins 
Creating /home/klipper-helm/.helm/starters 
Creating /home/klipper-helm/.helm/cache/archive 
Creating /home/klipper-helm/.helm/repository/repositories.yaml 
Adding stable repo with URL: https://charts.helm.sh/stable/ 
Adding local repo with URL: http://127.0.0.1:8879/charts 
$HELM_HOME has been configured at /home/klipper-helm/.helm.
Not installing Tiller due to 'client-only' flag having been set
++ jq -r '.Releases | length'
++ timeout -s KILL 30 helm_v2 ls --all '^traefik-crd$' --output json
[main] 2022/09/01 22:05:53 Starting Tiller v2.17.0 (tls=false)
[main] 2022/09/01 22:05:53 GRPC listening on 127.0.0.1:44134
[main] 2022/09/01 22:05:53 Probes listening on :44135
[main] 2022/09/01 22:05:53 Storage driver is Secret
[main] 2022/09/01 22:05:53 Max history per release is 0
[storage] 2022/09/01 22:05:54 listing all releases with filter
+ V2_CHART_EXISTS=
+ [[ '' == \1 ]]
+ [[ '' == \v\2 ]]
+ [[ -f /config/ca-file.pem ]]
+ [[ -n '' ]]
+ shopt -s nullglob
+ helm_content_decode
+ set -e
+ ENC_CHART_PATH=/chart/traefik-crd.tgz.base64
+ CHART_PATH=/tmp/traefik-crd.tgz
+ [[ ! -f /chart/traefik-crd.tgz.base64 ]]
+ return
+ [[ install != \d\e\l\e\t\e ]]
+ helm_repo_init
+ grep -q -e 'https\?://'
chart path is a url, skipping repo update
+ echo 'chart path is a url, skipping repo update'
+ helm_v3 repo remove stable
Error: no repositories configured
+ true
+ return
+ helm_update install
+ [[ helm_v3 == \h\e\l\m\_\v\3 ]]
++ helm_v3 ls --all -f '^traefik-crd$' --namespace kube-system --output json
++ jq -r '"\(.[0].app_version),\(.[0].status)"'
++ tr '[:upper:]' '[:lower:]'
[storage/driver] 2022/09/01 22:06:24 list: failed to list: Get "https://10.43.0.1:443/api/v1/namespaces/kube-system/secrets?labelSelector=OWNER%3DTILLER": dial tcp 10.43.0.1:443: i/o timeout
Error: Kubernetes cluster unreachable: Get "https://10.43.0.1:443/version": dial tcp 10.43.0.1:443: i/o timeout
+ LINE=
+ IFS=,
+ read -r INSTALLED_VERSION STATUS _
+ VALUES=
+ [[ install = \d\e\l\e\t\e ]]
+ [[ '' =~ ^(|null)$ ]]
+ [[ '' =~ ^(|null)$ ]]
+ echo 'Installing helm_v3 chart'
+ helm_v3 install traefik-crd https://10.43.0.1:443/static/charts/traefik-crd-10.19.300.tgz
Error: INSTALLATION FAILED: failed to download "https://10.43.0.1:443/static/charts/traefik-crd-10.19.300.tgz"
mdrahman-suse commented 2 years ago

Validated on k3s v1.25.0-rc2+k3s1

Environment Details

Infrastructure

Node(s) CPU architecture, OS, and Version:

Linux ip-172-31-47-225 5.4.0-1078-aws #84~18.04.1-Ubuntu SMP Fri Jun 3 12:59:49 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
NAME="Ubuntu"
VERSION="18.04.6 LTS (Bionic Beaver)"

Cluster Configuration:

1 server, 1 agent

Testing Steps

  1. Install k3s.
     For replication: curl -sfL https://get.k3s.io | INSTALL_K3S_CHANNEL=v1.24 sh -s - --write-kubeconfig-mode 644 --cluster-init --token test
     For validation: curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.25.0-rc2+k3s1 sh -s - --write-kubeconfig-mode 644 --cluster-init --token test
  2. Ensure the cluster is up and running

Replication Results:

Validation Results:

zbup commented 2 years ago

Thought I was going crazy. First time I've tried K3s for real, and I hit the same issue...

Ubuntu 18.04.6 LTS

$ k3s -v
k3s version v1.24.4+k3s1 (c3f830e9)
go version go1.18.1

$ sudo iptables -vnL -t nat |grep 'flanneld masq'
$
rbrtbnfgl commented 2 years ago

This should be fixed on 1.24.4 as well; maybe the tagged release doesn't include the newer commit. 1.24.5 should have the right version.

zbup commented 2 years ago

FWIW, https://get.k3s.io was pointing to v1.24.4+k3s1 by default as of yesterday evening for Ubuntu 18 and it definitely has the issue. When I manually set it to v1.25.0+k3s1 it started working fine.

zbup commented 2 years ago

I can confirm the issue is also fixed in v1.24.5-rc1+k3s1. You might want to promote it to stable soon, since the stable release currently points to v1.24.4+k3s1.

brandond commented 2 years ago

QA is still validating the release candidate. We will cut a full release and eventually promote it to stable when validation is complete.

cwayne18 commented 1 year ago

@rancher-max should we close this out?