k3s-io / klipper-lb

Embedded service load balancer in Klipper
Apache License 2.0

[BUG] svclb-traefik* won't start after host crash and restart. #34

Open bayeslearner opened 2 years ago

bayeslearner commented 2 years ago

What did you do

What did you expect to happen

Ingress should work

Screenshots or terminal output

[rockylinux@rockylinux8 infra_k3d]$ kubectl -n kube-system logs svclb-traefik-dkgkq lb-port-80
+ trap exit TERM INT
+ echo 10.43.70.41
+ grep -Eq :
+ cat /proc/sys/net/ipv4/ip_forward
+ '[' 1 '!=' 1 ]
+ iptables -t nat -I PREROUTING '!' -s 10.43.70.41/32 -p TCP --dport 80 -j DNAT --to 10.43.70.41:80
modprobe: can't change directory to '/lib/modules': No such file or directory
modprobe: can't change directory to '/lib/modules': No such file or directory
modprobe: can't change directory to '/lib/modules': No such file or directory
modprobe: can't change directory to '/lib/modules': No such file or directory
modprobe: can't change directory to '/lib/modules': No such file or directory
iptables v1.8.4 (legacy): can't initialize iptables table `nat': Table does not exist (do you need to insmod?)
Perhaps iptables or your kernel needs to be upgraded. 

Which OS & Architecture

Which version of k3d

Which version of docker

Server:
 Containers: 3
  Running: 2
  Paused: 0
  Stopped: 1
 Images: 5
 Server Version: 20.10.13
 Storage Driver: overlay2
  Backing Filesystem: xfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 2a1d4dbdb2a1030dc5b01e96fb110a9d9f150ecc
 runc version: v1.0.3-0-gf46b6ba
 init version: de40ad0
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 4.18.0-348.20.1.el8_5.x86_64
 Operating System: Rocky Linux 8.5 (Green Obsidian)
 OSType: linux
 Architecture: x86_64
 CPUs: 8
 Total Memory: 31.19GiB
 Name: rockylinux8.linuxvmimages.local
 ID: RI32:V7KA:PDQG:Q2Z2:DNET:CMMP:3MMG:23OF:RMTN:W6J2:WOQO:N4YA
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

bayeslearner commented 2 years ago

Name:           svclb-traefik-wqjjt
Namespace:      kube-system
Priority:       0
Node:           <none>
Labels:         app=svclb-traefik
                controller-revision-hash=f4f897b4f
                pod-template-generation=1
                svccontroller.k3s.cattle.io/svcname=traefik
Annotations:    <none>
Status:         Pending
IP:             
IPs:            <none>
Controlled By:  DaemonSet/svclb-traefik
Containers:
  lb-port-80:
    Image:      rancher/klipper-lb:v0.3.4
    Port:       80/TCP
    Host Port:  80/TCP
    Environment:
      SRC_PORT:    80
      DEST_PROTO:  TCP
      DEST_PORT:   80
      DEST_IPS:    10.43.184.59
    Mounts:        <none>
  lb-port-443:
    Image:      rancher/klipper-lb:v0.3.4
    Port:       443/TCP
    Host Port:  443/TCP
    Environment:
      SRC_PORT:    443
      DEST_PROTO:  TCP
      DEST_PORT:   443
      DEST_IPS:    10.43.184.59
    Mounts:        <none>
Conditions:
  Type           Status
  PodScheduled   False 
Volumes:         <none>
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     CriticalAddonsOnly op=Exists
                 node-role.kubernetes.io/control-plane:NoSchedule op=Exists
                 node-role.kubernetes.io/master:NoSchedule op=Exists
                 node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                 node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                 node.kubernetes.io/not-ready:NoExecute op=Exists
                 node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                 node.kubernetes.io/unreachable:NoExecute op=Exists
                 node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type     Reason            Age    From               Message
  ----     ------            ----   ----               -------
  Warning  FailedScheduling  5m44s  default-scheduler  0/1 nodes are available: 1 node(s) didn't have free ports for the requested pod ports.
  Warning  FailedScheduling  4m32s  default-scheduler  0/1 nodes are available: 1 node(s) didn't have free ports for the requested pod ports.

westrickc commented 2 years ago

I get the same error after creating a new cluster with k3d. My host OS is RHEL 8.5. I think it is related to the fact that RHEL 8.5 only supports the nftables backend for iptables, while the klipper-lb Docker image has iptables symlinked to the legacy binary.
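
To confirm the mismatch, a quick check (assuming the image is Alpine-based and ships /bin/sh; the docker run invocation is only an illustration) is to compare the host backend with the symlink inside the image:

# On the host: the version string ends in "(nf_tables)" or "(legacy)"
iptables --version

# Inside the image: see which backend /sbin/iptables points at
docker run --rm --entrypoint /bin/sh rancher/klipper-lb:v0.3.4 -c 'ls -l /sbin/iptables'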

Relevant versions of things:

My workaround was to rebuild the rancher/klipper-lb:v0.3.4 image with this Dockerfile:

FROM rancher/klipper-lb:v0.3.4
# Point iptables at the nftables backend instead of the legacy one
RUN \
  ln -sf /sbin/xtables-nft-multi /sbin/iptables && \
  ln -sf /sbin/xtables-nft-multi /sbin/iptables-save && \
  ln -sf /sbin/xtables-nft-multi /sbin/iptables-restore
CMD ["entry"]

Then I used k3d image import to load the new image into the cluster (rough example commands at the end of this comment). Kubernetes eventually picks up the new image when it restarts the failed svclb-traefik-xxxxx pod.

It's a hack, but it gets ingress working on my system.
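
A rough sketch of those steps, assuming the k3d cluster is named mycluster (a placeholder) and that rebuilding under the original tag lets the existing DaemonSet pick up the image without further changes:

# Rebuild the image under the same tag using the Dockerfile above
docker build -t rancher/klipper-lb:v0.3.4 .

# Load the rebuilt image into the k3d cluster
k3d image import rancher/klipper-lb:v0.3.4 -c mycluster

# Optionally delete the stuck pods so the DaemonSet recreates them immediately
kubectl -n kube-system delete pod -l app=svclb-traefik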

r-ushil commented 1 year ago

Check this out for a quick fix:

https://github.com/k3d-io/k3d/issues/1021#issuecomment-1559194060

To solve the problem properly (rather than rely on this ad-hoc fix), I would suggest rewriting check_iptables_mode() to detect the backend by inspecting the symlinks under /sbin (e.g. with grep), rather than trying to use lsmod / modprobe.
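
One literal reading of that suggestion, sketched under the assumption that the entry script keeps a mode variable and separate nft/legacy setup helpers (the function body here is illustrative, not the actual klipper-lb code):

check_iptables_mode() {
    # Decide the backend from what the iptables symlink in /sbin points at,
    # instead of probing kernel modules with lsmod/modprobe.
    case "$(readlink -f /sbin/iptables 2>/dev/null)" in
        *xtables-nft-multi) mode=nft ;;
        *) mode=legacy ;;
    esac
}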

bartowl commented 1 year ago

It has now been over a year and this issue has still not been fixed? There are more and more nft-based systems, and this is really annoying... In particular, with 0.4.3:

+ info 'legacy mode detected'
+ echo '[INFO] ' 'legacy mode detected'
+ set_legacy
+ ln -sf /sbin/xtables-legacy-multi /sbin/iptables
[INFO]  legacy mode detected
+ ln -sf /sbin/xtables-legacy-multi /sbin/iptables-save
+ ln -sf /sbin/xtables-legacy-multi /sbin/iptables-restore
+ ln -sf /sbin/xtables-legacy-multi /sbin/ip6tables
+ start_proxy
+ echo 0.0.0.0/0
+ grep -Eq :
+ iptables -t filter -I FORWARD -s 0.0.0.0/0 -p TCP --dport 80 -j ACCEPT
modprobe: can't change directory to '/lib/modules': No such file or directory
modprobe: can't change directory to '/lib/modules': No such file or directory
iptables v1.8.8 (legacy): can't initialize iptables table `filter': Table does not exist (do you need to insmod?)
Perhaps iptables or your kernel needs to be upgraded.

This is current k3d (5.5.1) using klipper-lb:v0.4.3 on Oracle Linux Server 8.7 (RHEL 8.7 binary compatible). The host is running iptables v1.8.4 (nf_tables) with the following packages installed:

iptables-1.8.4-23.0.1.el8.x86_64
nftables-0.9.3-26.el8.x86_64
iptables-ebtables-1.8.4-23.0.1.el8.x86_64
python3-nftables-0.9.3-26.el8.x86_64
iptables-libs-1.8.4-23.0.1.el8.x86_64

The proposed change to the detection would be to replace lsmod | grep "nf_tables" with lsmod | grep "nf_conntrack", since this is what the lsmod output looks like on this system after grepping for "nf_" (a sketch of the change follows the output):

#5 0.220 nf_conntrack_netlink    45056  0
#5 0.220 nf_reject_ipv4         16384  1 ipt_REJECT
#5 0.220 nf_nat                 45056  3 xt_nat,xt_MASQUERADE,nft_chain_nat
#5 0.220 nf_conntrack          147456  5 nf_conntrack_netlink,xt_nat,xt_conntrack,xt_MASQUERADE,nf_nat
#5 0.220 nf_defrag_ipv6         24576  1 nf_conntrack
#5 0.220 nf_defrag_ipv4         16384  1 nf_conntrack
#5 0.220 libcrc32c              16384  3 nf_nat,nf_conntrack,xfs
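
A rough sketch of that proposed change, with the surrounding function shape assumed rather than copied from the actual entry script, and assuming a successful match selects the nft backend just as the current nf_tables check does:

check_iptables_mode() {
    # bartowl's proposal: key the detection off nf_conntrack, which is loaded
    # on this nft-based host, instead of nf_tables, which lsmod does not list here.
    if lsmod | grep -q "nf_conntrack"; then
        mode=nft
    else
        mode=legacy
    fi
}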