harvester / harvester

Open source hyperconverged infrastructure (HCI) software
https://harvesterhci.io/
Apache License 2.0

[BUG] Load balancer IP addresses remain on the Harvester nodes even after removing LoadBalancer objects #5682

Closed: starbops closed this issue 1 month ago

starbops commented 6 months ago

Describe the bug

The issue was found while testing the common use cases of LoadBalancer creation and removal after we bumped kube-vip to v0.8.0.

For example, with an appropriate IPPool object configured and a VM created, we create a LoadBalancer associated with the VM for port 22. The intent is to access the VM via SSH through the LoadBalancer IP address.
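
Under the hood, the Harvester LoadBalancer object is reconciled into a Kubernetes Service of type LoadBalancer, which is what kube-vip's service controller advertises. As a minimal sketch (the name, selector, and use of kubectl below are illustrative only, not how Harvester actually generates the Service), such a Service looks roughly like:

$ cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: my-vm-ssh-lb        # illustrative name; Harvester generates its own
  namespace: default
spec:
  type: LoadBalancer
  selector:
    app: my-vm              # illustrative backend selector
  ports:
  - name: ssh
    protocol: TCP
    port: 22
    targetPort: 22
EOF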

With kube-vip v0.6.0, creating the above LoadBalancer object results in the following logs in the kube-vip Pod:

time="2024-04-25T07:40:20Z" level=info msg="[service] adding VIP [192.168.48.61] for [default/frisbee-system-pool-5129db1f-dfl99-ssh-lb]"
time="2024-04-25T07:40:20Z" level=info msg="[service] synchronised in 90ms"

The allocated LoadBalancer IP address will be configured on the mgmt-br interface:

$ ip a show mgmt-br
5: mgmt-br: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 52:54:00:01:00:01 brd ff:ff:ff:ff:ff:ff
    inet 192.168.48.224/24 brd 192.168.48.255 scope global mgmt-br
       valid_lft forever preferred_lft forever
    inet 192.168.48.240/32 scope global mgmt-br
       valid_lft forever preferred_lft forever
    inet 192.168.48.61/32 scope global mgmt-br
       valid_lft forever preferred_lft forever

Removing the LoadBalancer object results in the following logs (in the kube-vip Pod):

time="2024-04-25T07:42:36Z" level=info msg="[LOADBALANCER] Stopping load balancers"
time="2024-04-25T07:42:36Z" level=info msg="[VIP] Releasing the Virtual IP [192.168.48.61]"
time="2024-04-25T07:42:36Z" level=info msg="Removed [b2b33d42-9385-40ca-b8ef-9b1db20cbda4] from manager, [1] advertised services remain"
time="2024-04-25T07:42:36Z" level=info msg="service [default/frisbee-system-pool-5129db1f-dfl99-ssh-lb] has been deleted"

The LoadBalancer IP address is now cleared:

$ ip a show mgmt-br
5: mgmt-br: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 52:54:00:01:00:01 brd ff:ff:ff:ff:ff:ff
    inet 192.168.48.224/24 brd 192.168.48.255 scope global mgmt-br
       valid_lft forever preferred_lft forever
    inet 192.168.48.240/32 scope global mgmt-br
       valid_lft forever preferred_lft forever

However, with kube-vip v0.8.0, creating the same LoadBalancer object results in the following logs (in the kube-vip Pod):

time="2024-04-25T07:52:36Z" level=info msg="(svcs) adding VIP [192.168.48.61] via mgmt-br for [default/frisbee-system-pool-5129db1f-dfl99-ssh-lb]"
time="2024-04-25T07:52:36Z" level=info msg="[service] synchronised in 148ms"
time="2024-04-25T07:52:36Z" level=warning msg="(svcs) already found existing address [192.168.48.61] on adapter [mgmt-br]"
time="2024-04-25T07:52:39Z" level=warning msg="Re-applying the VIP configuration [192.168.48.61] to the interface [mgmt-br]"

From the logs above, it seems the controller processed the object twice, as the warning messages suggest. Still, the allocated LoadBalancer IP address is successfully configured on the mgmt-br interface:

$ ip a show mgmt-br
5: mgmt-br: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 52:54:00:01:00:01 brd ff:ff:ff:ff:ff:ff
    inet 192.168.48.224/24 brd 192.168.48.255 scope global mgmt-br
       valid_lft forever preferred_lft forever
    inet 192.168.48.240/32 scope global mgmt-br
       valid_lft forever preferred_lft forever
    inet 192.168.48.61/32 scope global mgmt-br
       valid_lft forever preferred_lft forever

Removing the LoadBalancer object results in the following logs (in the kube-vip Pod):

time="2024-04-25T07:54:33Z" level=info msg="(svcs) [default/frisbee-system-pool-5129db1f-dfl99-ssh-lb] has been deleted"

From the log above, it seems only the Service object was deleted; no other actions were taken.

The previously allocated load balancer IP address is still on the network interface:

$ ip a show mgmt-br
5: mgmt-br: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 52:54:00:01:00:01 brd ff:ff:ff:ff:ff:ff
    inet 192.168.48.224/24 brd 192.168.48.255 scope global mgmt-br
       valid_lft forever preferred_lft forever
    inet 192.168.48.240/32 scope global mgmt-br
       valid_lft forever preferred_lft forever
    inet 192.168.48.61/32 scope global mgmt-br
       valid_lft forever preferred_lft forever

It's pretty clear that the iptables rules were not removed either:

$ iptables -nvL | grep -i kube-vip
   86 12843 ACCEPT     tcp  --  *      *       0.0.0.0/0            192.168.48.61        tcp dpt:22 /* default/frisbee-system-pool-5129db1f-dfl99-ssh-lb kube-vip load balancer IP */
    0     0 ACCEPT     udp  --  *      *       0.0.0.0/0            192.168.48.61        udp dpt:68 /* default/frisbee-system-pool-5129db1f-dfl99-ssh-lb kube-vip load balancer IP */
   26  1664 DROP       all  --  *      *       0.0.0.0/0            192.168.48.61        /* default/frisbee-system-pool-5129db1f-dfl99-ssh-lb kube-vip load balancer IP */

This leads to potential resource exhaustion (the IP address is never released), inconsistency between presentation and actual state (the IPPool shows the IP address as available when it is not), and a security issue (users can still reach the node, e.g. via SSH, through the unreleased IP address).
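
As a stopgap on an affected node, the leaked address can be removed from the bridge by hand (the address and interface below are from this reproduction; adjust to the actual leaked VIP):

$ ip addr del 192.168.48.61/32 dev mgmt-br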

To Reproduce

See above.

Expected behavior

The allocated LoadBalancer IP addresses should be correctly released by kube-vip after the LoadBalancer objects are removed.
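
Concretely, after the LoadBalancer object is deleted, both of the following checks should return nothing on every node that advertised the VIP (address and service name taken from this reproduction):

$ ip -4 addr show dev mgmt-br | grep 192.168.48.61
$ iptables -nvL INPUT | grep frisbee-system-pool-5129db1f-dfl99-ssh-lb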

Support bundle

Support Bundle 2024-04-25.zip

Environment

Additional context

The kube-vip version was bumped in #5635

The kube-vip DaemonSet looks like the following (note: Harvester v1.3.0 ships with kube-vip v0.6.0 and its chart 0.4.2 as a dependency by default; here only the image tag was updated to v0.8.0 for easy reproduction, as shown by the command after the manifest):

apiVersion: apps/v1
kind: DaemonSet
metadata:
  annotations:
    deprecated.daemonset.template.generation: "2"
    meta.helm.sh/release-name: harvester
    meta.helm.sh/release-namespace: harvester-system
    objectset.rio.cattle.io/id: default-mcc-harvester-cattle-fleet-local-system
  creationTimestamp: "2024-04-23T06:23:39Z"
  generation: 2
  labels:
    app.kubernetes.io/managed-by: Helm
    objectset.rio.cattle.io/hash: e852fa897f5eae59a44b4bfe186aad80b10b94b3
  name: kube-vip
  namespace: harvester-system
  resourceVersion: "444376"
  uid: f7cda561-1465-4547-a14e-13825cb76082
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/instance: harvester
      app.kubernetes.io/name: kube-vip
  template:
    metadata:
      creationTimestamp: null
      labels:
        app.kubernetes.io/instance: harvester
        app.kubernetes.io/name: kube-vip
    spec:
      containers:
      - args:
        - manager
        env:
        - name: cp_enable
          value: "false"
        - name: enable_service_security
          value: "true"
        - name: lb_enable
          value: "true"
        - name: lb_port
          value: "6443"
        - name: svc_enable
          value: "true"
        - name: vip_arp
          value: "true"
        - name: vip_cidr
          value: "32"
        - name: vip_interface
        - name: vip_leaderelection
          value: "false"
        image: ghcr.io/kube-vip/kube-vip-iptables:v0.8.0
        imagePullPolicy: IfNotPresent
        name: kube-vip
        resources: {}
        securityContext:
          capabilities:
            add:
            - NET_ADMIN
            - NET_RAW
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      hostNetwork: true
      nodeSelector:
        node-role.kubernetes.io/control-plane: "true"
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: kube-vip
      serviceAccountName: kube-vip
      terminationGracePeriodSeconds: 30
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/control-plane
        operator: Exists
  updateStrategy:
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 1
    type: RollingUpdate
status:
  currentNumberScheduled: 1
  desiredNumberScheduled: 1
  numberAvailable: 1
  numberMisscheduled: 0
  numberReady: 1
  observedGeneration: 2
  updatedNumberScheduled: 1
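
For the record, the only change made to the stock DaemonSet above was the image tag; one way to do that for a quick reproduction is the command below (since the object is Helm/Fleet managed, the change may eventually be reconciled back):

$ kubectl -n harvester-system set image daemonset/kube-vip \
    kube-vip=ghcr.io/kube-vip/kube-vip-iptables:v0.8.0
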
harvesterhci-io-github-bot commented 6 months ago

added backport-needed/1.2.2 issue: #5688.

harvesterhci-io-github-bot commented 6 months ago

added backport-needed/1.3.1 issue: #5689.

harvesterhci-io-github-bot commented 6 months ago

Pre Ready-For-Testing Checklist

bk201 commented 4 months ago

To do: create a new issue to bump kube-vip to v0.8.1.

starbops commented 1 month ago

The reported issue (found when bumping kube-vip to v0.8.0 from v0.6.0) no longer exists in Harvester v1.4 with kube-vip v0.8.1.
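
The kube-vip image actually running can be double-checked with:

$ kubectl -n harvester-system get ds kube-vip \
    -o jsonpath='{.spec.template.spec.containers[0].image}'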

DHCP

Load balancer creation:

time="2024-09-13T03:53:35Z" level=info msg="Creating new macvlan interface for DHCP [vip-0e80e041]"
time="2024-09-13T03:53:35Z" level=info msg="Generated mac: 00:00:6C:09:4c:d0"
time="2024-09-13T03:53:35Z" level=info msg="New interface [vip-0e80e041] mac is 00:00:6c:09:4c:d0"
time="2024-09-13T03:53:39Z" level=info msg="(svcs) adding VIP [172.19.31.96] via mgmt-br for [default/test-vm-ssh]"
time="2024-09-13T03:53:39Z" level=info msg="[service] synchronised in 3210ms"
time="2024-09-13T03:53:39Z" level=warning msg="(svcs) already found existing address [172.19.31.96] on adapter [mgmt-br]"
time="2024-09-13T03:53:42Z" level=warning msg="Re-applying the VIP configuration [172.19.31.96] to the interface [mgmt-br]"

The load balancer IP address was assigned to the mgmt-br interface:

harvester-vm-0-default:~ # ip addr show mgmt-br
5: mgmt-br: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 52:54:00:01:00:01 brd ff:ff:ff:ff:ff:ff
    inet 172.19.31.224/24 brd 172.19.31.255 scope global mgmt-br
       valid_lft forever preferred_lft forever
    inet 172.19.31.240/32 scope global mgmt-br
       valid_lft forever preferred_lft forever
    inet 172.19.31.96/32 scope global mgmt-br
       valid_lft forever preferred_lft forever

Load balancer removal:

time="2024-09-16T02:33:17Z" level=info msg="[LOADBALANCER] Stopping load balancers"
time="2024-09-16T02:33:17Z" level=info msg="[VIP] Releasing the Virtual IP [172.19.31.96]"
time="2024-09-16T02:33:17Z" level=info msg="release, lease: &{Offer:DHCPv4(xid=0x8d3899f6 hwaddr=00:00:6c:09:4c:d0 msg_type=OFFER, your_ip=172.19.31.96, server_ip=172.19.31.1) ACK:DHCPv4(xid=0x8d3899f6 hwaddr=00:00:6c:09:4c:d0 msg_type=ACK, your_ip=172.19.31.96, server_ip=172.19.31.1) CreationTime:2024-09-16 01:53:43.146825581 +0000 UTC m=+257861.920082388}"
time="2024-09-16T02:33:17Z" level=info msg="Removed [0e80e041-e874-4c9f-bc35-608754d63738] from manager, [1] advertised services remain"
time="2024-09-16T02:33:17Z" level=info msg="(svcs) [default/test-vm-ssh] has been deleted"

The load balancer IP address was cleared from the mgmt-br interface:

harvester-vm-0-default:~ # ip a show mgmt-br
5: mgmt-br: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 52:54:00:01:00:01 brd ff:ff:ff:ff:ff:ff
    inet 172.19.31.224/24 brd 172.19.31.255 scope global mgmt-br
       valid_lft forever preferred_lft forever
    inet 172.19.31.240/32 scope global mgmt-br
       valid_lft forever preferred_lft forever

Pool

Load balancer creation:

time="2024-09-16T02:55:38Z" level=info msg="(svcs) adding VIP [172.19.31.101] via mgmt-br for [default/test-vm-ssh]"
time="2024-09-16T02:55:38Z" level=info msg="[service] synchronised in 148ms"
time="2024-09-16T02:55:38Z" level=warning msg="(svcs) already found existing address [172.19.31.101] on adapter [mgmt-br]"
time="2024-09-16T02:55:41Z" level=warning msg="Re-applying the VIP configuration [172.19.31.101] to the interface [mgmt-br]"

The load balancer IP address was assigned to the mgmt-br interface:

harvester-vm-0-default:~ # ip a show mgmt-br
5: mgmt-br: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 52:54:00:01:00:01 brd ff:ff:ff:ff:ff:ff
    inet 172.19.31.224/24 brd 172.19.31.255 scope global mgmt-br
       valid_lft forever preferred_lft forever
    inet 172.19.31.240/32 scope global mgmt-br
       valid_lft forever preferred_lft forever
    inet 172.19.31.101/32 scope global mgmt-br
       valid_lft forever preferred_lft forever

Load balancer removal:

time="2024-09-16T02:57:35Z" level=info msg="[LOADBALANCER] Stopping load balancers"
time="2024-09-16T02:57:35Z" level=info msg="[VIP] Releasing the Virtual IP [172.19.31.101]"
time="2024-09-16T02:57:35Z" level=warning msg="could not remove iptables rules to limit traffic ports: could not delete common iptables rules: could not delete iptables rule to drop the traffic to VIP 172.19.31.101: running [/sbin/iptables-legacy -t filter -D INPUT -d 172.19.31.101 -m comment --comment default/test-vm-ssh kube-vip load balancer IP -j DROP --wait]: exit status 4: iptables: Resource temporarily unavailable.\n"
time="2024-09-16T02:57:35Z" level=info msg="Removed [3c453049-7b97-4d6f-89d1-182a07f494ec] from manager, [1] advertised services remain"
time="2024-09-16T02:57:35Z" level=info msg="(svcs) [default/test-vm-ssh] has been deleted"

The load balancer IP address was cleared from the mgmt-br interface:

harvester-vm-0-default:~ # ip a show mgmt-br
5: mgmt-br: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 52:54:00:01:00:01 brd ff:ff:ff:ff:ff:ff
    inet 172.19.31.224/24 brd 172.19.31.255 scope global mgmt-br
       valid_lft forever preferred_lft forever
    inet 172.19.31.240/32 scope global mgmt-br
       valid_lft forever preferred_lft forever

Closing the issue as resolved.


Notice the warning message during the pool-type load balancer removal:

time="2024-09-16T02:57:35Z" level=warning msg="could not remove iptables rules to limit traffic ports: could not delete common iptables rules: could not delete iptables rule to drop the traffic to VIP 172.19.31.101: running [/sbin/iptables-legacy -t filter -D INPUT -d 172.19.31.101 -m comment --comment default/test-vm-ssh kube-vip load balancer IP -j DROP --wait]: exit status 4: iptables: Resource temporarily unavailable.\n"

This only happens when removing a pool-type load balancer; we don't observe the same issue when removing a DHCP-type load balancer.

The residual iptables rules left on the system are shown below:

harvester-vm-0-default:~ # iptables -nvL | grep test-vm-ssh
    0     0 ACCEPT     tcp  --  *      *       0.0.0.0/0            172.19.31.101        tcp dpt:22 /* default/test-vm-ssh kube-vip load balancer IP */
    0     0 DROP       all  --  *      *       0.0.0.0/0            172.19.31.101        /* default/test-vm-ssh kube-vip load balancer IP */

These two rules come from the service security feature in kube-vip. Though they do not harm the cluster at first glance (since the IP address no longer exists on the system), we should keep an eye on this and resolve the undesired behavior in the future.
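
If needed, the leftover rules can be removed manually on the node: list them with their positions, then delete by rule number, highest number first so the positions do not shift (the service name below matches this test; adjust accordingly):

$ iptables -nL INPUT --line-numbers | grep test-vm-ssh
$ iptables -D INPUT <rule-number>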