Closed by starbops 1 month ago
Added the backport-needed/1.2.2 label; backport issue: #5688.
Added the backport-needed/1.3.1 label; backport issue: #5689.
[ ] If labeled require/HEP: Has the Harvester Enhancement Proposal (HEP) PR been submitted? The HEP PR is at:
[ ] Where are the reproduce steps/test steps documented? The reproduce steps/test steps are at:
[ ] Is there a workaround for the issue? If so, where is it documented? The workaround is at:
[ ] Has the backend code been merged (harvester, harvester-installer, etc.) (including backport-needed/*)? The PR is at:
[ ] Does the PR include the explanation for the fix or the feature?
[ ] Does the PR include a deployment change (YAML/Chart)? If so, where are the PRs for both the YAML file and the Chart? The PR for the YAML change is at: The PR for the chart change is at:
[ ] If labeled area/ui: Has the UI issue been filed or is the UI PR ready to be merged? The UI issue/PR is at:
[ ] If labeled require/doc or require/knowledge-base: Has the necessary documentation PR been submitted or merged? The documentation/KB PR is at:
[ ] If NOT labeled not-require/test-plan: Has the e2e test plan been merged? Have QAs agreed on the automation test cases? If there is only a test case skeleton without an implementation, has an implementation issue been created?
[ ] If the fix introduces code for backward compatibility: Has a separate issue been filed with the label release/obsolete-compatibility? The compatibility issue is filed at:
To do: create a new issue to bump kube-vip to v0.8.1.
The reported issue (found when bumping kube-vip to v0.8.0 from v0.6.0) no longer exists in Harvester v1.4 with kube-vip v0.8.1.
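The logs below were taken from the kube-vip Pod. Assuming a default Harvester install, they can be followed with something like the following (the harvester-system namespace and the kube-vip DaemonSet name are assumptions about the default deployment):
kubectl -n harvester-system logs ds/kube-vip -f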
DHCP-type load balancer creation:
time="2024-09-13T03:53:35Z" level=info msg="Creating new macvlan interface for DHCP [vip-0e80e041]"
time="2024-09-13T03:53:35Z" level=info msg="Generated mac: 00:00:6C:09:4c:d0"
time="2024-09-13T03:53:35Z" level=info msg="New interface [vip-0e80e041] mac is 00:00:6c:09:4c:d0"
time="2024-09-13T03:53:39Z" level=info msg="(svcs) adding VIP [172.19.31.96] via mgmt-br for [default/test-vm-ssh]"
time="2024-09-13T03:53:39Z" level=info msg="[service] synchronised in 3210ms"
time="2024-09-13T03:53:39Z" level=warning msg="(svcs) already found existing address [172.19.31.96] on adapter [mgmt-br]"
time="2024-09-13T03:53:42Z" level=warning msg="Re-applying the VIP configuration [172.19.31.96] to the interface [mgmt-br]"
The load balancer IP address was assigned to the mgmt-br interface:
harvester-vm-0-default:~ # ip addr show mgmt-br
5: mgmt-br: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 52:54:00:01:00:01 brd ff:ff:ff:ff:ff:ff
inet 172.19.31.224/24 brd 172.19.31.255 scope global mgmt-br
valid_lft forever preferred_lft forever
inet 172.19.31.240/32 scope global mgmt-br
valid_lft forever preferred_lft forever
inet 172.19.31.96/32 scope global mgmt-br
valid_lft forever preferred_lft forever
DHCP-type load balancer removal:
time="2024-09-16T02:33:17Z" level=info msg="[LOADBALANCER] Stopping load balancers"
time="2024-09-16T02:33:17Z" level=info msg="[VIP] Releasing the Virtual IP [172.19.31.96]"
time="2024-09-16T02:33:17Z" level=info msg="release, lease: &{Offer:DHCPv4(xid=0x8d3899f6 hwaddr=00:00:6c:09:4c:d0 msg_type=OFFER, your_ip=172.19.31.96, server_ip=172.19.31.1) ACK:DHCPv4(xid=0x8d3899f6 hwaddr=00:00:6c:09:4c:d0 msg_type=ACK, your_ip=172.19.31.96, server_ip=172.19.31.1) CreationTime:2024-09-16 01:53:43.146825581 +0000 UTC m=+257861.920082388}"
time="2024-09-16T02:33:17Z" level=info msg="Removed [0e80e041-e874-4c9f-bc35-608754d63738] from manager, [1] advertised services remain"
time="2024-09-16T02:33:17Z" level=info msg="(svcs) [default/test-vm-ssh] has been deleted"
The load balancer IP address was cleared from the mgmt-br interface:
harvester-vm-0-default:~ # ip a show mgmt-br
5: mgmt-br: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 52:54:00:01:00:01 brd ff:ff:ff:ff:ff:ff
inet 172.19.31.224/24 brd 172.19.31.255 scope global mgmt-br
valid_lft forever preferred_lft forever
inet 172.19.31.240/32 scope global mgmt-br
valid_lft forever preferred_lft forever
Pool-type load balancer creation:
time="2024-09-16T02:55:38Z" level=info msg="(svcs) adding VIP [172.19.31.101] via mgmt-br for [default/test-vm-ssh]"
time="2024-09-16T02:55:38Z" level=info msg="[service] synchronised in 148ms"
time="2024-09-16T02:55:38Z" level=warning msg="(svcs) already found existing address [172.19.31.101] on adapter [mgmt-br]"
time="2024-09-16T02:55:41Z" level=warning msg="Re-applying the VIP configuration [172.19.31.101] to the interface [mgmt-br]"
The load balancer IP address was assigned to the mgmt-br interface:
harvester-vm-0-default:~ # ip a show mgmt-br
5: mgmt-br: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 52:54:00:01:00:01 brd ff:ff:ff:ff:ff:ff
inet 172.19.31.224/24 brd 172.19.31.255 scope global mgmt-br
valid_lft forever preferred_lft forever
inet 172.19.31.240/32 scope global mgmt-br
valid_lft forever preferred_lft forever
inet 172.19.31.101/32 scope global mgmt-br
valid_lft forever preferred_lft forever
Pool-type load balancer removal:
time="2024-09-16T02:57:35Z" level=info msg="[LOADBALANCER] Stopping load balancers"
time="2024-09-16T02:57:35Z" level=info msg="[VIP] Releasing the Virtual IP [172.19.31.101]"
time="2024-09-16T02:57:35Z" level=warning msg="could not remove iptables rules to limit traffic ports: could not delete common iptables rules: could not delete iptables rule to drop the traffic to VIP 172.19.31.101: running [/sbin/iptables-legacy -t filter -D INPUT -d 172.19.31.101 -m comment --comment default/test-vm-ssh kube-vip load balancer IP -j DROP --wait]: exit status 4: iptables: Resource temporarily unavailable.\n"
time="2024-09-16T02:57:35Z" level=info msg="Removed [3c453049-7b97-4d6f-89d1-182a07f494ec] from manager, [1] advertised services remain"
time="2024-09-16T02:57:35Z" level=info msg="(svcs) [default/test-vm-ssh] has been deleted"
The load balancer IP address was cleared from the mgmt-br interface:
harvester-vm-0-default:~ # ip a show mgmt-br
5: mgmt-br: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 52:54:00:01:00:01 brd ff:ff:ff:ff:ff:ff
inet 172.19.31.224/24 brd 172.19.31.255 scope global mgmt-br
valid_lft forever preferred_lft forever
inet 172.19.31.240/32 scope global mgmt-br
valid_lft forever preferred_lft forever
Closing the issue as resolved.
Notice the warning message during the pool-type load balancer removal:
time="2024-09-16T02:57:35Z" level=warning msg="could not remove iptables rules to limit traffic ports: could not delete common iptables rules: could not delete iptables rule to drop the traffic to VIP 172.19.31.101: running [/sbin/iptables-legacy -t filter -D INPUT -d 172.19.31.101 -m comment --comment default/test-vm-ssh kube-vip load balancer IP -j DROP --wait]: exit status 4: iptables: Resource temporarily unavailable.\n"
This only happens when removing a pool-type load balancer; we don't observe the same issue when removing a DHCP-type load balancer.
The leftover iptables rules on the system are shown below:
harvester-vm-0-default:~ # iptables -nvL | grep test-vm-ssh
0 0 ACCEPT tcp -- * * 0.0.0.0/0 172.19.31.101 tcp dpt:22 /* default/test-vm-ssh kube-vip load balancer IP */
0 0 DROP all -- * * 0.0.0.0/0 172.19.31.101 /* default/test-vm-ssh kube-vip load balancer IP */
These two rules are related to kube-vip's service security feature. Although they appear harmless at first glance (the IP address no longer exists on the system), we should monitor this and resolve the undesired behavior in the future.
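If the leftover rules ever need to be cleaned up manually, something along the following lines should work. This is only a sketch based on the output above: when deleting by specification, the spec must match exactly what kube-vip inserted, so listing the rule numbers first and deleting by number is often the safer option.
# list the leftover rules together with their rule numbers
iptables -nL INPUT --line-numbers | grep test-vm-ssh
# or delete by specification (the spec must match the inserted rules exactly)
iptables -t filter -D INPUT -d 172.19.31.101/32 -p tcp --dport 22 -m comment --comment "default/test-vm-ssh kube-vip load balancer IP" -j ACCEPT
iptables -t filter -D INPUT -d 172.19.31.101/32 -m comment --comment "default/test-vm-ssh kube-vip load balancer IP" -j DROP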
Describe the bug
The issue was found when testing common use cases of LoadBalancer creation and removal after we bumped kube-vip to v0.8.0.
For example, given an appropriately configured IPPool object and an existing VM, we create a LoadBalancer associated with the VM for port 22. The intent is to access the VM via SSH using the LoadBalancer IP address.
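For reference, the Harvester load balancer controller materializes such a LoadBalancer as a Kubernetes Service of type LoadBalancer, which is the default/test-vm-ssh object kube-vip reconciles in its logs. A minimal sketch of such a Service (illustrative only; the real object carries controller-managed annotations and backend selection):
apiVersion: v1
kind: Service
metadata:
  name: test-vm-ssh        # matches the default/test-vm-ssh seen in the kube-vip logs
  namespace: default
spec:
  type: LoadBalancer
  ports:
  - name: ssh
    port: 22               # load balancer port exposed to clients
    protocol: TCP
    targetPort: 22         # SSH port on the backend VM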
With kube-vip v0.6.0, creating the above LoadBalancer object results in the following logs in the kube-vip Pod:
The allocated LoadBalancer IP address is then configured on the mgmt-br interface:
Removing the LoadBalancer object results in the following logs in the kube-vip Pod:
The LoadBalancer IP address is now cleared:
However, with kube-vip v0.8.0, creating the same LoadBalancer object results in the following logs in the kube-vip Pod:
From the logs above, it seems the controller processed the object twice, as the warning messages suggest. Still, the allocated LoadBalancer IP address is configured on the mgmt-br interface successfully:
Removing the LoadBalancer object results in the following logs in the kube-vip Pod:
From the logs above, it seems only the Service object was deleted; no other actions were taken.
The previously allocated LoadBalancer IP address is still on the network interface:
It's also clear that the iptables rules were not removed:
This results in potential resource exhaustion (IP addresses are never released), an inconsistency between what is presented and the actual state (the IPPool shows the IP address as available when it is not), and a security issue (users can still reach the node, e.g. via SSH, on the unreleased IP address).
To Reproduce
See above.
Expected behavior
The allocated LoadBalancer IP addresses should be correctly released by kube-vip after the LoadBalancer objects are removed.
Support bundle
Support Bundle 2024-04-25.zip
Environment
Additional context
The kube-vip version was bumped in #5635.
The kube-vip DaemonSet looks like the following (note: Harvester v1.3.0 comes with kube-vip v0.6.0 and its chart 0.4.2 as a dependency by default; here I only updated the image tag to v0.8.0 for easy reproduction):
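For illustration only, the part of the DaemonSet relevant to the reproduction is the container image tag; a minimal sketch (the image repository, namespace, and labels are assumptions, not the values from the actual Harvester chart) would look roughly like:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kube-vip
  namespace: harvester-system          # assumed Harvester system namespace
spec:
  selector:
    matchLabels:
      name: kube-vip                   # illustrative; the chart-managed labels differ
  template:
    metadata:
      labels:
        name: kube-vip
    spec:
      containers:
      - name: kube-vip
        image: ghcr.io/kube-vip/kube-vip:v0.8.0   # assumed repo; only the tag was bumped from v0.6.0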