Closed irishgordo closed 2 weeks ago
Error from events
SyncLoadBalancerFailed Service anothernginx Error syncing load balancer: failed to ensure load balancer: update load balancer IP of service default/anothernginx failed, error: Operation cannot be fulfilled on services "anothernginx": the object has been modified; please apply your changes to the latest version and try again anothernginx.17b82a1ac0e6db80 Wed, Feb 28 2024 3:19:12 pm
For Contrast, this is not reproducible on:
cc: @khushboo-rancher @bk201 @starbops
Additionally for contrast, not an issue in:
Since harvester-cloud-provider
0.2.0, we introduced kube-vip
as a dependency. Without kube-vip
running on the guest cluster, the underlying DHCP negotiation will not happen, and the LB type of Serivce will be stuck in the pending
state.
$ kubectl get ds kube-vip -n kube-system
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
kube-vip 0 0 0 0 0 node-role.kubernetes.io/control-plane=true 137m
The kube-vip
DaemonSet is there, but no Pods are scheduled due to the misconfigured nodeSelector
. The Node object has the label named node-role.kubernetes.io/controlplane: "true"
:
labels:
beta.kubernetes.io/arch: amd64
beta.kubernetes.io/os: linux
cattle.io/creator: norman
kubernetes.io/arch: amd64
kubernetes.io/hostname: v126-opensuse1
kubernetes.io/os: linux
node-role.kubernetes.io/controlplane: "true"
node-role.kubernetes.io/etcd: "true"
node-role.kubernetes.io/worker: "true"
To work around the issue, we need to manually patch the kube-vip
DaemonSet by removing the -
character for the nodeSelector
to match the node's label. After that, the kube-vip
Pods will be running on the cluster.
Note: This only happens to RKE1 guest clusters because the nodes' label is node-role.kubernetes.io/control-plane: "true"
, which could satisfy kube-vip
's nodeSelector
.
For users who want to install harvester-cloud-provider
chart on the RKE1 cluster, you can provide the following values to unset the default nodeSelector
and add a new one with the correct key:
# before
kube-vip:
nodeSelector:
node-role.kubernetes.io/control-plane: "true"
# after
kube-vip:
nodeSelector:
node-role.kubernetes.io/control-plane: null
node-role.kubernetes.io/controlplane: "true"
FYI, update the chart history:
Harvester-cloud-provider has 2 main releases: 0.1.14 and 0.2.3
.
The old 0.1.14
is working with Harvester v1.2.1, but has no latest features & bug fixes.
The v0.2.3
is a minor fix to v0.2.2
, its the target chart for Harvester v1.2.1, and the usage is docuement in: https://docs.harvesterhci.io/v1.2/rancher/cloud-provider
Harvester-cloud-provider releases:
https://github.com/harvester/charts/releases/tag/harvester-cloud-provider-0.1.14
github-actions released this Jan 11, 2023
https://github.com/harvester/charts/releases/tag/harvester-cloud-provider-0.2.2
github-actions released this Jun 16, 2023
https://github.com/harvester/charts/releases/tag/harvester-cloud-provider-0.2.3
github-actions released this Jan 15, 2024
Harvester v1.2.1 release:
https://github.com/harvester/harvester/releases/tag/v1.2.1
rancherio-gh-m released this Oct 26, 2023
Rancher-v2.7 chart index:
https://github.com/rancher/charts/blob/dev-v2.7/index.yaml
harvester-cloud-provider:
catalog.cattle.io/upstream-version: 0.1.14
apiVersion: v2
appVersion: v0.1.5
created: "2023-05-17T18:41:41.990313+08:00"
...
catalog.cattle.io/upstream-version: 0.2.3
apiVersion: v2
appVersion: v0.2.0
created: "2024-01-19T14:56:50.015207836+01:00"
...
[x] ~~If labeled: require/HEP Has the Harvester Enhancement Proposal PR submitted? The HEP PR is at:~~
[x] Where is the reproduce steps/test steps documented? The reproduce steps/test steps are at: https://github.com/harvester/harvester/issues/5247#issue-2160061114
[x] Is there a workaround for the issue? If so, where is it documented? The workaround is at: https://github.com/harvester/harvester/issues/5247#issuecomment-1970596335
[x] Have the backend code been merged (harvester, harvester-installer, etc) (including backport-needed/*
)?
The PR is at: harvester/charts#229
[x] Does the PR include the explanation for the fix or the feature?
[x] ~~Does the PR include deployment change (YAML/Chart)? If so, where are the PRs for both YAML file and Chart? The PR for the YAML change is at: The PR for the chart change is at:~~
[x] ~~If labeled: area/ui Has the UI issue filed or ready to be merged? The UI issue/PR is at:~~
[x] ~~If labeled: require/doc, require/knowledge-base Has the necessary document PR submitted or merged? The documentation/KB PR is at:~~
[x] If NOT labeled: not-require/test-plan Has the e2e test plan been merged? Have QAs agreed on the automation test case? If only test case skeleton w/o implementation, have you created an implementation issue?
[x] ~~If the fix introduces the code for backward compatibility Has a separate issue been filed with the label release/obsolete-compatibility
?
The compatibility issue is filed at:~~
This is validated with Cloud provider 0.2.4 and rke1 1.27 as working.
Describe the bug Harvester Cloud Provider w/ Rancher v2.7.11-rc3, when utilized in an RKE/RKE1 Cluster of N nodes in Harvester v1.2.1 fails to have in Rancher v2.7.11-rc3, the LoadBalancer move from a "Pending" state. Additionally, a 'taint' is on the node initially ( even in a cluster with 1 worker node & 1 etcd,cp,worker node ( it will be present on both ) ) - once removed,
rancher-webhook
deployment will succeed.Tested Rancher with:
Pre-Req For Reproduce
Example Helm Based:
Example Docker based:
To Reproduce Steps to reproduce the behavior:
overlay2
rancher-webhook
under deploys to finishnginx:latest
with the tags of something like:service: nginx
on the podExpected behavior Load Balancer service provided through Harvester Cloud Provider to succeed in Rancher and not be hung
Support bundle supportbundle_c5f6a4f6-7dd0-4c79-88eb-6fef88a6cb09_2024-02-28T23-34-09Z.zip
Environment
Additional context![Screenshot from 2024-02-28 15-35-39](https://github.com/harvester/harvester/assets/5370752/99597936-6670-493f-9f02-9c861b7ecabd)
Errors Noticed:
@noahgildersleeve @khushboo-rancher @TachunLin