dimepues opened this issue 7 months ago
This issue is quite possibly similar to https://github.com/harvester/harvester/issues/5072#issuecomment-1920653841: `ipam.NewAllocator` does not finish its initialization in time, and it does not set a status to indicate that it is ready to serve new allocations.
```go
func (h *Handler) OnChange(_ string, ipPool *lbv1.IPPool) (*lbv1.IPPool, error) {
	previousAllocator := h.allocatorMap.Get(ipPool.Name)
	if previousAllocator == nil || previousAllocator.CheckSum() != ipam.CalculateCheckSum(ipPool.Spec.Ranges) {
		a, err := ipam.NewAllocator(ipPool.Name, ipPool.Spec.Ranges, h.ipPoolCache, h.ipPoolClient)
		// ...
```
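For illustration, a minimal sketch of the kind of readiness guard this implies, assuming an explicit `ready` flag; the `Allocator` shape and the `Allocate` signature here are hypothetical, not the actual `ipam` package API:

```go
package ipamsketch

import (
	"errors"
	"sync/atomic"
)

// Allocator is a hypothetical shape used only for this sketch; the real
// ipam.Allocator in load-balancer-harvester is different.
type Allocator struct {
	ready atomic.Bool // flipped to true only after initialization completes
}

// markReady is called once the allocator has finished replaying the
// already-allocated IPs recorded in the IPPool status.
func (a *Allocator) markReady() { a.ready.Store(true) }

// Allocate rejects requests until initialization is done, so callers can
// requeue and retry instead of racing a half-initialized allocator.
func (a *Allocator) Allocate(lb string) (string, error) {
	if !a.ready.Load() {
		return "", errors.New("allocator not initialized yet, requeue and retry")
	}
	// ... real allocation logic would run here ...
	return "", errors.New("allocation logic elided in this sketch")
}
```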
Adding the debug information for the duplicated IP allocation:
LB:

```yaml
apiVersion: loadbalancer.harvesterhci.io/v1beta1
kind: LoadBalancer
metadata:
  annotations:
    cloudprovider.harvesterhci.io/service-uuid: 305b4f79-ceff-4fc1-be08-17c740cd24f9
    loadbalancer.harvesterhci.io/namespace: default
    loadbalancer.harvesterhci.io/network: ''
    loadbalancer.harvesterhci.io/project: c-m-kb9nwxh2/p-kfl9f
  creationTimestamp: '2024-01-24T19:18:25Z'
  finalizers:
    - wrangler.cattle.io/harvester-lb-controller
  generation: 10
  labels:
    cloudprovider.harvesterhci.io/cluster: dev
  name: dev-argocd-lb-09a33510
  namespace: default
  resourceVersion: '9013702'
  uid: 0c3e2048-cbd5-4381-997b-189839651835
spec:
  backendServerSelector:
    harvesterhci.io/vmName:
      - dev-pool1-62d36532-2mchw
      - dev-pool1-62d36532-djjzc
      - dev-pool1-62d36532-wcwhq
  ipam: pool
  listeners:
    - backendPort: 30657
      name: http
      port: 80
      protocol: TCP
    - backendPort: 32215
      name: https
      port: 443
      protocol: TCP
status:
  backendServers:
    - 192.168.112.21
    - 192.168.112.22
    - 192.168.112.20
  conditions:
    - lastUpdateTime: '2024-01-24T19:18:35Z'
      message: >-
        allocate ip for lb default/dev-argocd-lb-09a33510 failed, error:
        192.168.112.9 has been allocated to default/dev-argocd-lb-09a33510,
        duplicate allocation is not allowed
      status: 'False'
      type: Ready
```
IPPool:

```yaml
apiVersion: loadbalancer.harvesterhci.io/v1beta1
kind: IPPool
metadata:
  creationTimestamp: '2024-01-24T17:00:14Z'
  finalizers:
    - wrangler.cattle.io/harvester-ipam-controller
  generation: 34
  labels:
    loadbalancer.harvesterhci.io/global-ip-pool: 'true'
    loadbalancer.harvesterhci.io/vid: '112'
  managedFields:
    - apiVersion: loadbalancer.harvesterhci.io/v1beta1
      fieldsType: FieldsV1
      fieldsV1:
        f:spec:
          .: {}
          f:ranges: {}
          f:selector:
            .: {}
            f:network: {}
            f:scope: {}
      manager: harvester
      operation: Update
      time: '2024-01-24T19:29:09Z'
    - apiVersion: loadbalancer.harvesterhci.io/v1beta1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:finalizers:
            .: {}
            v:"wrangler.cattle.io/harvester-ipam-controller": {}
        f:status:
          .: {}
          f:allocated:
            .: {}
            f:192.168.112.2: {}
            f:192.168.112.3: {}
            f:192.168.112.4: {}
            f:192.168.112.5: {}
            f:192.168.112.6: {}
            f:192.168.112.7: {}
            f:192.168.112.8: {}
            f:192.168.112.9: {}
          f:available: {}
          f:conditions: {}
          f:lastAllocated: {}
          f:total: {}
      manager: harvester-load-balancer
      operation: Update
      time: '2024-01-24T19:29:09Z'
  name: global-ip-pool
  resourceVersion: '9013591'
  uid: f1cbd5ca-8dcd-4260-992d-cc24f58d276e
spec:
  ranges:
    - gateway: 192.168.112.1
      rangeEnd: 192.168.112.9
      rangeStart: 192.168.112.2
      subnet: 192.168.112.0/24
  selector:
    network: default/k8s
    scope:
      - guestCluster: '*'
        namespace: '*'
        project: '*'
status:
  allocated:
    192.168.112.2: default/dev-argocd-lb-81963e40
    192.168.112.3: default/dev-argocd-lb-98940262
    192.168.112.4: default/dev-argocd-lb-f6583253
    192.168.112.5: default/dev-argocd-lb-55e4ea27
    192.168.112.6: default/dev-argocd-lb-43fe840d
    192.168.112.7: default/dev-argocd-lb-b5a5dc14
    192.168.112.8: default/dev-argocd-lb-18edf10b
    192.168.112.9: default/dev-argocd-lb-09a33510
  available: 0
  conditions:
    - lastUpdateTime: '2024-01-24T17:00:14Z'
      status: 'True'
      type: Ready
  lastAllocated: 192.168.112.9
  total: 8
```
The IPPool object is in the following situation: it has `allocated` entries, but its `AllocatedHistory` is empty, which causes this part of the code to fail to reuse the already allocated IP:
```yaml
status:
  allocated:
    192.168.112.2: default/dev-argocd-lb-81963e40
    192.168.112.3: default/dev-argocd-lb-98940262
    192.168.112.4: default/dev-argocd-lb-f6583253
    192.168.112.5: default/dev-argocd-lb-55e4ea27
    192.168.112.6: default/dev-argocd-lb-43fe840d
    192.168.112.7: default/dev-argocd-lb-b5a5dc14
    192.168.112.8: default/dev-argocd-lb-18edf10b
    192.168.112.9: default/dev-argocd-lb-09a33510
  available: 0
```
```go
type IPPoolStatus struct {
	Total         int64  `json:"total"`
	Available     int64  `json:"available"`
	LastAllocated string `json:"lastAllocated"`
	// +optional
	Allocated map[string]string `json:"allocated,omitempty"`
	// +optional
	AllocatedHistory map[string]string `json:"allocatedHistory,omitempty"`
	// +optional
	Conditions []Condition `json:"conditions,omitempty"`
}
```
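A minimal sketch of the reuse lookup this implies, written against the `IPPoolStatus` struct above; `findExisting` is a hypothetical helper, not the actual controller code. The point is that consulting `Allocated` as well as `AllocatedHistory` lets an LB recover its own IP even when the history map is empty:

```go
// findExisting is a hypothetical helper, not the actual controller code.
// It consults both status.allocated and status.allocatedHistory, so an
// LB that already owns an IP gets it back instead of triggering the
// "duplicate allocation is not allowed" error.
func findExisting(status IPPoolStatus, lbKey string) (string, bool) {
	// Live allocations: the LB may already hold an IP.
	for ip, owner := range status.Allocated {
		if owner == lbKey {
			return ip, true
		}
	}
	// Historical allocations: prefer the IP the LB used before release.
	for ip, owner := range status.AllocatedHistory {
		if owner == lbKey {
			return ip, true
		}
	}
	return "", false // nothing to reuse; the caller allocates a fresh IP
}
```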
@dimepues I guess your cluster was rebooted around 2024-01-26, but the LB and IP pool posted above are from 2024-01-24; the support bundle does not include the corresponding information.
When you can reproduce this bug, please reproduce it and then generate a new support-bundle file. Thanks. I have some clues and need the support-bundle file to double-check.
If your workloads are deployed in a non-default namespace, please remember to add them per https://docs.harvesterhci.io/v1.2/advanced/index#support-bundle-namespaces
For this error:

```
message: >-
  allocate ip for lb default/dev-argocd-lb-09a33510 failed, error:
  192.168.112.9 has been allocated to default/dev-argocd-lb-09a33510,
  duplicate allocation is not allowed
```
I have found the root cause and will submit a PR to fix it.
@w13915984028 - wow thanks so much. Apologies for the delay in responding as I was away.
The currently embedded `*allocator.IPAllocator` no longer seems to be a good solution for the load balancer:

https://github.com/w13915984028/load-balancer-harvester/blob/712f152677cf3a224f9ce5345639c02b85526554/pkg/ipam/allocator.go#L24

(1) It allocates IPs via a single one-way iteration, so released IPs apparently cannot be reused.
(2) There is no good way to initialize it when some IPs have already been allocated; a potential dead loop lurks there.

We are looking for a better solution, e.g. along the lines of the sketch below.
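A minimal sketch, assuming a set-based free list, of an allocator that avoids both problems; `poolAllocator` and its methods are hypothetical illustrations, not the design adopted by the actual fix:

```go
package main

import (
	"errors"
	"fmt"
	"net/netip"
)

// poolAllocator keeps an explicit free set so released IPs return to the
// pool, and it can be seeded with already-allocated IPs at construction
// time instead of iterating (and possibly dead-looping) to find them.
type poolAllocator struct {
	free      map[netip.Addr]struct{}
	allocated map[netip.Addr]string // IP -> owner, e.g. namespace/name
}

// newPoolAllocator seeds the allocator from a range plus the IPs already
// recorded as allocated in the IPPool status.
func newPoolAllocator(start, end netip.Addr, inUse map[netip.Addr]string) *poolAllocator {
	p := &poolAllocator{
		free:      map[netip.Addr]struct{}{},
		allocated: map[netip.Addr]string{},
	}
	for ip := start; ip.Compare(end) <= 0; ip = ip.Next() {
		if owner, ok := inUse[ip]; ok {
			p.allocated[ip] = owner // honor existing allocations up front
		} else {
			p.free[ip] = struct{}{}
		}
	}
	return p
}

// Allocate is idempotent: it returns the owner's existing IP if one is
// held, otherwise any free IP, otherwise an "exhausted" error.
func (p *poolAllocator) Allocate(owner string) (netip.Addr, error) {
	for ip, o := range p.allocated {
		if o == owner {
			return ip, nil // reuse instead of reporting a duplicate
		}
	}
	for ip := range p.free {
		delete(p.free, ip)
		p.allocated[ip] = owner
		return ip, nil
	}
	return netip.Addr{}, errors.New("no IP available")
}

// Release returns an IP to the free set so it can be reused.
func (p *poolAllocator) Release(ip netip.Addr) {
	delete(p.allocated, ip)
	p.free[ip] = struct{}{}
}

func main() {
	start := netip.MustParseAddr("192.168.112.2")
	end := netip.MustParseAddr("192.168.112.9")
	p := newPoolAllocator(start, end, map[netip.Addr]string{
		netip.MustParseAddr("192.168.112.9"): "default/dev-argocd-lb-09a33510",
	})
	ip, _ := p.Allocate("default/dev-argocd-lb-09a33510")
	fmt.Println(ip) // reuses 192.168.112.9 instead of failing
}
```

Keeping an explicit free set makes `Release` O(1) and removes the one-way-iteration limitation, while seeding `allocated` up front avoids any initialization loop over IPs that are already taken.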
- [ ] If labeled: `require/HEP` Has the Harvester Enhancement Proposal PR been submitted? The HEP PR is at:
- [ ] Where are the reproduce steps/test steps documented? The reproduce steps/test steps are at:
- [ ] Is there a workaround for the issue? If so, where is it documented? The workaround is at:
- [x] Has the backend code been merged (harvester, harvester-installer, etc.), including `backport-needed/*`? The PR is at: https://github.com/harvester/load-balancer-harvester/pull/31 (the IP pool & LB management are massively improved in this PR).
- [x] Does the PR include the explanation for the fix or the feature? https://github.com/harvester/load-balancer-harvester/pull/31
- [ ] Does the PR include deployment changes (YAML/chart)? If so, where are the PRs for both the YAML file and the chart? The PR for the YAML change is at: The PR for the chart change is at:
- [ ] If labeled: `area/ui` Has the UI issue been filed or is it ready to be merged? The UI issue/PR is at:
- [ ] If labeled: `require/doc`, `require/knowledge-base` Has the necessary document PR been submitted or merged? The documentation/KB PR is at:
- [ ] If NOT labeled: `not-require/test-plan` Has the e2e test plan been merged? Have QAs agreed on the automation test case? If there is only a test case skeleton without implementation, have you created an implementation issue?
- [ ] If the fix introduces code for backward compatibility, has a separate issue been filed with the label `release/obsolete-compatibility`? The compatibility issue is filed at:
Automation e2e test issue: harvester/tests#1382
This issue was caused by a bug: the IPPool allocation could accidentally refuse to allocate an IP, claiming `duplicate allocation is not allowed`.
Test plan (the Harvester ISO should be built after 2024-08-30):

The LB can be created without requiring any existing VM; just use a selector that points to something (see the example manifest after this list). Thus the LB and the IPPool can be tested separately.

(1) Create an IPPool with a range of IPs, e.g. 10 IPs.
(2) Create and/or delete LBs (which allocate IPs from the above pool) in batches and quickly; each LB should either get an IP or report an error that no IP is available, but must never hit the `duplicate allocation is not allowed` issue again.
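For illustration, a minimal LB manifest of the kind this test plan implies, mirroring the fields of the LB shown earlier in this issue; the name `test-lb-1` and the selector value `placeholder-vm` are arbitrary placeholders:

```yaml
apiVersion: loadbalancer.harvesterhci.io/v1beta1
kind: LoadBalancer
metadata:
  name: test-lb-1            # arbitrary placeholder name
  namespace: default
spec:
  backendServerSelector:
    harvesterhci.io/vmName:
      - placeholder-vm       # need not exist: LB creation no longer requires a VM
  ipam: pool
  listeners:
    - backendPort: 30080
      name: http
      port: 80
      protocol: TCP
```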
**Describe the bug**
Experiencing failures when deploying load balancers in guest clusters and directly within the Harvester cluster. Despite creating appropriate IP pools, load balancers fail to assign IPs or encounter timeout errors.

**To Reproduce**

**Expected behavior**
Load balancers should successfully deploy, acquire an IP from the pool, and operate without errors.

**Troubleshooting Steps Taken**

**Errors Encountered**

**Support bundle**
supportbundle_d8c90104-75c7-4adb-bfef-14f4f49f1c00_2024-01-26T03-43-46Z.zip

**Environment**

**Additional context**