Lirt opened this issue 1 year ago
@Lirt I thought the 15K IP assignments per day were due to bug #380. Are you still doing that many assignments after the fix for #380 was installed?
Hmmm, it's hard to tell which one caused the IP assignment DoS. But the reason the service stays in pending forever is this one in our case.
You can check the counters again in a day (or check what the rate is right now, if that helps).
Thanks @Lirt. We've done some checking and validated that the actual cause of the error was on our API's side. No fixes to CPEM resolved it, and you're not causing any additional assignments right now.
I appreciate that you're trying to leverage LoadBalancerClass to avoid ever accidentally triggering this again, but this particular issue can't actually be prevented that way. It was truly on the Equinix Metal API side of things.
What we CAN do is implement better rate limiting and error handling, and that's something we've targeted to do for CPEM, but I don't have a timeframe for when it would be done.
If you're still interested in using LoadBalancerClass, we can continue to look at how to make CPEM interact with them better and not run into this issue.
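For illustration only, the kind of client-side rate limiting mentioned above could look roughly like the sketch below (using golang.org/x/time/rate; the limits and the assignIP helper are hypothetical, not CPEM's actual code):

```go
// Purely illustrative sketch of client-side rate limiting for calls to the
// Equinix Metal API, using golang.org/x/time/rate. The limits and the
// assignIP helper are hypothetical, not CPEM's actual implementation.
package main

import (
	"context"
	"fmt"
	"time"

	"golang.org/x/time/rate"
)

// assignIP is a stand-in for an IP-assignment request against the API.
func assignIP(ctx context.Context) error {
	// A real implementation would call the API here and return its error.
	return nil
}

func main() {
	// Hypothetical budget: at most one assignment request per second, burst of 5.
	limiter := rate.NewLimiter(rate.Every(time.Second), 5)
	ctx := context.Background()

	for i := 0; i < 10; i++ {
		// Wait blocks until the limiter permits the next request,
		// spreading the load instead of hammering the API.
		if err := limiter.Wait(ctx); err != nil {
			fmt.Println("rate limiter error:", err)
			return
		}
		if err := assignIP(ctx); err != nil {
			fmt.Println("assignment failed:", err)
		}
	}
}
```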
Thank you for the help.
This is not that important for us as long as it's not causing you internal trouble. My impression was that this was causing a high number of IP assignment requests, but if not, then it's fine.
So right now the only thing that is "off" is a cosmetic issue - the external IP of the Service is stuck in <pending> state:
cloud-provider-equinix-metal-kubernetes-external LoadBalancer 172.26.85.165 <pending> 443:32557/TCP 49d
Understood. Even if it's just a cosmetic issue, knowing that you're going to continue using LoadBalancerClass helps us prioritize this versus other issues when we consider what to fix next. Thank you.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
/reopen
@cprivitere: Reopened this issue.
/triage accepted
Hello,
This is a rather complicated issue, but I'll try to explain it in the simplest way possible.
I have the standard LoadBalancer Service provisioned by CPEM:
I use MetalLB to provision additional LoadBalancer services - currently just one, ingress-nginx-caas-controller, as a test case.
The issue I have is that MetalLB watches the service cloud-provider-equinix-metal-kubernetes-external by default and fights with CPEM over updates to this service. We see this very easily: as soon as I start the MetalLB controller, the cloud-provider-equinix-metal-kubernetes-external service changes to this (note the <pending>):
This is the service description, including the last events, showing that MetalLB is actually making changes to this svc:
EQX support told us we do 15k IP assignments per day. It's most likely caused by the situation described above.
So I wanted to use the new feature of MetalLB (0.13) that lets you set the loadBalancerClass MetalLB will watch - https://github.com/metallb/metallb/blob/77923bc823294f2f31e68193901efa3b30faea59/controller/main.go. Simply define --lb-class my-lb-class. MetalLB then stops updating cloud-provider-equinix-metal-kubernetes-external, as expected. This is good.
But then what happens is that CPEM doesn't see events on services with loadBalancerClass. Meaning when I create or delete a service that contains loadBalancerClass, nothing happens in CPEM. After long troubleshooting I found out that this behavior is defined in the ServiceController that CPEM uses and is expected to happen - please see this code.
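For reference, the check in the upstream ServiceController (k8s.io/cloud-provider) looks roughly like the following; this is paraphrased from the library, and the exact code may differ between versions:

```go
// Paraphrased from the service controller in k8s.io/cloud-provider
// (controllers/service/controller.go); exact code may differ by version.
package service

import v1 "k8s.io/api/core/v1"

// wantsLoadBalancer decides whether the cloud-provider framework should
// reconcile a Service at all. Any Service that sets spec.loadBalancerClass
// is skipped, which is why CPEM never reacts to MetalLB-classed Services.
func wantsLoadBalancer(service *v1.Service) bool {
	// If LoadBalancerClass is set, the user does not want the default
	// cloud-provider load balancer implementation.
	return service.Spec.Type == v1.ServiceTypeLoadBalancer &&
		service.Spec.LoadBalancerClass == nil
}
```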
Now :smile:, seeing that those two controllers don't work well together, my question is: do you have a recommended way to make this setup work correctly without DoS-ing your API, or can you point out where I'm making a mistake, if I am making one?
I understand that this part of the code is very unlikely to be changed. If MetalLB had decided to just use an annotation to ignore the service, it would all be good :smiley:, but they actually used an attribute that is ignored by the cloud-provider library.
The issue is easy to replicate - here is an example of a service I create (this service goes unnoticed by CPEM):
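(The original manifest isn't reproduced here; the following is a minimal illustration of such a Service, with placeholder names and ports. The loadBalancerClass value matches the --lb-class my-lb-class flag passed to the MetalLB controller.)

```yaml
apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx-caas-controller
  namespace: ingress-nginx
spec:
  type: LoadBalancer
  # Because loadBalancerClass is set, the upstream ServiceController
  # (and therefore CPEM) ignores this Service entirely.
  loadBalancerClass: my-lb-class
  selector:
    app.kubernetes.io/name: ingress-nginx
  ports:
    - name: https
      port: 443
      targetPort: 443
```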
Note: Tested with the latest main (https://github.com/equinix/cloud-provider-equinix-metal/pull/386). I think this issue was also present before and is not related to recent changes.