Closed tonybolzan closed 1 year ago
Update: Even after switching to on-demand nodes, the load balancer was recreated, leaving two LBs with the same name, one unconfigured and the other configured.
The log from a real event:
I0905 19:08:25.473783 1 routingpolicy.go:105] "Finished syncing routing policies for ingress class" ingressClass="test-ingress-class" duration="709.868µs"
I0905 19:08:35.473190 1 routingpolicy.go:103] "Started syncing routing policies for ingress class" ingressClass="test-ingress-class" startTime="2023-09-05 19:08:35.473166461 +0000 UTC m=+275713.251299440"
I0905 19:08:35.473287 1 util.go:97] Listener paths for routing policy: {...big json...}
I0905 19:08:35.473325 1 loadbalancer.go:143] Refreshing LB cache for lb ocid1.loadbalancer.oc1.sa-saopaulo-1.aaaaaaaapy5axf6sofjsgis3fjxdtda6caedrep526vrar6wknjsqszyteua
I0905 19:08:42.825880 1 backend.go:113] "Finished syncing backends for ingress class" ingressClass="test-ingress-class" duration="17.352789779s"
I0905 19:08:42.826243 1 backend.go:486] Error syncing backends for ingress class test-ingress-class: unable to fetch backendset health: Error returned by LoadBalancer Service.
Http Status Code: 404.
Error Code: NotAuthorizedOrNotFound.
Opc request id: d6c135b589e008048cd49474bea5d0df/B96EBC8AAD6A1B7A6F9AAB51573F7F01/A2704054B20A333C54A2E86B074E86E5.
Message: Authorization failed or requested resource not found.
Operation Name: GetBackendSetHealth
Timestamp: 2023-09-05 19:08:25 +0000 GMT
Client Version: Oracle-GoSDK/65.34.0
Request Endpoint: GET https://iaas.sa-saopaulo-1.oraclecloud.com/20170115/loadBalancers/ocid1.loadbalancer.oc1.sa-saopaulo-1.aaaaaaaapy5axf6sofjsgis3fjxdtda6caedrep526vrar6wknjsqszyteua/backendSets/bs_a784ef83523e5f6/health
Troubleshooting Tips: See https://docs.oracle.com/iaas/Content/API/References/apierrors.htm#apierrors_404__404_notauthorizedornotfound for more information about resolving this error.
Also see https://docs.oracle.com/iaas/api/#/en/loadbalancer/20170115/BackendSetHealth/GetBackendSetHealth for details on this operation's requirements.
To get more info on the failing request, you can set OCI_GO_SDK_DEBUG env var to info or higher level to log the request/response details.
If you are unable to resolve this LoadBalancer issue, please contact Oracle support and provide them this full error message.
I0905 19:08:42.826271 1 backend.go:111] "Started syncing backends for ingress class" ingressClass="test-ingress-class" startTime="2023-09-05 19:08:42.826262535 +0000 UTC m=+275720.604395515"
I0905 19:08:42.826314 1 loadbalancer.go:143] Refreshing LB cache for lb ocid1.loadbalancer.oc1.sa-saopaulo-1.aaaaaaaapy5axf6sofjsgis3fjxdtda6caedrep526vrar6wknjsqszyteua
I0905 19:08:50.724727 1 loadbalancer.go:143] Refreshing LB cache for lb ocid1.loadbalancer.oc1.sa-saopaulo-1.aaaaaaaapy5axf6sofjsgis3fjxdtda6caedrep526vrar6wknjsqszyteua
I0905 19:09:00.114667 1 webhook.go:59] "processing pod creation for pod readiness" pod="piperun/"
I0905 19:09:00.983975 1 reflector.go:559] /workspace/main.go:139: Watch close - *v1.Service total 10 items received
I0905 19:09:09.489571 1 reflector.go:281] /workspace/main.go:133: forcing resync
I0905 19:09:09.489974 1 ingressclass.go:108] "Updating ingress class" ingressClass="test-ingress-class"
I0905 19:09:09.489997 1 ingressclass.go:159] "Started syncing ingress class" ingressClass="test-ingress-class" startTime="2023-09-05 19:09:09.489987392 +0000 UTC m=+275747.268120372"
I0905 19:09:09.490037 1 loadbalancer.go:143] Refreshing LB cache for lb ocid1.loadbalancer.oc1.sa-saopaulo-1.aaaaaaaapy5axf6sofjsgis3fjxdtda6caedrep526vrar6wknjsqszyteua
I0905 19:09:09.513111 1 ingressclass.go:235] "Creating load balancer for ingress class" ingressClass="test-ingress-class"
I0905 19:09:09.513208 1 ingressclass.go:270] Create lb request: {...big json...}
Hi @tonybolzan, the "backendset not found" error is transient; it should go away once the LB is created and its work requests succeed. For the bugged case, which load balancer IP and ID were written to the Ingress: those of the bugged LB or of the configured LB? Can you confirm?
Same here.
These are the logs from the exact moment native-ingress-controller created a new load balancer when it shouldn't have.
The backend set 404 may or may not be related to the bug that creates a new LB.
For now, the Native Ingress load balancer has been removed and replaced with a new one using Nginx Ingress.
I tried to keep Native Ingress and Nginx Ingress running together, but Deployment rollouts got stuck and no new deployment completed successfully. This is probably related to the namespace label podreadiness.ingress.oraclecloud.com/pod-readiness-gate-inject.
I filed SR 3-34189913991 in MOS with more information about the problem.
Looking at the logs and the code:
ingressclass.go::ensureLoadBalancer() is called and then calls c.getLoadBalancer(ic), which returned lb == nil.
That nil comes from loadbalancer.go::getLoadBalancer(), which calls lbc.getLoadBalancerBustCache(ctx, lbID).
In loadbalancer.go::getLoadBalancerBustCache(), every HTTP error is treated the same way, returning nil.
This behavior means that a temporary unavailability, or even a single faulty request, forces the creation of a new load balancer. Differentiating between errors and handling each kind specifically should resolve the situation. Making the tool more reliable with retries and exponential backoff should also help during network failures.
I have a question about this. When the new load balancer is created, is there any way to assign the same IP as the previous one?
This will be addressed in an upcoming release.
In an unusual sequence of events, in which Oracle reclaimed the preemptible nodes multiple times in one day, this ingress controller hit the bug and created a new load balancer. If the IP is dynamic, DNS will send traffic to the wrong IP; if the IP is reserved, the new LB cannot be created because of the conflict.