Closed irfanurrehman closed 6 years ago
Comment by nikhiljindal Monday Sep 11, 2017 at 23:44 GMT
cc @marun (current on-call) This is one of the reasons why some of our e2e tests are failing. We have been leaking these resources for a long time; we have now run out of quota, and hence the tests started failing.
As a short-term fix, @madhusudancs is going to update his "clean leaked resources" script to clean up health checks and firewall rules so that our tests don't run out of quota, but we need a proper fix for this.
cc @kubernetes/sig-federation-bugs
Comment by madhusudancs Wednesday Sep 13, 2017 at 22:07 GMT
PR https://github.com/kubernetes/test-infra/pull/4545 is the short-term fix.
Comment by nikhiljindal Tuesday Oct 10, 2017 at 17:16 GMT
@kubernetes/sig-multicluster-feature-requests /sig multicluster
Comment by walteraa Tuesday Oct 10, 2017 at 17:27 GMT
@nikhiljindal The same is happening to DNS entries. I think when a federated service is deleted, all entries referring to this service should be deleted as well. What do you think?
Comment by nikhiljindal Tuesday Oct 10, 2017 at 18:49 GMT
@walteraa Please feel free to file an issue for that. Please include steps to repro if you are able to repro it consistently.
Comment by fejta-bot Thursday Jan 11, 2018 at 17:36 GMT
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Prevent issues from auto-closing with an /lifecycle frozen comment.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale
Comment by kinghrothgar Thursday Jan 11, 2018 at 20:38 GMT
I believe we may be running into this. When we spin up a federated ingress-gce, one of the 4 regions randomly ends up with an extra unused backend.
One region also always randomly gets set to CPU Utilization instead of Rate.
Each time it is a different region. We are running 1.8.5-gke.0.
Comment by nikhiljindal Friday Jan 12, 2018 at 20:00 GMT
FWIW, you can now try out kubemci, a command-line tool to set up multi-cluster ingresses: https://github.com/GoogleCloudPlatform/k8s-multicluster-ingress.
cc @nikhiljindal
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten /remove-lifecycle stale
Issue by nikhiljindal Monday Sep 11, 2017 at 23:34 GMT Originally opened as https://github.com/kubernetes/kubernetes/issues/52315
Steps to repro: Create a federated service of type LoadBalancer and a federated ingress, then delete them.
Expected: All GCP resources (health checks, firewall rules, instance groups, backend services, etc.) should be deleted when the service and ingress are deleted.
Actual: GCP health checks and firewall rules are sometimes leaked.
Explanation: In Kubernetes 1.7, we updated the service controller to create health checks and firewall rules whose names are generated from providerID/clusterID (providerID if it exists, else clusterID). Since providerID is set by the federated ingress controller, the service controller looks up a different name if the ingress controller sets providerID after the service controller has already created the GCP resources. This race condition between the two controllers leads to the service controller leaking the health check and firewall rule it originally created.
Possible fixes: