kubernetes-sigs / cloud-provider-azure

Cloud provider for Azure
https://cloud-provider-azure.sigs.k8s.io/
Apache License 2.0

K8S delete of service loadbalancer fails to clean up and exits prematurely #6505

Closed chapmanc closed 2 weeks ago

chapmanc commented 4 months ago

What happened:

K8S service deletion proceeds without cleaning up the load balancer.

What you expected to happen:

The service deletion should clean up all dependent resources. Instead, a warning appears in the events saying the load balancer failed to delete, after which the controller gives up and deletes the service anyway. The etag in the request doesn't match the resource's current etag, but the controller never refreshes it and retries. As a result the load balancer is orphaned and left behind.

error syncing load balancer: failed to delete load balancer: Retriable: false, RetryAfter: 0s, HTTPStatusCode: 412, RawError: {
    "error": {
        "code": "PreconditionFailed",
        "message": "Precondition failed.",
        "details": [
            {
                "code": "PreconditionFailedEtagMismatch",
                "message": "Etag provided in if-match header W/\"9ec66ae2-c704-4c45-bf93-6fb73345dfb1\" does not match etag W/\"c9dd49fc-1247-46dd-8cff-14af6bd1b343\" of resource /subscriptions/<redacted>/resourceGroups/MC_<redacted>/providers/Microsoft.Network/loadBalancers/kubernetes-internal in NRP data store."
            }
        ]
    }
}

How to reproduce it (as minimally and precisely as possible):

  1. Created k8s service loadbalancer
  2. Added some backend pools manually
  3. Removed backend pools manually
  4. Triggered a delete of the service in k8s
  5. Service is deleted but the load balancer still exists

Anything else we need to know?:

We have a controller that adds some configuration to the load balancer and removes it again before the service is deleted. We have verified that all of its modifications are removed, and that the controller is no longer running, by the time the delete happens.

Environment:

k8s-triage-robot commented 1 month ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  - After 90d of inactivity, lifecycle/stale is applied
  - After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  - After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  - Mark this issue as fresh with /remove-lifecycle stale
  - Close this issue with /close
  - Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 2 weeks ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  - After 90d of inactivity, lifecycle/stale is applied
  - After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  - After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  - Mark this issue as fresh with /remove-lifecycle rotten
  - Close this issue with /close
  - Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

nilo19 commented 2 weeks ago

@chapmanc This is expected, because you manually changed the load balancer. Managed resources, including the load balancer, should not be modified outside of the cloud provider.