kubernetes-sigs / aws-load-balancer-controller

A Kubernetes controller for Elastic Load Balancers
https://kubernetes-sigs.github.io/aws-load-balancer-controller/
Apache License 2.0

editing ingress always tries to delete specified security group and fails (time out) #3516

Closed ezafeire closed 4 months ago

ezafeire commented 10 months ago

Describe the bug

Whenever I edit my ingress, it tries to delete the security group that I've told it to add to the load balancer.

Steps to reproduce

Below is my Ingress:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: preprod-ingress
  namespace: default
  labels:
    app: preprod-zuul-server-ingress
  annotations:
    alb.ingress.kubernetes.io/group.name: xxxx-preprod-ingress
    alb.ingress.kubernetes.io/group.order: '1000'
    alb.ingress.kubernetes.io/load-balancer-name: public-alb-cdn
    alb.ingress.kubernetes.io/manage-backend-security-group-rules: 'false'
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/security-groups: public-alb-cdn-sg
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/tags: Environment=preprod
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP":443}]'
    alb.ingress.kubernetes.io/load-balancer-attributes: deletion_protection.enabled=true
spec:
  ingressClassName: alb
  rules:
    - host: yyyy.xxxx.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: zuul-server
                port:
                  number: 9000

Using Terraform, I have already created the public ALB that it refers to (public-alb-cdn), as well as the security group (public-alb-cdn-sg).

Whenever I edit that ingress (host change/port change/whatever), it does reconcile but also tries to delete the security group and times out.

{"level":"error","ts":1703088441.5640392,"logger":"controller-runtime.manager.controller.ingress","msg":"Reconciler error","name":"xxxx-preprod-ingress","namespace":"","error":"failed to delete securityGroup: timed out waiting for the condition"}

Expected outcome

It should not be trying to delete the security group at all, since the name isn't changing.

Environment

EKS 1.27

Additional Context:

oliviassss commented 9 months ago

@ezafeire, hi, is the sg public-alb-cdn-sg created by the controller or outside of the controller? Does it have tags like:

    elbv2.k8s.aws/cluster: ${clusterName}
    ingress.k8s.aws/stack: ${stackID}
    ingress.k8s.aws/resource: ${resourceID}

I noticed you're using an ingress group. What about the other ingresses under the same group, are they also pointing to this sg?

ezafeire commented 9 months ago

> @ezafeire, hi, is the sg public-alb-cdn-sg created by the controller or outside of the controller? Does it have tags like:
>
>     elbv2.k8s.aws/cluster: ${clusterName}
>     ingress.k8s.aws/stack: ${stackID}
>     ingress.k8s.aws/resource: ${resourceID}
>
> I noticed you're using an ingress group. What about the other ingresses under the same group, are they also pointing to this sg?

Hi, it does have those tags. Would removing them stop it from trying to delete the security group? The security group is managed through Terraform (the only reason we do this is that we couldn't figure out how to specify a CloudFront prefix list through annotations on the ingress). Yes, there's another ingress under the same group also pointing to this sg.

Thanks, I really appreciate your help :)
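
For anyone reading along, here is a minimal sketch of what specifying the CloudFront prefix list directly on the Ingress might look like, so the controller creates and owns the frontend security group instead of an externally created, controller-tagged one. This is an assumption, not a confirmed fix: the alb.ingress.kubernetes.io/security-group-prefix-lists annotation only appears in newer controller releases, so verify the annotation name and minimum version against the ingress annotation docs, and the prefix list ID below is a placeholder.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: preprod-ingress
  namespace: default
  annotations:
    alb.ingress.kubernetes.io/group.name: xxxx-preprod-ingress
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP":443}]'
    # Assumed annotation (verify it exists in your controller version). With no
    # explicit security-groups annotation, the controller creates the frontend SG
    # itself and can clean it up on its own later.
    alb.ingress.kubernetes.io/security-group-prefix-lists: pl-xxxxxxxxxxxxxxxxx
spec:
  ingressClassName: alb
  rules:
    - host: yyyy.xxxx.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: zuul-server
                port:
                  number: 9000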

geocomm-shenningsgard commented 9 months ago

I'm also hitting this issue after a clean install on a new 1.28 EKS cluster with aws-load-balancer-controller v2.6.2. In my case, I didn't even specify a security group in my configuration. There is only a single Ingress in this cluster:

metadata:
  annotations:
    alb.ingress.kubernetes.io/group.name: my-cool-group-name
    alb.ingress.kubernetes.io/healthcheck-path: /health
    alb.ingress.kubernetes.io/load-balancer-name: my-cool-load-balancer
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/success-codes: "200"
    alb.ingress.kubernetes.io/tags: product=io, style=chunky
    alb.ingress.kubernetes.io/target-type: instance

...

  finalizers:
  - group.ingress.k8s.aws/my-cool-group-name

...

Note: as far as I can see, the Load Balancer, Target Groups, and Security Group have all been deleted successfully (assuming a Security Group was originally created; I didn't check), but this error is preventing the Ingress resource from being deleted, since its finalizer is never removed.

{"level":"info","ts":"2024-01-25T14:12:36Z","logger":"controllers.ingress","msg":"successfully built model","model":"{\"id\":\"my-cool-group-name\",\"resources\":{}}"}
{"level":"info","ts":"2024-01-25T14:12:37Z","logger":"controllers.ingress","msg":"successfully deployed model","ingressGroup":"my-cool-group-name"}
{"level":"error","ts":"2024-01-25T14:14:37Z","msg":"Reconciler error","controller":"ingress","object":{"name":"my-cool-group-name"},"namespace":"","name":"my-cool-group-name","reconcileID":"78046cae-c848-46a9-8990-f2cfd81c3f83","error":"failed to delete securityGroup: timed out waiting for the condition"}
{"level":"info","ts":"2024-01-25T14:31:17Z","logger":"controllers.ingress","msg":"successfully built model","model":"{\"id\":\"my-cool-group-name\",\"resources\":{}}"}
{"level":"info","ts":"2024-01-25T14:31:18Z","logger":"controllers.ingress","msg":"successfully deployed model","ingressGroup":"my-cool-group-name"}
{"level":"error","ts":"2024-01-25T14:33:19Z","msg":"Reconciler error","controller":"ingress","object":{"name":"my-cool-group-name"},"namespace":"","name":"my-cool-group-name","reconcileID":"a97ada5a-1539-4f7f-a76d-af58fa1f9ff0","error":"failed to delete securityGroup: timed out waiting for the condition"}
k8s-triage-robot commented 6 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue as fresh with `/remove-lifecycle stale`
- Close this issue with `/close`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 5 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue as fresh with `/remove-lifecycle rotten`
- Close this issue with `/close`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot commented 4 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Reopen this issue with `/reopen`
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-ci-robot commented 4 months ago

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to [this](https://github.com/kubernetes-sigs/aws-load-balancer-controller/issues/3516#issuecomment-2185045877):

> The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
>
> This bot triages issues according to the following rules:
>
> - After 90d of inactivity, `lifecycle/stale` is applied
> - After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
> - After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
>
> You can:
>
> - Reopen this issue with `/reopen`
> - Mark this issue as fresh with `/remove-lifecycle rotten`
> - Offer to help out with [Issue Triage][1]
>
> Please send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
>
> /close not-planned
>
> [1]: https://www.kubernetes.dev/docs/guide/issue-triage/

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.

kaykhan commented 3 weeks ago

Did anyone find the cause of this issue?