CertificateNotFound Error (loadbalancer still trying to use a previous certificate)

alexistimic commented 1 year ago

Describe the bug

The Ingress cannot be deployed (stuck at CertificateNotFound) because the controller is still trying to use a previous (expired) certificate. That certificate no longer exists: I deleted it (both in ACM and directly on the load balancer) and attached the new certificate to the ALB. I have changed the ingress annotation to the new one using: alb.ingress.kubernetes.io/certificate-arn: {{ .Values.SSLCertARN }}

For some reason, it is still trying to use the previous expired certificate.

{"level":"info","ts":1667833853.926948,"logger":"controllers.ingress","msg":"modifying listener","stackID":"dev","resourceID":"443","arn":"arn:aws:elasticloadbalancing:eu-west-1:<ACCOUNT_ID>:listener/app/k8s-dev-....."}
{"level":"error","ts":1667833853.9837239,"logger":"controller-runtime.manager.controller.ingress","msg":"Reconciler error","name":"dev","namespace":"","error":"CertificateNotFound: Certificate 'arn:aws:acm:eu-west-1:<ACCOUNT_ID>:certificate/<PREVIOUS_EXPIRED_CERTIFICATE>' not found\n\tstatus code: 400, request id: 9edc742f-8c66-46c6-8c1d-393c3fb116b6"}

I have tried deleting the aws-loadbalancer-controller pods, but nothing changed. I have also updated to v2.4.4.

Environment

AWS Load Balancer controller version: happened on v2.4.1 and still present on v2.4.4
Kubernetes version: v1.23

awsloadbalancer.pdf

kishorj commented 1 year ago

@alexistimic, do you have multiple ingresses grouped together in an ingress-group? Would you be able to share the model generated by the controller for the error case - look for log lines that say "successfully built model ..." for your ingress?

alexistimic commented 1 year ago

Hi @kishorj

Yes we have several ingresses, grouped together to a single ingress-group (dev)

{
  "level":"info",
  "ts":1667928633.4932845,
  "logger":"controllers.ingress",
  "msg":"successfully built model","model":"{\"id\":\"dev\",\"resources\":{\"AWS::EC2::SecurityGroup\":{\"ManagedLBSecurityGroup\":{\"spec\":{\"groupName\":\"k8s-dev-xxxxx\",\"description\":\"[k8s] Managed SecurityGroup for LoadBalancer\",\"tags\":{\"Level\":\"dev\"},\"ingress\":[{\"ipProtocol\":\"tcp\",\"fromPort\":80,\"toPort\":80,\"ipRanges\":[{\"cidrIP\":\"x.x.x.x/x\"}]},{\"ipProtocol\":\"tcp\",\"fromPort\":443,\"toPort\":443,\"ipRanges\":[{\"cidrIP\":\"x.x.x.x/x\"}]}]}}},\"AWS::ElasticLoadBalancingV2::Listener\":{\"443\":{\"spec\":{\"loadBalancerARN\":{\"$ref\":\"#/resources/AWS::ElasticLoadBalancingV2::LoadBalancer/LoadBalancer/status/loadBalancerARN\"},\"port\":443,\"protocol\":\"HTTPS\",\"defaultActions\":[{\"type\":\"fixed-response\",\"fixedResponseConfig\":{\"contentType\":\"text/plain\",\"statusCode\":\"404\"}}],\"certificates\":[{\"certificateARN\":\"arn:aws:acm:eu-west-1:xxxxxxx:certificate/EXPIRED_CERTIFICATE\"},{\"certificateARN\":\"arn:aws:acm:eu-west-1:xxxxxxxxxxxx:certificate/VALID_CERTIFICATE\"},{\"certificateARN\":\"arn:aws:acm:eu-west-1:xxxxxxx:certificate/ANOTHER_VALID_CERTIFICATE_THAT_SHOULDNT_BE_ATTACHED\"}],\"sslPolicy\":\"ELBSecurityPolicy-2016-08\",\"tags\":{\"Level\":\"dev\"}}},\"80\":{\"spec\":{\"loadBalancerARN\":{\"$ref\":\"#/resources/AWS::ElasticLoadBalancingV2::LoadBalancer/LoadBalancer/status/loadBalancerARN\"},\"port\":80,\"protocol\":\"HTTP\",\"defaultActions\":[{\"type\":\"fixed-response\",\"fixedResponseConfig\":{\"contentType\":\"text/plain\",\"statusCode\":\"404\"}}],\"tags\":{\"Level\":\"dev\"}}}},\"AWS::ElasticLoadBalancingV2::ListenerRule\":{\"443:1\":{\"spec\":{\"listenerARN\":{\"$ref\":\"#/resources/AWS::ElasticLoadBalancingV2::Listener/443/status/listenerARN\"},\"priority\":1,\"actions\":[{\"type\":\"forward\",\"forwardConfig\":{\"targetGroups\":[{\"targetGroupARN\":{\"$ref\":\"#/resources/AWS::ElasticLoadBalancingV2::TargetGroup/dev/api-xxxxx-backend:80/status/targetGroupARN\"}}]}}],\"conditions\":[{\"field\":\"host-header\",\"hostHeaderConfig\":{\"values\":[\"api-xxxx.xxxx.xxx\"]}},{\"field\":\"path-pattern\",\"pathPatternConfig\":{\"values\":[\"/*\"]}}],\"tags\":{\"Level\":\"dev\"}}},\"443:10\":{\"spec\":{\"listenerARN\":{\"$ref\":\"#/resources/AWS::ElasticLoadBalancingV2::Listener/443/status/listenerARN\"},\"priority\":10,\"actions\":[{\"type\":\"forward\",\"forwardConfig\":{\"targetGroups\":[{\"targetGroupARN\":{\"$ref\":\"#/resources/AWS::ElasticLoadBalancingV2::TargetGroup/dev/api-xxxxxx-backend:80/status/targetGroupARN\"}}]}}],\"conditions\":[{\"field\":\"host-header\",\"hostHeaderConfig\":{\"values\":[\"auth-xxxxxx.xxxx.xxxx\"]}},{\"field\":\"path-pattern\",\"pathPatternConfig\":{\"values\":[\"/*\"]}}],\"tags\":{\"Level\":\"dev\"}}},\"443:11\":{\"spec\":{\"listenerARN\":{\"$ref\":\"#/resources/AWS::ElasticLoadBalancingV2::Listener/443/status/listenerARN\"},\"priority\":11,\"actions\":[{\"type\":\"forward\",\"forwardConfig\":{\"targetGroups\":[{\"targetGroupARN\":{\"$ref\":\"#/resources/AWS::ElasticLoadBalancingV2: .....
}
....

As far as I can tell from the logs, three certificates seem to be attached, or at least detected by the controller as attached, to the ALB (if my guess is correct). However, only one is actually attached to the ALB (cf. VALID_CERTIFICATE in the snippet).
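To see exactly which certificates the controller plans to attach to each listener, the "successfully built model" log line can be parsed rather than read by eye. The following is a minimal Python sketch, assuming the log line is complete JSON and that its "model" field is itself a JSON-encoded string, as in the log above. Because the excerpt above is truncated, the sample line here is a shortened, made-up one with the same shape; the function name `listener_certificates` and the account ID are hypothetical.

```python
import json
from typing import Dict, List

LISTENER_TYPE = "AWS::ElasticLoadBalancingV2::Listener"


def listener_certificates(log_line: str) -> Dict[str, List[str]]:
    """Map each listener port in a 'successfully built model' log line
    to the certificate ARNs listed in that listener's spec."""
    entry = json.loads(log_line)
    # The "model" field is a JSON document serialized as a string.
    model = json.loads(entry["model"])
    listeners = model.get("resources", {}).get(LISTENER_TYPE, {})
    return {
        port: [cert["certificateARN"]
               for cert in listener.get("spec", {}).get("certificates", [])]
        for port, listener in listeners.items()
    }


# Shortened, made-up model with the same structure as the log above.
sample_model = {
    "id": "dev",
    "resources": {
        LISTENER_TYPE: {
            "443": {"spec": {"port": 443, "protocol": "HTTPS", "certificates": [
                {"certificateARN": "arn:aws:acm:eu-west-1:111111111111:certificate/EXPIRED_CERTIFICATE"},
                {"certificateARN": "arn:aws:acm:eu-west-1:111111111111:certificate/VALID_CERTIFICATE"},
            ]}},
            "80": {"spec": {"port": 80, "protocol": "HTTP"}},
        }
    },
}
sample_line = json.dumps({
    "level": "info",
    "msg": "successfully built model",
    "model": json.dumps(sample_model),
})

for port, arns in listener_certificates(sample_line).items():
    print(port, arns)
```

Any ARN that shows up here but is not attached to the ALB was presumably contributed by one of the ingresses in the group, which narrows down where to look next.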

kishorj commented 1 year ago

It is possible that some of your ingresses have a certificate-arn annotation referring to the expired certificate. You need to either update the expired certificates in AWS ACM or use the new certificate in those ingresses.
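One way to check this across the whole group is to list the certificate-arn annotation of every ingress that shares the group name. The sketch below is a hedged Python example that works on the parsed output of `kubectl get ingress --all-namespaces -o json`; the function name `certificates_by_ingress` and the sample ingresses are made up for illustration. It only inspects the alb.ingress.kubernetes.io/group.name and alb.ingress.kubernetes.io/certificate-arn annotations, so it will not see certificates that come from anywhere else.

```python
from typing import Dict, List

GROUP_ANNOTATION = "alb.ingress.kubernetes.io/group.name"
CERT_ANNOTATION = "alb.ingress.kubernetes.io/certificate-arn"


def certificates_by_ingress(ingress_list: dict, group: str) -> Dict[str, List[str]]:
    """Return {"namespace/name": [certificate ARNs]} for every ingress
    whose group.name annotation equals `group`."""
    result: Dict[str, List[str]] = {}
    for item in ingress_list.get("items", []):
        meta = item.get("metadata", {})
        annotations = meta.get("annotations") or {}
        if annotations.get(GROUP_ANNOTATION) != group:
            continue
        # The annotation may hold several ARNs separated by commas.
        raw = annotations.get(CERT_ANNOTATION, "")
        arns = [arn.strip() for arn in raw.split(",") if arn.strip()]
        key = f"{meta.get('namespace', '')}/{meta.get('name', '')}"
        result[key] = arns
    return result


# Made-up sample in the shape of `kubectl get ingress -A -o json` output.
sample_ingresses = {
    "items": [
        {"metadata": {"name": "api", "namespace": "dev", "annotations": {
            GROUP_ANNOTATION: "dev",
            CERT_ANNOTATION: "arn:aws:acm:eu-west-1:111111111111:certificate/VALID_CERTIFICATE",
        }}},
        {"metadata": {"name": "auth", "namespace": "dev", "annotations": {
            GROUP_ANNOTATION: "dev",
        }}},
        {"metadata": {"name": "other", "namespace": "staging", "annotations": {
            GROUP_ANNOTATION: "staging",
        }}},
    ]
}

for name, arns in certificates_by_ingress(sample_ingresses, "dev").items():
    print(name, arns or "(no certificate-arn annotation)")
```

An ingress that reports no annotation is still worth a look: as far as I know, the controller can fall back to discovering ACM certificates from the ingress hostnames when the annotation is absent, which could also add certificates to the listener.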

alexistimic commented 1 year ago

I can confirm none of our manifests refer to the expired certificate.

I tested a few things:

1) Deleting the ALB and applying a manifest with the same ingress-group recreates the same ALB, which shows the same issue.
2) Changing the ingress-group creates a new ALB, and everything works.

kishorj commented 1 year ago

@alexistimic, would you be able to email your controller logs and the ingress-group manifests to k8s-alb-controller-triage AT amazon.com? Please include all of the ingresses from your old ingress group.

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

manoelhc commented 1 year ago

I got the "ingress: xxxx none certificate found for host: xxxx" error even after disabling SSL and using only port 80.

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle rotten
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Reopen this issue with /reopen
Mark this issue as fresh with /remove-lifecycle rotten
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-ci-robot commented 1 year ago

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

kubernetes-sigs / aws-load-balancer-controller

CertificateNotFound Error (loadbalancer still trying to use a previous certificate) #2870