kubernetes-sigs / aws-load-balancer-controller

A Kubernetes controller for Elastic Load Balancers
https://kubernetes-sigs.github.io/aws-load-balancer-controller/
Apache License 2.0

ALB does not have any target or the right number of targets registered. #2909

Closed: khteh closed this issue 1 year ago

khteh commented 1 year ago

Describe the bug
I have some applications that support HTTP/2 and some that don't, so I am experimenting with the alb.ingress.kubernetes.io/backend-protocol-version: HTTP2 ALB annotation. The problem is that every time after I kubectl apply -f ingress.yml, I get a 503 error. I assumed it just takes some time for the ALB to apply the changes, but no: the response is the same even after a long wait.
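
A quick way to see the symptom (a sketch; it assumes the manifest file is ingress.yml and the Ingress object is named ingress, matching the manifests further down, and uses the iam.kyberlife.io host from the ingress rules):

kubectl apply -f ingress.yml
# the events and backends shown here indicate whether the controller reconciled the ALB
kubectl describe ingress ingress
# a 503 returned by the ALB itself typically means the matched rule's target group has no registered targets
curl -skI https://iam.kyberlife.io/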

There are 2 target groups, which map to my ingress rules. However, when I apply the alb.ingress.kubernetes.io/backend-protocol-version: HTTP2 annotation, one of my Services has a registered target while the other doesn't. When I remove the same annotation, neither of them has any registered target at all.

Expected outcome
I expect removing the alb.ingress.kubernetes.io/backend-protocol-version: HTTP2 annotation to replace the HTTP/2 targets with HTTP/1.1 targets in the target groups.

kishorj commented 1 year ago

@khteh, you mentioned one service doesn't have a registered target; is it the one that doesn't support HTTP/2? Also, do you not see any targets in the target group at all, or do the targets show up as unhealthy?

If you specify the alb.ingress.kubernetes.io/backend-protocol-version annotation on your ingress resource, it will apply to all of the target groups unless you override it via the same annotation on your backend Service resources.
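
For example, to pin a single backend Service to HTTP/1 regardless of the ingress-level annotation, something like this works (a sketch; svc-pgadmin is the Service name from the manifests shared later in this thread):

# the per-Service annotation takes precedence over the ingress-level one
kubectl annotate service svc-pgadmin \
  alb.ingress.kubernetes.io/backend-protocol-version=HTTP1 --overwrite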

kishorj commented 1 year ago

The underlying issue is similar to #2910. Let's close one as a duplicate.

khteh commented 1 year ago

> @khteh, you mentioned one service doesn't have a registered target; is it the one that doesn't support HTTP/2? Also, do you not see any targets in the target group at all, or do the targets show up as unhealthy?
>
> If you specify the alb.ingress.kubernetes.io/backend-protocol-version annotation on your ingress resource, it will apply to all of the target groups unless you override it via the same annotation on your backend Service resources.

Yes, the one that doesn't support HTTP/2 doesn't have any registered target at all, even now after applying alb.ingress.kubernetes.io/backend-protocol-version: HTTP1 to the Service. It's not that the target is unhealthy; there is no registered target at all.

Now I have removed the protocol version annotation entirely from the ingress manifest and applied it directly to the individual Services. It works for the HTTP/2 target groups. However, I only see one registered target even though there are 2 Endpoints registered to the Service. What happened to the other target? Why is it not registered to the same target group? And sometimes when I restart the Service and StatefulSet, there is no target registered at all, but I see them in the kubectl describe ingress output.
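
A useful comparison here is what Kubernetes lists as the Service's endpoints versus what is actually registered in the target group (a sketch; the target group ARN placeholder needs to be filled in from the console or CLI):

# pod IPs the controller should be registering, since target-type is ip
kubectl get endpoints svc-kyberlifeiam -o wide
# targets actually registered in the corresponding target group
aws elbv2 describe-target-health --target-group-arn <target-group-arn>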

M00nF1sh commented 1 year ago

@khteh would you help provide YAMLs and detailed steps to reproduce this error?

This shouldn't happen, as target registration is a standalone process that doesn't depend on the backend-protocol-version annotation on the Ingress/Service.
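
Registration is driven by the TargetGroupBinding resources the controller creates for each ingress backend, so their status and the controller logs are the first things to check when targets are missing (a sketch; it assumes the controller runs as the aws-load-balancer-controller deployment in kube-system, the default install location):

# one TargetGroupBinding per ingress backend, recording which Service/port maps to which target group
kubectl get targetgroupbindings -A
kubectl describe targetgroupbinding <name>
# registration and deregistration activity shows up in the controller logs
kubectl logs -n kube-system deployment/aws-load-balancer-controller | grep -i targetgroupbinding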

khteh commented 1 year ago

HTTP/1 Service with totally missing targets:

apiVersion: v1
kind: Service
metadata:
  name: svc-pgadmin
  annotations:
    alb.ingress.kubernetes.io/backend-protocol: HTTPS
    alb.ingress.kubernetes.io/healthcheck-protocol: HTTPS
    alb.ingress.kubernetes.io/backend-protocol-version: HTTP2
    alb.ingress.kubernetes.io/healthcheck-path: /login
  labels:
    app: pgadmin
    component: postgresql
spec:
  ports:
    - protocol: TCP
      port: 80
      targetPort: http
      name: http
    - protocol: TCP
      port: 443
      targetPort: https
      name: https
  clusterIP: None
  selector:
    app: pgadmin
    component: postgresql
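
Since this Service is headless, uses named targetPorts, and the ingress uses target-type: ip, the controller registers the pod IPs that appear in the Service's Endpoints; if the pods don't expose container ports named http and https, the Endpoints for those ports stay empty and so does the target group. A quick check (a sketch; the label selector is taken from the manifest above):

# pod IPs/ports the endpoints controller resolved for svc-pgadmin
kubectl get endpoints svc-pgadmin -o yaml
# confirm the pods actually declare container ports named "http" and "https"
kubectl get pods -l app=pgadmin,component=postgresql \
  -o jsonpath='{range .items[*]}{.metadata.name}{": "}{.spec.containers[*].ports}{"\n"}{end}'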

HTTP/2 Service with only ONE of the 2 targets registered:

apiVersion: v1
kind: Service
metadata:
  name: svc-kyberlifeiam
  annotations:
    alb.ingress.kubernetes.io/backend-protocol: HTTPS
    alb.ingress.kubernetes.io/healthcheck-protocol: HTTPS
    alb.ingress.kubernetes.io/backend-protocol-version: HTTP2
    alb.ingress.kubernetes.io/healthcheck-path: /health/live
  labels:
    component: kyberlifeiam
    app: kyberlifeiam
spec:
  ports:
    - name: https
      protocol: TCP
      port: 443
      targetPort: https
    - name: http
      port: 80
      protocol: TCP
      targetPort: http
  clusterIP: None # Headless. type cannot be LoadBalancer
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 300
  selector:
    app: kyberlifeiam
    component: kyberlifeiam

Ingress:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress
  namespace: default
  annotations:
    alb.ingress.kubernetes.io/load-balancer-name: kyberlife
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    #kubernetes.io/ingress.class: alb - Not needed when ingressClassName is used below
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS": 443}]'
    alb.ingress.kubernetes.io/ssl-redirect: "443"
    alb.ingress.kubernetes.io/ssl-policy: ELBSecurityPolicy-TLS-1-2-Ext-2018-06
    alb.ingress.kubernetes.io/backend-protocol: HTTPS
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:ap-southeast-1:<account>:certificate/<GUID>
    alb.ingress.kubernetes.io/load-balancer-attributes: routing.http2.enabled=true
    alb.ingress.kubernetes.io/target-group-attributes: stickiness.enabled=true,stickiness.lb_cookie.duration_seconds=300
    alb.ingress.kubernetes.io/auth-session-cookie: kyberlife
    alb.ingress.kubernetes.io/auth-session-timeout: "3600"
    alb.ingress.kubernetes.io/healthcheck-port: traffic-port
    alb.ingress.kubernetes.io/healthcheck-protocol: HTTPS
    alb.ingress.kubernetes.io/healthcheck-interval-seconds: "20"
    alb.ingress.kubernetes.io/healthcheck-timeout-seconds: "10"
    alb.ingress.kubernetes.io/healthy-threshold-count: "3"
    alb.ingress.kubernetes.io/unhealthy-threshold-count: "3"
    alb.ingress.kubernetes.io/success-codes: "200"
  labels:
    app: ingress
spec:
  ingressClassName: alb
  rules:
    - host: iam.kyberlife.io
      http:
        paths:
          - backend:
              service:
                name: svc-kyberlifeiam
                port:
                  number: 443
            pathType: Prefix
            path: /
    - host: pgadmin.kyberlife.io
      http:
        paths:
          - backend:
              service:
                name: svc-pgadmin
                port:
                  number: 443
            pathType: Prefix
            path: /
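
To see which target groups the controller actually created for this ingress and how many targets each one holds, the ALB can be looked up by the name set in the load-balancer-name annotation (a sketch; it assumes AWS CLI access to the same account and region):

# find the ALB provisioned for this ingress ("kyberlife" from the annotation above)
aws elbv2 describe-load-balancers --names kyberlife \
  --query 'LoadBalancers[0].LoadBalancerArn' --output text
# list its target groups, then check the registered targets of each
aws elbv2 describe-target-groups --load-balancer-arn <load-balancer-arn>
aws elbv2 describe-target-health --target-group-arn <target-group-arn>
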
kishorj commented 1 year ago

@khteh, do the backend pods behind both services pass the liveness/readiness checks? Could you check if the service endpoints list out all of your application pods?
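
For reference, both of those can be checked directly (a sketch; the label selectors and Service names are taken from the manifests above):

# pods behind each Service should be Running and Ready
kubectl get pods -l app=pgadmin,component=postgresql
kubectl get pods -l app=kyberlifeiam,component=kyberlifeiam
# every Ready pod should appear as an endpoint of its Service
kubectl get endpoints svc-pgadmin svc-kyberlifeiam -o wide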

khteh commented 1 year ago

> @khteh, do the backend pods behind both services pass the liveness/readiness checks? Could you check if the service endpoints list out all of your application pods?

Yes they do!

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue as fresh with `/remove-lifecycle stale`
- Close this issue with `/close`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue as fresh with `/remove-lifecycle rotten`
- Close this issue with `/close`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Reopen this issue with `/reopen`
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-ci-robot commented 1 year ago

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to [this](https://github.com/kubernetes-sigs/aws-load-balancer-controller/issues/2909#issuecomment-1546788610):

> The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
>
> This bot triages issues according to the following rules:
> - After 90d of inactivity, `lifecycle/stale` is applied
> - After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
> - After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
>
> You can:
> - Reopen this issue with `/reopen`
> - Mark this issue as fresh with `/remove-lifecycle rotten`
> - Offer to help out with [Issue Triage][1]
>
> Please send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
>
> /close not-planned
>
> [1]: https://www.kubernetes.dev/docs/guide/issue-triage/

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.