kubernetes-sigs / aws-load-balancer-controller

A Kubernetes controller for Elastic Load Balancers
https://kubernetes-sigs.github.io/aws-load-balancer-controller/
Apache License 2.0
3.93k stars 1.46k forks source link

[BUG] serviceType: LoadBalancer always go through ALBController when it should not according to documentation #3203

Closed michalschott closed 1 year ago

michalschott commented 1 year ago

Describe the bug

Given service manifest:

apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-backend-protocol: tcp
    service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
    service.beta.kubernetes.io/aws-load-balancer-internal: "true"
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
  name: ingress-nginx-controller-internal-controller
  namespace: ingress-nginx-internal
spec:
  ports:
  - appProtocol: http
    name: http
    port: 80
    protocol: TCP
    targetPort: http
  - appProtocol: https
    name: https
    port: 443
    protocol: TCP
    targetPort: https
  selector:
    app.kubernetes.io/component: controller
    app.kubernetes.io/instance: ingress-nginx-controller-internal
    app.kubernetes.io/name: ingress-nginx
  type: LoadBalancer

As you can see theres no spec.loadBalancerClass key set, but while describing this service from k8s key is set:

apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-backend-protocol: tcp
    service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
    service.beta.kubernetes.io/aws-load-balancer-internal: "true"
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
  creationTimestamp: "2023-05-19T15:02:35Z"
  finalizers:
  - service.k8s.aws/resources
  name: ingress-nginx-controller-internal-controller
  namespace: ingress-nginx-internal
  resourceVersion: "377127041"
  uid: 008e48c3-7a77-4df6-a692-102221b129ba
spec:
  allocateLoadBalancerNodePorts: true
  clusterIP: 172.20.253.220
  clusterIPs:
  - 172.20.253.220
  externalTrafficPolicy: Cluster
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  loadBalancerClass: service.k8s.aws/nlb
  ports:
  - appProtocol: http
    name: http
    nodePort: 32720
    port: 80
    protocol: TCP
    targetPort: http
  - appProtocol: https
    name: https
    nodePort: 32578
    port: 443
    protocol: TCP
    targetPort: https
  selector:
    app.kubernetes.io/component: controller
    app.kubernetes.io/instance: ingress-nginx-controller-internal
    app.kubernetes.io/name: ingress-nginx
  sessionAffinity: None
  type: LoadBalancer
status:
  loadBalancer:
    ingress:
    - [REDACTED]

Figured out this must be mutated by webhook:

...
- admissionReviewVersions:
  - v1beta1
  clientConfig:
    service:
      name: aws-load-balancer-webhook-service
      namespace: aws-load-balancer-controller
      path: /mutate-v1-service
  failurePolicy: Fail
  name: mservice.elbv2.k8s.aws
  objectSelector:
    matchExpressions:
    - key: app.kubernetes.io/name
      operator: NotIn
      values:
      - aws-load-balancer-controller
  rules:
  - apiGroups:
    - ""
    apiVersions:
    - v1
    operations:
    - CREATE
    resources:
    - services
  sideEffects: None
...

Documentation snippet:

When you specify the service.beta.kubernetes.io/aws-load-balancer-type annotation to be external on a Kubernetes Service resource of type LoadBalancer, the in-tree controller will ignore the Service resource. In addition, If you specify the service.beta.kubernetes.io/aws-load-balancer-nlb-target-type annotationon the Service resource, the AWS Load Balancer Controller takes charge of reconciliation by provision an NLB.

For some reason NLB provided by controller does not work the same way as the one provided by in-tree controller. For now, I'd like to skip controller taking over and stick to in-tree.

Is that a typo in documentation? The only way I could achieve sticking to in-tree is by removing above webhook snipped from mutatingwebhookconfiguration, potentially I could also extend matchExpressions.

Steps to reproduce

Expected outcome A concise description of what you expected to happen.

Environment

M00nF1sh commented 1 year ago

/kind documentation @michalschott Hi, this is actually a feature we introduced since v2.5.0 version. The webhook will automatically mutate services to make it reconciled by this controller unless u specified a loadbalancerClass. You can disable this feature via featureFlag on controller: --feature-gates=enableServiceMutatorWebhook=false see more https://github.com/kubernetes-sigs/aws-load-balancer-controller/releases/tag/v2.5.0

we shall update the documentation to reflect this.

M00nF1sh commented 1 year ago

BTW, Services powered this controller provides a super set of feature as the in-tree controller, the only difference i can see is Services provisioned by this controller are "internal-facing" by default for security reasons, unless service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing annotation is specified.

michalschott commented 1 year ago

@M00nF1sh thanks for an explanation. I assume since I do not provide this key, it is being picked and filled wby mutationwebhook.

Any idea what would be the value for now if I'd like to stick to in-tree controller until I figure out what is wrong with LB provided by ALB controller?

Indeed the only diff I have noticed is name, but also upstream healthcheck default value changed from 30s to 10s. Basically behavior I see is nginx-ingress returns 504 for all requests.

michalschott commented 1 year ago

@M00nF1sh interesting, it works in "ip" mode (service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip). In "instance" mode traffic is not forwarded to nginx pods at all - might be something to do with kube-proxy?

update

updated default kube-proxy from 1.24.7 to 1.24.9 and problem is gone.

michalschott commented 1 year ago

@M00nF1sh that problem still appeared recently.