zalando-incubator / kube-ingress-aws-controller

Configures AWS Load Balancers according to Kubernetes Ingress resources
MIT License

Running multiple instances of kube-ingress-aws-controller #263

Open lkusnadi opened 5 years ago

lkusnadi commented 5 years ago

Has anyone successfully implemented multiple instances of kube-ingress-aws-controller on a single cluster? I have a use case where I have to split the ALB because we need to increase the timeout for a group of applications while leaving the rest on the default timeout. I have managed to deploy, via helm, 2 deployments of kube-ingress-aws-controller as follows:

default-controller:

controller:
  annotations:
    iam.amazonaws.com/role: "arn:aws:iam::xxxxx:role/cluster01-kube2iam-kube-ingress-aws"
  args:
    - "-ingress-class-filter=default-alb"
    - "-controller-id=default-ingress-controller"
    - "-target-port=9999"
    - "-health-check-port=9999"
    - "-debug"

special-controller:

controller:
  annotations:
    iam.amazonaws.com/role: "arn:aws:iam::xxxxx:role/cluster01-kube2iam-kube-ingress-aws"
  args:
    - "-ingress-class-filter=special-alb"
    - "-controller-id=special-ingress-controller"
    - "-target-port=9988"
    - "-health-check-port=9988"
    - "-debug"

I can see that default-controller and special-controller each create their own ALB and target group, and the targets reach a healthy state from the instances (skipper is listening on the right ports).

When I created 2 simple nginx deployments with ingresses, I annotated each ingress with:

ingress:
  annotations:
    kubernetes.io/ingress.class: default-alb
    zalando.org/aws-load-balancer-ssl-cert: arn:aws:acm:ap-southeast-2:xxxx:certificate/xxxx
    external-dns.alpha.kubernetes.io/hostname: app1.sandbox.domain.io
ingress:
  annotations:
    kubernetes.io/ingress.class: special-alb
    zalando.org/aws-load-balancer-ssl-cert: arn:aws:acm:ap-southeast-2:xxxx:certificate/xxxx
    external-dns.alpha.kubernetes.io/hostname: app2.sandbox.domain.io

Each ingress controller can pick up the ingress and allocate the right ALB.

However, when I curl app1.sandbox.domain.io, I get "not found". When I put the skipper belonging to the default-controller in debug mode (-application-log-level=DEBUG), I found that skipper cannot find the ingress and hence can't provide the route.

[APP]time="2019-07-09T05:34:34Z" level=debug msg="polling for updates"
[APP]time="2019-07-09T05:34:34Z" level=debug msg="making request to: https://172.20.0.1:443/apis/extensions/v1beta1/ingresses"
[APP]time="2019-07-09T05:34:34Z" level=debug msg="request to https://172.20.0.1:443/apis/extensions/v1beta1/ingresses succeeded"
[APP]time="2019-07-09T05:34:34Z" level=debug msg="all ingresses received: 2"
[APP]time="2019-07-09T05:34:34Z" level=debug msg="filtered ingresses by ingress class: 0"
[APP]time="2019-07-09T05:34:34Z" level=debug msg="making request to: https://172.20.0.1:443/api/v1/services"
[APP]time="2019-07-09T05:34:34Z" level=debug msg="request to https://172.20.0.1:443/api/v1/services succeeded"
[APP]time="2019-07-09T05:34:34Z" level=debug msg="all services received: 10"
[APP]time="2019-07-09T05:34:34Z" level=debug msg="making request to: https://172.20.0.1:443/api/v1/endpoints"
[APP]time="2019-07-09T05:34:34Z" level=debug msg="request to https://172.20.0.1:443/api/v1/endpoints succeeded"
[APP]time="2019-07-09T05:34:34Z" level=debug msg="all endpoints received: 13"
[APP]time="2019-07-09T05:34:34Z" level=debug msg="default filters are disabled"
[APP]time="2019-07-09T05:34:34Z" level=debug msg="got default filter configurations for 0 services"
[APP]time="2019-07-09T05:34:34Z" level=debug msg="all routes created: 0"

Can anyone help or provide a guide to troubleshoot why it cannot filter the ingresses correctly? Thank you.

szuecs commented 5 years ago

@lkusnadi As far as I understand, you use 2 kube-ingress-aws-controller deployments and one skipper ingress deployment to route the traffic, so both AWS TargetGroups point to skipper. I think you need to add -kubernetes-ingress-class="(default|special)-alb$" so that skipper will find both.
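
For illustration, a minimal sketch of that single combined-skipper variant (only the -kubernetes-ingress-class value comes from this comment; the surrounding spec is a placeholder). Note that inside a pod spec's args list the value should carry no literal quotes, because args reach the binary verbatim:

      containers:
      - args:
        - skipper
        - -kubernetes
        - -kubernetes-in-cluster
        # one regex that matches both ingress classes; no surrounding
        # quotes, since the runtime passes args through verbatim
        - -kubernetes-ingress-class=(default|special)-alb$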

Another hint: you can check the skipper routes via port-forward and https://opensource.zalando.com/skipper/tutorials/basics/#current-routing-table
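
Concretely, something like the following should work, assuming the default support listener port and the /routes endpoint described in the linked tutorial:

    # forward the support listener of any skipper pod (9911 is skipper's default)
    kubectl port-forward <skipper-pod> 9911:9911
    # dump the current routing table
    curl localhost:9911/routes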

lkusnadi commented 5 years ago

@szuecs Thanks for your fast reply. Currently I use 2 sets of skippers: one belongs to default-alb, listening on port 9999; the other belongs to special-alb, listening on port 9988. The reason for this is to have a different backend timeout in each skipper. I've applied the following arguments to the "default-skippers" daemonset:

      containers:
      - args:
        - skipper
        - -kubernetes
        - -kubernetes-in-cluster
        - -address=:9999
        - -proxy-preserve-host
        - -serve-host-metrics
        - -enable-ratelimits
        - -experimental-upgrade
        - -metrics-exp-decay-sample
        - -kubernetes-https-redirect=true
        - -lb-healthcheck-interval=3s
        - -metrics-flavour=codahale,prometheus
        - -enable-connection-metrics
        - -kubernetes-ingress-class="default-alb"
        - -support-listener=127.0.0.1
        image: registry.opensource.zalan.do/pathfinder/skipper:v0.10.220

The kubernetes-ingress-class argument does not solve the problem. It still says 0 ingresses filtered by ingress class.

But this is what I found interesting:

Despite having 2 separate skipper daemonsets running in the same cluster, when I kubectl exec into one of the skipper pods, I can see it listening on both port 9999 and port 9988, which I believe is incorrect, because I have different skippers set to listen on different ports. Is it the zalando kube-ingress architecture to run only one set of skippers? Or do I need to opt for traefik to replace skipper for the second daemonset?

Also, -support-listener=127.0.0.1 cannot provide a listener on port 9911 on the skipper pod, because the doc in the code says support-listener accepts a network address.

Thank you.

szuecs commented 5 years ago

Sorry for not having followed up for some days! I just overlooked the notification :(

> @szuecs Thanks for your fast reply. Currently I use 2 sets of skippers: one belongs to default-alb, listening on port 9999; the other belongs to special-alb, listening on port 9988. The reason for this is to have a different backend timeout in each skipper.

Got it now: you create different sets of ALBs with 2 kube-ingress-aws-controllers and run 2 different sets of skippers for them.

You would need to run one kube-ingress-aws-controller with -ingress-class-filter=default-alb and a second with -ingress-class-filter=non-default-alb, and for the non-default-alb one you need to change the target port: -target-port=9988.
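
As a sketch of how the pieces have to line up (class names and ports taken from the setup at the top of this issue, where special-alb plays the role of non-default-alb): each controller's -target-port must equal the -address port of its skipper set.

    # default stack
    kube-ingress-aws-controller: -ingress-class-filter=default-alb   -target-port=9999
    skipper:                     -kubernetes-ingress-class=default-alb   -address=:9999

    # special (non-default) stack
    kube-ingress-aws-controller: -ingress-class-filter=special-alb   -target-port=9988
    skipper:                     -kubernetes-ingress-class=special-alb   -address=:9988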

> I've applied the following arguments to the "default-skippers" daemonset:


>       containers:
>       - args:
>         - skipper
>         - -kubernetes
>         - -kubernetes-in-cluster
>         - -address=:9999

This will make skipper listen on port 9999 for the "default-alb" case; the "non-default-alb" skipper would need to run with -address=:9988.

>     - -proxy-preserve-host
>     - -serve-host-metrics
>     - -enable-ratelimits
>     - -experimental-upgrade
>     - -metrics-exp-decay-sample
>     - -kubernetes-https-redirect=true
>     - -lb-healthcheck-interval=3s
>     - -metrics-flavour=codahale,prometheus
>     - -enable-connection-metrics
>     - -kubernetes-ingress-class="default-alb"

From skipper's point of view this is correct for the default-alb case. You would need a second skipper with -kubernetes-ingress-class="non-default-alb" that listens on another port.

>     - -support-listener=127.0.0.1

A listener can have an optional IP but always needs a port, so -support-listener=127.0.0.1:9911 would be correct, and of course you would need another port for the other skipper.

>     image: registry.opensource.zalan.do/pathfinder/skipper:v0.10.220


> The kubernetes-ingress-class argument does not solve the problem. It still says 0 ingresses filtered by ingress class.

The ingress definition would need the annotation set to kubernetes.io/ingress.class: non-default-alb to get the non-default load balancer stack served by the non-default-alb skipper instance.

The default case would also need to set the annotation, to kubernetes.io/ingress.class: default-alb.
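
Putting those pieces together, a rough sketch of the second skipper's args (written with the special-alb class name from the setup at the top of this issue; all other flags stay as in the default daemonset):

      containers:
      - args:
        - skipper
        - -kubernetes
        - -kubernetes-in-cluster
        # listen on the port the special controller targets
        - -address=:9988
        # no literal quotes around the value: args reach the binary verbatim
        - -kubernetes-ingress-class=special-alb
        # a listener always needs a port; pick one the other skipper doesn't use
        - -support-listener=127.0.0.1:9912

The matching ingresses would then carry kubernetes.io/ingress.class: special-alb, as in the original example.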

> But this is what I found interesting:

> Despite having 2 separate skipper daemonsets running in the same cluster, when I kubectl exec into one of the skipper pods, I can see it listening on both port 9999 and port 9988, which I believe is incorrect, because I have different skippers set to listen on different ports. Is it the zalando kube-ingress architecture to run only one set of skippers?

This sounds totally weird; I am not sure how you created the daemonset pod spec. In the end, if you check with netstat you will see all host ports, which might explain the weirdness: you see all pods from the node, and both daemonsets are running on all nodes.
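
For example, assuming the daemonsets run with hostNetwork: true (which is what makes every listener on the node visible from inside the pod):

    # inside a skipper pod: with hostNetwork this lists ALL listeners on the
    # node, including the port owned by the other daemonset's skipper
    netstat -tlnp | grep -E ':9999|:9988'

Sockets owned by processes outside this container's PID namespace show an empty program column, which is one way to confirm that the second port does not belong to this pod's skipper.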

> Or do I need to opt for traefik to replace skipper for the second daemonset?

I don't think you need to switch the implementation. This would be a major bug on our side, which we cover in our unit/integration tests but not yet in our e2e tests. So if this is a real issue, I'll consider writing an e2e test for it to make sure we don't break this.