neeraj-htp opened this issue 3 years ago
Further finding:
I checked the `nginx-ingress` service on the cluster. It is missing the annotation `service.beta.kubernetes.io/aws-load-balancer-type: nlb`.
There are two further annotations recommended by nginx-ingress which are also missing:
`service.beta.kubernetes.io/aws-load-balancer-backend-protocol: tcp`
`service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: 'true'`
Will adding these annotations create a new NLB and map my existing ingresses to it?
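For reference, a minimal sketch of how the missing annotations could be applied to the existing service with `kubectl patch`; the `kubeprod` namespace is an assumption based on a default BKPR install:

```sh
# Sketch: add the NLB-related annotations to the existing nginx-ingress Service.
# Namespace and service name are assumptions; adjust to your installation.
kubectl -n kubeprod patch service nginx-ingress --type merge -p '{
  "metadata": {
    "annotations": {
      "service.beta.kubernetes.io/aws-load-balancer-type": "nlb",
      "service.beta.kubernetes.io/aws-load-balancer-backend-protocol": "tcp",
      "service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled": "true"
    }
  }
}'
```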
Hi,
Thank you for using BKPR. I'm not entirely sure whether patching the current service with those annotations will take effect on the load balancer. If nothing changes, you may need to redeploy the service object.
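A minimal sketch of what redeploying just the Service object could look like, assuming the jsonnet manifest generated by `kubeprod` is still available locally (namespace, service name and file name are assumptions):

```sh
# Sketch: recreate only the Service so the cloud controller provisions a fresh load balancer.
kubectl -n kubeprod delete service nginx-ingress
# Re-apply the BKPR manifests; kubecfg recreates the deleted Service.
kubecfg update kubeprod-manifest.jsonnet
```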
Thanks @javsalgar. I patched the `nginx-ingress` service with the annotation `service.beta.kubernetes.io/aws-load-balancer-type: nlb` and it created the NLB. I followed this guide.
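One way to confirm what the service ended up with (standard kubectl/aws CLI commands; the namespace is an assumption):

```sh
# Show the load balancer hostname assigned to the Service.
kubectl -n kubeprod get service nginx-ingress \
  -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'

# List v2 load balancers (NLB/ALB); Classic Load Balancers will not appear here.
aws elbv2 describe-load-balancers \
  --query 'LoadBalancers[].{Name:LoadBalancerName,Type:Type,DNS:DNSName}'
```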
The only issue I'm facing now is that response times on the `grafana` and `kibana` portals have slowed down drastically. I'm trying to figure out the cause but can't see any logical explanation for it.
Apart from the slow response times, everything is working fine. Adding the annotation created the NLB and target groups. I also deleted the older Classic Load Balancer that was created by default.
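A quick way to quantify the slowdown is a timing breakdown with curl (the hostname is from this thread; the scheme is an assumption):

```sh
# Sketch: show where the time goes on a request to one of the slow portals.
curl -o /dev/null -sS \
  -w 'dns=%{time_namelookup}s connect=%{time_connect}s total=%{time_total}s\n' \
  https://grafana.my-domain.com/
```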
I think patching with the annotation is not a good idea. It messed up the NLB settings.
A DNS lookup on the NLB resolves to three IP addresses, and one of them is not reachable. Whenever a request hits `grafana.my-domain.com`, it goes to this unreachable IP address first, times out there, and only then connects to a correct IP address. So every request reaches the right destination only after timing out against that unreachable IP, and I don't know where that IP address is coming from.
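For reference, this is the kind of check that shows the problem (a sketch; port 443 is an assumption, use 80 for plain HTTP):

```sh
# Resolve the hostname (following the CNAME to the NLB) and keep only the A records.
for ip in $(dig +short grafana.my-domain.com | grep -E '^[0-9.]+$'); do
  printf '%s: ' "$ip"
  # Probe each IP with a short timeout to spot the unreachable one.
  nc -z -w 3 "$ip" 443 && echo reachable || echo unreachable
done
```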
An NLB usually depends on target groups, so I checked the target groups. They were created for the Classic Load Balancer, and their health check endpoints are of type `HTTP`, which is not reachable. On another cluster where I'm successfully running an NLB with `kong`, the health check endpoints are of type `tcp` rather than `HTTP`. That is why every node in the target group is showing an unhealthy status.

@javsalgar Is there a way to add the annotation to the original stack so that `kubeprod` creates an NLB from the beginning? I think that should be the default practice; NLB is recommended over the Classic Load Balancer.
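The health-check protocol can be confirmed from the AWS CLI (a sketch; the target group ARN is a placeholder):

```sh
# List target groups with their protocol and health-check protocol.
aws elbv2 describe-target-groups \
  --query 'TargetGroups[].{Name:TargetGroupName,Protocol:Protocol,HealthCheck:HealthCheckProtocol}'

# Show per-target health for a specific target group.
aws elbv2 describe-target-health --target-group-arn <target-group-arn>
```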
Hi,
Thank you very much for the input! Yes, you can use the overrides feature to deploy your own customizations on top of your BKPR installation. Check this section of the documentation: https://github.com/bitnami/kube-prod-runtime/blob/master/docs/overrides.md
Please let us know if you find any issues.
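A minimal sketch of what such an override could look like, written into the generated manifest and re-applied with kubecfg. The import URL uses a `<version>` placeholder, and the `nginx_ingress`/`svc` field names are assumptions; check the upstream manifests for the exact structure:

```sh
# Sketch: add the NLB annotation through BKPR's overrides, then re-apply the manifests.
cat > kubeprod-manifest.jsonnet <<'EOF'
(import "https://releases.kubeprod.io/files/<version>/manifests/platforms/eks.jsonnet") {
  config:: import "kubeprod-autogen.json",
  // Assumed component/field names: nginx_ingress (component) and svc (its Service object).
  nginx_ingress+: {
    svc+: {
      metadata+: {
        annotations+: {
          "service.beta.kubernetes.io/aws-load-balancer-type": "nlb",
        },
      },
    },
  },
}
EOF
kubecfg update kubeprod-manifest.jsonnet
```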
@javsalgar I feel overriding will not solve the issue either. Initially `kubeprod` creates the Classic Load Balancer. When you apply the update using overrides, it creates a separate NLB but picks up the existing target groups. That is a problem because the health check endpoints in those target groups are of type `http`, which shows all the nodes as unhealthy to the NLB. You cannot change the health check type of an existing target group, and for an NLB the target groups usually have `tcp` health checks.
Is there a way to bootstrap the cluster with overrides? I mean, the initial installation should take the overrides into account.
Yes, you could deploy the whole installation using kubecfg. kubeprod generates the jsonnet files so, apart from updating, you could also use kubecfg to delete the whole installation and then deploy everything again with kubecfg. The cloud settings should already be set, so that should be enough.
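A sketch of that cycle, assuming the manifest generated by `kubeprod` (file name as in the BKPR docs):

```sh
# Tear down the whole BKPR installation described by the generated manifest.
kubecfg delete kubeprod-manifest.jsonnet
# Wait for the objects (and the old load balancer) to be gone, then redeploy.
kubecfg update kubeprod-manifest.jsonnet
```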
While bootstrapping the EKS cluster with BKPR, it creates a Classic Load Balancer to expose ingresses. How can I change this behavior so it creates a Network Load Balancer instead? Network Load Balancers are now the recommended load balancer on AWS.
Can someone point me in the right direction on how to switch from the Classic Load Balancer to an NLB? Which components do I need to modify?