linode / linode-cloud-controller-manager

Kubernetes Cloud Controller Manager for Linode
Apache License 2.0

Nodebalancer not updated when adding new nodes to the Kubernetes cluster #17

Closed: Jawshua closed this issue 5 years ago

Jawshua commented 5 years ago

Bug Reporting

Expected Behavior

When adding/removing nodes in a cluster, existing Nodebalancer endpoints should be updated to reflect the change.

Actual Behavior

Nodebalancers will only ever point to the nodes that were present when the k8s LoadBalancer was created.

Steps to Reproduce the Problem

  1. Provision a cluster (we're using terraform-linode-k8s)
  2. Create a Service with the LoadBalancer type. Our specific use case is nginx-ingress (helm install stable/nginx-ingress)
  3. Once the NodeBalancer is provisioned, add or remove some nodes from the k8s cluster (a rough sketch of these steps follows this list).
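
Roughly, the commands involved look like this (the release and Service names below are examples and depend on how the chart was installed):

  # Deploy an ingress controller backed by a Service of type LoadBalancer (Helm 2 syntax)
  helm install stable/nginx-ingress --name nginx-ingress

  # Wait for the NodeBalancer to be provisioned (EXTERNAL-IP leaves <pending>)
  kubectl get service nginx-ingress --watch

  # Add or remove nodes (e.g. via terraform-linode-k8s), then compare the cluster's
  # node list with the NodeBalancer's backends shown in the Linode Manager
  kubectl get nodes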

Environment Specifications

Screenshots, Code Blocks, and Logs

kubectl describe service nginx-ingress spits out error events:

Events:
  Type     Reason                    Age    From                Message
  ----     ------                    ----   ----                -------
  Warning  LoadBalancerUpdateFailed  74s    service-controller  Error updating load balancer with new hosts map[prod-ap-northeast-node-7:{} prod-ap-northeast-node-8:{} prod-ap-northeast-node-5:{} prod-ap-northeast-node-6:{} prod-ap-northeast-node-4:{} prod-ap-northeast-node-1:{} prod-ap-northeast-node-3:{} prod-ap-northeast-node-2:{}]: [400] [X-Filter] Cannot filter on nodebalancer_id

Additional Notes



asauber commented 5 years ago

Hi @Jawshua, this issue was fixed by this commit:

https://github.com/linode/linode-cloud-controller-manager/commit/4ced29555938feda71150aca823856b52faa533e

See if you can redeploy the CCM using the latest image. I would recommend deleting the DaemonSet from kube-system and re-applying the manifest found in terraform-linode-k8s (the DaemonSet only):

https://github.com/linode/terraform-linode-k8s/blob/master/modules/masters/manifests/ccm-linode.yaml
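
Something along these lines should do it (the DaemonSet name, namespace, and local file name below are assumptions; verify them against the linked manifest before running):

  # Remove the existing CCM DaemonSet (name/namespace assumed from the manifest)
  kubectl -n kube-system delete daemonset ccm-linode

  # Re-apply only the DaemonSet portion of ccm-linode.yaml, saved locally as its own file
  kubectl -n kube-system apply -f ccm-linode-daemonset.yaml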

Please let me know if you run into any issues with this approach.

asauber commented 5 years ago

Unfortunately this is the case: the Terraform module is referencing a sufficiently old version.

root@localhost:~# docker inspect -f '{{ .Created }}' linode/linode-cloud-controller-manager
2018-11-30T13:46:27.144258049Z

Working on pushing a new one now.

asauber commented 5 years ago

Done. Please try redeploying the DaemonSet and let me know if you run into any issues. In fact, you should be able to simply delete the Pods; they will be recreated and pull the new image.

root@localhost:~# docker inspect -f '{{ .Created }}' linode/linode-cloud-controller-manager
2019-01-30T18:18:10.1684702Z
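
Deleting the Pods would look something like this (the label selector is an assumption; check the Pods' actual labels, and note that a fresh pull depends on the DaemonSet's imagePullPolicy):

  # Find the CCM Pods and their labels
  kubectl -n kube-system get pods --show-labels

  # Delete them; the DaemonSet recreates them, pulling the newly pushed image
  kubectl -n kube-system delete pods -l app=ccm-linode
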
wideareashb commented 5 years ago

This is almost certainly a related problem: if a node is shut down long enough to be removed from the NodeBalancer, it is not added back to the NodeBalancer when the node is brought back up.