vmware-tanzu / tanzu-framework

Tanzu Framework provides a set of building blocks to build atop the Tanzu platform and leverages Carvel packaging and plugins to provide users with a much stronger, more integrated experience than the loose coupling and stand-alone commands of the previous generation of tools.
Apache License 2.0

Failed KCP Controller Manager etcd member health checks cause upgrade and scale issues in Azure #2776

Open josh-ferrell opened 2 years ago

josh-ferrell commented 2 years ago

Bug description When deployed to an Azure private cluster, the etcd member health checks required by the KCP controller manager fail because of a hairpin NAT issue if the KCP controller manager is running on one of the internal load balancer's members (a control plane node). In certain situations this can delay, or entirely prevent, the rolling update of a KubeadmControlPlane that references a new AzureMachineTemplate, as happens during a cluster upgrade.

This is a duplicate of the upstream Cluster API issue kubernetes-sigs/cluster-api#6765
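The placement dependence described above can be illustrated with a small model. This is only a sketch under stated assumptions, not how Azure or Cluster API implement anything: it assumes the internal load balancer picks a backend uniformly at random, and that a flow fails whenever it is mapped back to the VM that originated it (the hairpin case). The names `Flow`, `flow_succeeds`, `hairpin_fraction`, and the node names are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Flow:
    """One connection through the internal load balancer (simplified model)."""
    source_node: str   # node where the client runs, e.g. the KCP controller manager
    backend_node: str  # backend pool member the load balancer selected


def flow_succeeds(flow: Flow) -> bool:
    # Assumption: a flow hairpinned back to its originating VM fails,
    # and every other flow succeeds.
    return flow.source_node != flow.backend_node


def hairpin_fraction(source_node: str, backends: Sequence[str]) -> float:
    """Fraction of backend choices that would hairpin for this source node.

    Assumes uniform backend selection, which is a simplification.
    """
    if not backends:
        raise ValueError("backend pool must not be empty")
    failures = sum(
        1 for backend in backends
        if not flow_succeeds(Flow(source_node, backend))
    )
    return failures / len(backends)


if __name__ == "__main__":
    pool = ["cp-0", "cp-1", "cp-2"]  # hypothetical control plane nodes
    print(hairpin_fraction("cp-0", pool))      # client on a pool member
    print(hairpin_fraction("worker-0", pool))  # client outside the pool
```

Under this model, a client on a control plane node in a three-node pool would see about one in three connections fail, while a client on a node outside the pool would see none. That is consistent with the failure depending on where the KCP controller manager pod is scheduled.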

Affected product area (please put an X in all that apply)

Expected behavior Cluster upgrades are not blocked by where the KCP controller manager pod is scheduled.

Steps to reproduce the bug Deploy an Azure-backed private cluster and attempt to upgrade it.

Version (include the SHA if the version is not obvious)

Environment where the bug was observed (cloud, OS, etc) Azure, Ubuntu 20.04

Relevant Debug Output (Logs, manifests, etc)
