ybizeul opened this issue 3 weeks ago
More of a limitation in kube-vip. And as we discussed in Slack, this is fixed in kube-vip 0.8.
But even if kube-vip is bumped to 0.8, since the node name doesn't match the FQDN, CPI will fail to find the corresponding VM for the node.
I think we should revert this PR https://github.com/kubernetes-sigs/cluster-api-provider-vsphere/commit/fcd243dd96c3e70df12e2766041b4444da834413
It was aiming to address the node-name length issue, but if a customer has a search domain defined for the hostname and the length is within 64 characters, CPI will fail to initialize the node.
If you all think it's reasonable, I can create an issue to revert it.
Thank you @lubronzhan.
I will also add my remarks from this morning.
We have that issue with node names, related to v0.6.4, which isn't a problem for the initial deployment and control-plane HA. But as it is, the kube-vip static pods are only deployed on control-plane nodes anyway, which, as far as I know, don't have LB Service resources.
Maybe CAPV's intended way to implement kube-vip for workloads is through a new DaemonSet on the workload cluster; the problem with that is the control-plane kube-vip has `svc_enable` set to true, so I'm afraid both will try to handle the new resources.
> More of a limitation in kube-vip. And as we discussed in Slack, this is fixed in kube-vip 0.8.
> But even if kube-vip is bumped to 0.8, since the node name doesn't match the FQDN, CPI will fail to find the corresponding VM for the node.
> I think we should revert this PR fcd243d
> It was aiming to address the node-name length issue, but if a customer has a search domain defined for the hostname and the length is within 64 characters, CPI will fail to initialize the node.
> If you all think it's reasonable, I can create an issue to revert it.
OK, I would correct my comment about the real issue: it's actually that `/etc/hosts` is missing the short-hostname IP entry, so kube-vip can't resolve it. And it should be fixed in kube-vip 0.8. This is unrelated to reverting https://github.com/kubernetes-sigs/cluster-api-provider-vsphere/commit/fcd243dd96c3e70df12e2766041b4444da834413
> Maybe CAPV's intended way to implement kube-vip for workloads is through a new DaemonSet on the workload cluster; the problem with that is the control-plane kube-vip has `svc_enable` set to true, so I'm afraid both will try to handle the new resources.
The current way would support `externalTrafficPolicy: Cluster`, which most people use; maybe that's why it's not removed. If you want to deploy a different set of kube-vip, you can modify your KubeadmControlPlane to remove `svc_enable: true` from the `files` section which contains the kube-vip manifest.
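For that second option, here is a minimal sketch of stripping the setting out of an extracted kube-vip manifest fragment. The env-var layout follows the usual kube-vip static-pod form; the helper name and the idea of deleting (rather than setting to `"false"`) are mine, not from CAPV:

```shell
# Hypothetical helper: delete the svc_enable env entry (the "name:" line and
# the "value:" line that follows it) from a manifest fragment.
# Uses GNU sed's addr,+N range syntax.
disable_svc_election() {
  sed -E '/name: svc_enable/,+1d' "$1"
}
```

With `svc_enable` removed, the control-plane kube-vip would no longer watch Services, leaving them to whatever DaemonSet you deploy on the workers.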
But there is still the problem that 0.6.4 wouldn't work even with the Cluster policy, because of the short-name bug, right?
> But there is still the problem that 0.6.4 wouldn't work even with the Cluster policy, because of the short-name bug, right?

Yes. So you need to upgrade to the new 0.8.
This PR https://github.com/kubernetes-sigs/cluster-api-provider-vsphere/commit/fcd243dd96c3e70df12e2766041b4444da834413#diff-e76b4b2137138f55f29ce20dd0ab8287648f3d8eb23a841a2e5c51ff88949750R120 also adds the `local_hostname` to `/etc/hosts`. Maybe that's also one of the reasons only the short hostname is in your `/etc/hosts`. Could you check that?
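One way to check, sketched as a small helper (the function name and the temp-file path in the usage are illustrative, not from CAPV or kube-vip):

```shell
# Hypothetical helper: report whether a given short hostname appears as a
# whitespace-delimited alias on any line of a hosts file.
check_short_hostname() {
  name="$1"; file="$2"
  if grep -qE "[[:space:]]${name}([[:space:]]|\$)" "$file"; then
    echo present
  else
    echo missing
  fi
}
```

On a control-plane node you would run it as `check_short_hostname "$(hostname -s)" /etc/hosts`.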
Maybe also relevant: the kube-vip static pod currently runs with its own `/etc/hosts`:

This is to work around the issue at:

An improved way may be a `preKubeadmCommand` which copies the original `/etc/hosts` and just adds `kubernetes` to it.

Note: this workaround is only required when kube-vip runs as a static pod.
So instead of adding the static file `/etc/kube-vip.hosts`, we could maybe use a `preKubeadmCommand` like the following, if this helps:

```shell
sed -E -e 's/^(127.0.0.1.*)/\1 kubernetes/g' /etc/hosts > /etc/kube-vip.hosts
```
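As a sanity check of that sed expression, here is a self-contained sketch; the function name and file paths are placeholders, not the real node paths:

```shell
# Rewrite a hosts file, appending "kubernetes" as an extra alias on the
# loopback line, and write the result to a second file (as the suggested
# preKubeadmCommand would write /etc/kube-vip.hosts).
build_kube_vip_hosts() {
  sed -E -e 's/^(127.0.0.1.*)/\1 kubernetes/g' "$1" > "$2"
}
```

Non-loopback lines pass through unchanged, so the copy keeps the original short-hostname entries intact.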
/kind bug
What steps did you take and what happened:
What did you expect to happen:
I was expecting kube-vip to assign the external IP given by kube-vip-cloud-provider to the Service.
Anything else you would like to add:
This seems to be related to https://github.com/kube-vip/kube-vip/issues/723, which prevents provisioning of the IP on the Service when node names aren't FQDNs (which is what CAPV does, it seems, and I haven't found a way to change that yet; adding a search domain leads to the same result).
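A quick way to spot the short-name case is to classify each node name; the "an FQDN contains a dot" heuristic and the helper name here are my own sketch, not from kube-vip:

```shell
# Hypothetical helper: classify a node name as FQDN-like ("yes") or
# short ("no"), using the presence of a dot as the heuristic.
is_fqdn() {
  case "$1" in
    *.*) echo yes ;;
    *)   echo no ;;
  esac
}
```

Feeding it the output of `kubectl get nodes -o name` (with the `node/` prefix stripped) would show whether CAPV registered short names.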
After changing the `kube-vip` Pod image to `v0.8.2` in `cluster.yaml`:

Environment:
- Kubernetes version (use `kubectl version`):
- OS (e.g. from `/etc/os-release`):