vmware-tanzu / cluster-api-provider-bringyourownhost

Kubernetes Cluster API Provider BYOH for already-provisioned hosts running Linux.
Apache License 2.0

When creating my first BYOH cluster, its first control plane node is stuck with 'Installer config is not ready, requeuing' #811

Status: Open. haiwu opened this issue 1 year ago.

haiwu commented 1 year ago

What steps did you take and what happened: I followed the steps in the quickstart guide exactly, except that I am using an existing external Kubernetes cluster as the management cluster, and I am trying to create my first workload cluster. I have 2 VMs in the same VLAN as the management cluster nodes, and I picked another free IP address in that VLAN. All 7 nodes (5 in the management cluster and 2 for the new workload cluster) are in the same VLAN.

What did you expect to happen: The first node should join the new workload cluster as a control plane node, and then the second node should join it as a worker node.

Anything else you would like to add: On the manager side:

clusterctl describe cluster byoh-cluster

NAME                                                             READY  SEVERITY  REASON                           SINCE  MESSAGE
Cluster/byoh-cluster                                             False  Warning   ScalingUp                        33h    Scaling up control plane to 1 replicas (actual 0)
├─ClusterInfrastructure - ByoCluster/byoh-cluster
├─ControlPlane - KubeadmControlPlane/byoh-cluster-control-plane  False  Warning   ScalingUp                        33h    Scaling up control plane to 1 replicas (actual 0)
│ └─Machine/byoh-cluster-control-plane-dnw9b                     False  Info      WaitingForInfrastructure         33h    1 of 2 completed
│   └─MachineInfrastructure - ByoMachine/byoh-cluster-control-plane-fzztt
└─Workers
  └─MachineDeployment/byoh-cluster-md-0                          False  Warning   WaitingForAvailableMachines      33h    Minimum availability requires 1 replicas, current 0 available
    └─Machine/byoh-cluster-md-0-586c64c9c7xnbsng-p8dhj           False  Info      WaitingForInfrastructure         33h    0 of 2 completed
      ├─BootstrapConfig - KubeadmConfig/byoh-cluster-md-0-g8s7l  False  Info      WaitingForControlPlaneAvailable  33h
      └─MachineInfrastructure - ByoMachine/byoh-cluster-md-0-4b849

I0615 13:20:05.327327 1 byomachine_controller.go:89] controller/byomachine "msg"="Reconcile request received" "name"="byoh-cluster-control-plane-fzztt" "namespace"="default" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="ByoMachine"
I0615 13:20:05.327534 1 byomachine_controller.go:191] controller/byomachine "msg"="Fetching an attached ByoHost" "name"="byoh-cluster-control-plane-fzztt" "namespace"="default" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="ByoMachine"
I0615 13:20:05.327621 1 byomachine_controller.go:208] controller/byomachine "msg"="Successfully fetched an attached Byohost" "byohost"="mine-k8sbyoh003" "name"="byoh-cluster-control-plane-fzztt" "namespace"="default" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="ByoMachine"
I0615 13:20:05.327720 1 byomachine_controller.go:236] controller/byomachine "msg"="Reconciling ByoMachine" "cluster"="byoh-cluster" "name"="byoh-cluster-control-plane-fzztt" "namespace"="default" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="ByoMachine"
I0615 13:20:05.333866 1 byomachine_controller.go:455] controller/byomachine "msg"="Installer config is not ready, requeuing" "cluster"="byoh-cluster" "name"="byoh-cluster-control-plane-fzztt" "namespace"="default" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="ByoMachine"

On the agent side (the first control plane host, which is trying to join the new workload cluster):

I0615 09:20:06.269852 34741 host_reconciler.go:50] controller/byohost "msg"="Reconcile request received" "name"="mine-k8sbyoh003" "namespace"="default" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="ByoHost"
I0615 09:20:06.273345 34741 host_reconciler.go:89] controller/byohost "msg"="reconcile normal" "ByoHost"="mine-k8sbyoh003" "name"="mine-k8sbyoh003" "namespace"="default" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="ByoHost"
I0615 09:20:06.276298 34741 host_reconciler.go:114] controller/byohost "msg"="InstallationSecret not ready" "ByoHost"="mine-k8sbyoh003" "name"="mine-k8sbyoh003" "namespace"="default" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="ByoHost"
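Read together, the two logs describe a wait on both sides. The manager's ByoMachine controller requeues because the installer config is not ready, and the agent waits because its ByoHost has no installation secret yet. Below is a minimal Go sketch of that dependency. It assumes, without taking this from the BYOH source, that the manager attaches the installation secret to the host only after the installer config is ready. The types `installerConfig` and `byoHost` and the functions `reconcileMachine` and `reconcileHost` are hypothetical stand-ins, not the real BYOH API.

```go
package main

import "fmt"

// Hypothetical stand-ins for the objects named in the logs above.
// These are NOT the real BYOH API types.

// installerConfig models the installer configuration the ByoMachine
// controller waits on before it can proceed.
type installerConfig struct {
	Ready bool
}

// byoHost models the ByoHost object the agent watches.
type byoHost struct {
	InstallationSecretSet bool
}

// reconcileMachine mirrors the manager-side log: while the installer config
// is not ready, it requests a requeue and leaves the host untouched.
func reconcileMachine(cfg installerConfig, h *byoHost) (requeue bool, msg string) {
	if !cfg.Ready {
		return true, "Installer config is not ready, requeuing"
	}
	// Assumption: the installation secret is attached to the host only after
	// the installer config is ready.
	h.InstallationSecretSet = true
	return false, "installation secret attached to host"
}

// reconcileHost mirrors the agent-side log: without an installation secret
// the agent has nothing to run, so it waits.
func reconcileHost(h byoHost) (proceed bool, msg string) {
	if !h.InstallationSecretSet {
		return false, "InstallationSecret not ready"
	}
	return true, "running installation"
}

func main() {
	// Simulate an installer config that never becomes ready (for example,
	// if no installer exists for the host's OS): both sides keep waiting.
	cfg := installerConfig{Ready: false}
	host := byoHost{}
	for attempt := 1; attempt <= 3; attempt++ {
		requeue, mMsg := reconcileMachine(cfg, &host)
		proceed, hMsg := reconcileHost(host)
		fmt.Printf("attempt %d: manager requeue=%v (%s); agent proceed=%v (%s)\n",
			attempt, requeue, mMsg, proceed, hMsg)
	}
}
```

Under this model, neither side can make progress on its own. That is consistent with the ByoMachine never becoming ready and the Machine staying in WaitingForInfrastructure in the clusterctl output.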

Environment:

dharmjit commented 1 year ago

Hey @haiwu, from the information above it seems you are using an unsupported OS. Please note that CAP-BYOH only supports Ubuntu 20.04.
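One independent way to confirm which OS a host is running before registering it is to read `/etc/os-release`. The following Go sketch is an assumption-based pre-flight helper, not the BYOH agent's own OS detection code, whose logic may differ. The helper names `parseOSRelease` and `isUbuntu2004` are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// parseOSRelease reads KEY=VALUE lines in os-release format, skipping blank
// lines and comments, and strips surrounding single or double quotes.
func parseOSRelease(content string) map[string]string {
	fields := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		key, value, ok := strings.Cut(line, "=")
		if !ok {
			continue
		}
		fields[key] = strings.Trim(value, "\"'")
	}
	return fields
}

// isUbuntu2004 reports whether the parsed fields describe Ubuntu 20.04.
func isUbuntu2004(fields map[string]string) bool {
	return fields["ID"] == "ubuntu" && fields["VERSION_ID"] == "20.04"
}

func main() {
	data, err := os.ReadFile("/etc/os-release")
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read /etc/os-release:", err)
		os.Exit(1)
	}
	f := parseOSRelease(string(data))
	fmt.Printf("ID=%s VERSION_ID=%s supported=%v\n", f["ID"], f["VERSION_ID"], isUbuntu2004(f))
}
```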

haiwu commented 1 year ago

@dharmjit: Where in the source code is a non-Ubuntu 20.04 OS rejected? How could I add support for another OS? I saw in the TGIK video that there might be a way, but the docs here do not mention it.

dharmjit commented 1 year ago

How could I add support for another non-Ubuntu 20.04 OS

@haiwu, different OS installer implementations have to go here. The complex part is that with more OSes and versions, the effort of maintaining the agent code, artifact distribution (bundles), and tests/CI increases.

Could you let me know which OS you are trying to use BYOH with?

haiwu commented 1 year ago

@dharmjit: Yes, I have already tested a patch for the new OS (with support for Kubernetes 1.27). I will try to get approval on my side and open a new PR later.