Closed julien-sugg closed 9 months ago
This error is about a `Machine` CRD in your management cluster, not about Talos itself. CACPPT needs addresses to talk to the Talos API. It should be the infrastructure provider's job to provide these addresses.
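To make that check concrete, here is a minimal sketch that reads a `Machine` object (for example the JSON printed by `kubectl get machine <name> -o json`) and lists whatever is reported under `status.addresses`. The function name and the sample object are made up for illustration; this is not CACPPT's actual code, only an approximation of the condition behind the error message.

```python
import json


def machine_addresses(machine: dict) -> list[str]:
    """Return the addresses listed under status.addresses of a Cluster API Machine."""
    addresses = machine.get("status", {}).get("addresses") or []
    return [a["address"] for a in addresses if a.get("address")]


# Hypothetical Machine object whose infrastructure provider has not reported any address yet.
sample = json.loads(
    '{"metadata": {"name": "observability-cluster-poc-hvv4m"}, "status": {}}'
)

if not machine_addresses(sample):
    print(f'no addresses were found for node "{sample["metadata"]["name"]}"')
```

If the list is empty, the place to look is the infrastructure provider (here CAPV) rather than Talos.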
Thanks for the update; the error indeed sent me in the wrong direction.
The real issue is that I had temporarily commented out the vm tools extra manifests configuration, which is why the IPs were no longer retrieved at the vSphere level. Indeed, the underlying `vspheremachines` were stuck in the `WaitingForIPAllocation` status.
Re-enabling it solved the issue:
```yaml
---
apiVersion: controlplane.cluster.x-k8s.io/v1alpha3
kind: TalosControlPlane
metadata:
  ...
spec:
  ...
  controlPlaneConfig:
    controlplane:
      generateType: controlplane
      talosVersion: v1.5.2
      configPatches:
        ...
        - op: replace
          path: /cluster/extraManifests
          value:
            - "https://raw.githubusercontent.com/mologie/talos-vmtoolsd/master/deploy/unstable.yaml"
```
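For anyone hitting the same symptom, a quick way to spot it is to scan the `VSphereMachine` objects for the `WaitingForIPAllocation` reason. The sketch below assumes the input is the `items` list from `kubectl get vspheremachines -A -o json` and that conditions follow the usual Cluster API layout (`status.conditions[]` with a `reason` field); the helper name and sample data are hypothetical.

```python
def waiting_for_ip(vsphere_machines: list[dict]) -> list[str]:
    """Return names of VSphereMachines that have a condition with reason WaitingForIPAllocation."""
    stuck = []
    for vm in vsphere_machines:
        conditions = vm.get("status", {}).get("conditions") or []
        if any(c.get("reason") == "WaitingForIPAllocation" for c in conditions):
            stuck.append(vm["metadata"]["name"])
    return stuck


# Hypothetical sample: one machine still waiting for an IP, one ready.
items = [
    {
        "metadata": {"name": "observability-cluster-poc-hvv4m"},
        "status": {"conditions": [{"type": "VMProvisioned", "status": "False",
                                   "reason": "WaitingForIPAllocation"}]},
    },
    {
        "metadata": {"name": "observability-cluster-poc-ready"},
        "status": {"conditions": [{"type": "Ready", "status": "True"}]},
    },
]

print(waiting_for_ip(items))
```

Machines listed here have been cloned but vSphere has not reported an IP for them, which in this case pointed to the missing VM tools manifest.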
Greetings,
We are facing some issues bootstrapping a worker cluster on VMware ESXi 7.0.3 using the vSphere CAPV and Talos documentation.
Versions
Description
We manually created a first Talos cluster on VMware using the `OVA` & `talosctl`, and then installed the appropriate operators using `clusterctl`. The cluster has the following minimal patches and uses the defaults otherwise:
The operators were successfully installed with the following command:
However, we then tried to create a basic cluster via Kustomize and minimal Cluster API manifests, and bootstrapping failed, seemingly because of hostname resolution issues involving the DNS search list.
```yaml
# ➜ k kustomize observability-cluster-poc
apiVersion: v1
kind: Secret
metadata:
  name: observability-cluster-poc
  namespace: cluster-api-system
stringData:
  password: REDACTED
  username: clusterapi
---
apiVersion: bootstrap.cluster.x-k8s.io/v1alpha3
kind: TalosConfigTemplate
metadata:
  name: observability-cluster-poc-md-0
  namespace: cluster-api-system
spec:
  template:
    spec:
      configPatches:
      - op: add
        path: /machine/network
        value:
          interfaces:
          - dhcp: true
            interface: eth0
          nameservers:
          - 172.30.110.1
      - op: add
        path: /machine/install
        value:
          extraKernelArgs:
          - net.ifnames=0
      - op: add
        path: /cluster/network/cni
        value:
          name: none
      - op: add
        path: /cluster/proxy
        value:
          disabled: true
      - op: add
        path: /machine/features/kubePrism
        value:
          enabled: true
          port: 7445
      - op: replace
        path: /cluster/controlPlane
        value:
          endpoint: https://172.30.11.10:6443
      - op: add
        path: /machine/certSANs
        value:
        - 172.30.11.10
      - op: add
        path: /machine/time
        value:
          disabled: false
          servers:
          - 172.30.110.1
      generateType: worker
      talosVersion: v1.5.2
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  labels:
    cluster.x-k8s.io/cluster-name: observability-cluster-poc
  name: observability-cluster-poc
  namespace: cluster-api-system
spec:
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1alpha3
    kind: TalosControlPlane
    name: observability-cluster-poc
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: VSphereCluster
    name: observability-cluster-poc
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  labels:
    cluster.x-k8s.io/cluster-name: observability-cluster-poc
  name: observability-cluster-poc-md-0
  namespace: cluster-api-system
spec:
  clusterName: observability-cluster-poc
  replicas: 3
  selector:
    matchLabels: {}
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
    type: RollingUpdate
  template:
    metadata:
      labels:
        cluster.x-k8s.io/cluster-name: observability-cluster-poc
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1alpha3
          kind: TalosConfigTemplate
          name: observability-cluster-poc-md-0
      clusterName: observability-cluster-poc
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: VSphereMachineTemplate
        name: observability-cluster-poc-worker
      version: v1.27.5
---
apiVersion: controlplane.cluster.x-k8s.io/v1alpha3
kind: TalosControlPlane
metadata:
  name: observability-cluster-poc
  namespace: cluster-api-system
spec:
  controlPlaneConfig:
    controlplane:
      configPatches:
      - op: add
        path: /machine/network
        value:
          interfaces:
          - dhcp: true
            interface: eth0
            vip:
              ip: 172.30.11.10
          nameservers:
          - 172.30.110.1
      - op: add
        path: /machine/install
        value:
          extraKernelArgs:
          - net.ifnames=0
      - op: add
        path: /cluster/network/cni
        value:
          name: none
      - op: add
        path: /cluster/proxy
        value:
          disabled: true
      - op: add
        path: /machine/features/kubePrism
        value:
          enabled: true
          port: 7445
      - op: replace
        path: /cluster/controlPlane
        value:
          endpoint: https://172.30.11.10:6443
      - op: add
        path: /machine/certSANs
        value:
        - 172.30.11.10
      - op: add
        path: /cluster/coreDNS
        value:
          disabled: true
      - op: add
        path: /machine/time
        value:
          disabled: false
          servers:
          - 172.30.110.1
      generateType: controlplane
      talosVersion: v1.5.2
  infrastructureTemplate:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: VSphereMachineTemplate
    name: observability-cluster-poc
  replicas: 3
  rolloutStrategy:
    rollingUpdate:
      maxSurge: 1
    type: RollingUpdate
  version: v1.27.5
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereCluster
metadata:
  name: observability-cluster-poc
  namespace: cluster-api-system
spec:
  controlPlaneEndpoint:
    host: 172.30.11.10
    port: 6443
  identityRef:
    kind: Secret
    name: observability-cluster-poc
  server: REDACTED
  thumbprint: REDACTED
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereMachineTemplate
metadata:
  name: observability-cluster-poc
  namespace: cluster-api-system
spec:
  template:
    spec:
      cloneMode: linkedClone
      datacenter: REDACTED
      datastore: REDACTED
      diskGiB: 25
      folder: cluster-api-vms
      memoryMiB: 8192
      network:
        devices:
        - dhcp4: true
          networkName: PLATFORM-PRODUCTION-OBSERVABILITY
        - dhcp4: true
          networkName: PRODUCTION
      numCPUs: 2
      os: Linux
      powerOffMode: hard
      resourcePool: Cluster-API-POC
      server: REDACTED
      storagePolicyName: ""
      template: talos-linux-1.5.2
      thumbprint: REDACTED
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereMachineTemplate
metadata:
  name: observability-cluster-poc-worker
  namespace: cluster-api-system
spec:
  template:
    spec:
      cloneMode: linkedClone
      customVMXKeys:
        disk.EnableUUID: "true"
      datacenter: REDACTED
      datastore: REDACTED
      diskGiB: 25
      folder: cluster-api-vms
      memoryMiB: 8192
      network:
        devices:
        - dhcp4: true
          networkName: PLATFORM-PRODUCTION-OBSERVABILITY
        - dhcp4: true
          networkName: PRODUCTION
      numCPUs: 2
      os: Linux
      powerOffMode: hard
      resourcePool: Cluster-API-POC
      server: REDACTED
      storagePolicyName: ""
      template: talos-linux-1.5.2
      thumbprint: REDACTED
```
Logs and outputs
On the CACPPT controller, the following logs are repeating upon each reconciliation attempt:
The line
2023-10-06T06:52:14Z INFO controllers.TalosControlPlane bootstrap failed, retrying in 20 seconds {"namespace": "cluster-api-system", "talosControlPlane": "observability-cluster-poc", "error": "no addresses were found for node \"observability-cluster-poc-hvv4m\""}
is especially troublesome, and we dug further into it without any success.
Our networking is handled solely by a dedicated `OpnSense` instance.
All the VMs have `DHCP` enabled, and `DHCP Leases` are automatically and successfully registered for all our VMs, including the new ones that are failing to bootstrap.
When I create a dummy nettool Pod in the cluster, everything works like a charm and we can see that /etc/resolv.conf is properly configured with the appropriate search list:
I see similar behavior when directly attaching a nettool debug container to the CACPPT controller:
I also attached a debug container to coredns, just in case:
Thanks for your help.