Open julien-sugg opened 1 year ago
It was discussed in community Slack, but it didn't quite go that far.
VMware users need to reimplement vmtoolsd as a Talos system extension (and an extension service); that way it will always run with the machine.
Another option is to make Talos itself report IPs, if we can do that without pulling in all the VMware libraries.
Hi everyone, I'm facing the same problem right now. Are there any updates or instructions to follow to work around this?
Also interested to see the fix for this issue, thanks!
I found a way to deploy: create a Talos image with vmtoolsd installed by default using the Talos Image Factory, and then use that image as the baseline template for the deployment. Please check here: [https://factory.talos.dev/].
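As a rough illustration of the workaround above, an Image Factory schematic that bakes the guest agent into the image might look like the sketch below. The extension name `siderolabs/vmtoolsd-guest-agent` is an assumption on my part and is not stated in this thread; check the extension list shown on the factory page before relying on it.

```yaml
# Hypothetical Image Factory schematic: adds the VMware guest agent as a
# system extension so it starts with the machine, before Kubernetes is up.
# The extension name below is an assumption; verify it on factory.talos.dev.
customization:
  systemExtensions:
    officialExtensions:
      - siderolabs/vmtoolsd-guest-agent
```

The image built from such a schematic would then be uploaded to vSphere and referenced as the `template` in the `VSphereMachineTemplate`, so the VM should be able to report its IP to vCenter without waiting for a DaemonSet.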
Greetings,
We've been playing with Talos Linux and Cluster API to automate the management of our clusters, and are currently facing some questions/issues around the bootstrap process using the vSphere infrastructure provider.
Versions / Environment
Description
According to the Talos VMware documentation, we have to install a custom talos-vmtoolsd with some dedicated Talos config.
This totally makes sense; however, my concern is the following:
In order to bootstrap the cluster via Cluster API, and especially the CACPPT controller, I need my CAPV controller to retrieve the IP address of the VM via the vCenter API. However, this IP is only available once VMTools has been installed and configured successfully. Unfortunately, to install VMTools, the Talos bootstrap must already be done, because VMTools is deployed as a DaemonSet. This leaves us with a chicken-and-egg problem.
Our current workaround is to manually bootstrap the cluster via the IP addresses provided by DHCP. However, this is quite a pain, as we wish to automate everything via GitOps: we will manage quite a lot of permanent clusters, but also some ephemeral ones.
Do you have any insights or recommendations to achieve this goal using the VMware ecosystem?
Reproduce Steps
The following steps can be performed to easily reproduce the issue:
`clusterctl` with `CAPV`, `CABPT` and `CACPPT`
Manifests:
```yaml
---
apiVersion: v1
kind: Secret
metadata:
  name: observability-cluster-poc
  namespace: cluster-api-system
stringData:
  password: REDACTED
  username: REDACTED
---
apiVersion: bootstrap.cluster.x-k8s.io/v1alpha3
kind: TalosConfigTemplate
metadata:
  name: observability-cluster-poc-md-0
  namespace: cluster-api-system
spec:
  template:
    spec:
      configPatches:
        - op: add
          path: /machine/network
          value:
            interfaces:
              - dhcp: true
                dhcpOptions:
                  routeMetric: 1
                interface: eth0
              - dhcp: true
                dhcpOptions:
                  routeMetric: 10
                interface: eth1
        - op: add
          path: /machine/install
          value:
            extraKernelArgs:
              - net.ifnames=0
        - op: add
          path: /cluster/network/cni
          value:
            name: none
        - op: add
          path: /cluster/proxy
          value:
            disabled: true
        - op: add
          path: /machine/features/kubePrism
          value:
            enabled: true
            port: 7445
        - op: replace
          path: /cluster/controlPlane
          value:
            endpoint: https://172.30.11.10:6443
        - op: add
          path: /machine/certSANs
          value:
            - 172.30.11.10
        - op: add
          path: /machine/time
          value:
            disabled: false
            servers:
              - 172.30.110.1
        - op: replace
          path: /cluster/extraManifests
          value:
            - https://raw.githubusercontent.com/mologie/talos-vmtoolsd/master/deploy/unstable.yaml
        - op: add
          path: /machine/kubelet/extraArgs
          value:
            cloud-provider: external
      generateType: worker
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  labels:
    cluster.x-k8s.io/cluster-name: observability-cluster-poc
  name: observability-cluster-poc
  namespace: cluster-api-system
spec:
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1alpha3
    kind: TalosControlPlane
    name: observability-cluster-poc
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: VSphereCluster
    name: observability-cluster-poc
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  labels:
    cluster.x-k8s.io/cluster-name: observability-cluster-poc
  name: observability-cluster-poc-md-0
  namespace: cluster-api-system
spec:
  clusterName: observability-cluster-poc
  replicas: 3
  selector:
    matchLabels: {}
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
    type: RollingUpdate
  template:
    metadata:
      labels:
        cluster.x-k8s.io/cluster-name: observability-cluster-poc
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1alpha3
          kind: TalosConfigTemplate
          name: observability-cluster-poc-md-0
      clusterName: observability-cluster-poc
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: VSphereMachineTemplate
        name: observability-cluster-poc-worker
      version: v1.27.5
---
apiVersion: controlplane.cluster.x-k8s.io/v1alpha3
kind: TalosControlPlane
metadata:
  name: observability-cluster-poc
  namespace: cluster-api-system
spec:
  controlPlaneConfig:
    controlplane:
      configPatches:
        - op: add
          path: /machine/network
          value:
            interfaces:
              - dhcp: true
                dhcpOptions:
                  routeMetric: 1
                interface: eth0
                vip:
                  ip: 172.30.11.10
              - dhcp: true
                dhcpOptions:
                  routeMetric: 10
                interface: eth1
        - op: add
          path: /machine/install
          value:
            extraKernelArgs:
              - net.ifnames=0
        - op: add
          path: /cluster/network/cni
          value:
            name: none
        - op: add
          path: /cluster/proxy
          value:
            disabled: true
        - op: add
          path: /machine/features/kubePrism
          value:
            enabled: true
            port: 7445
        - op: replace
          path: /cluster/controlPlane
          value:
            endpoint: https://172.30.11.10:6443
        - op: add
          path: /machine/certSANs
          value:
            - 172.30.11.10
        - op: add
          path: /cluster/coreDNS
          value:
            disabled: true
        - op: add
          path: /machine/time
          value:
            disabled: false
            servers:
              - 172.30.110.1
        - op: replace
          path: /cluster/extraManifests
          value:
            - https://raw.githubusercontent.com/mologie/talos-vmtoolsd/master/deploy/unstable.yaml
        - op: add
          path: /machine/kubelet/extraArgs
          value:
            cloud-provider: external
      generateType: controlplane
  infrastructureTemplate:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: VSphereMachineTemplate
    name: observability-cluster-poc
  replicas: 3
  rolloutStrategy:
    rollingUpdate:
      maxSurge: 1
    type: RollingUpdate
  version: v1.27.6
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereCluster
metadata:
  name: observability-cluster-poc
  namespace: cluster-api-system
spec:
  controlPlaneEndpoint:
    host: 172.30.11.10
    port: 6443
  identityRef:
    kind: Secret
    name: observability-cluster-poc
  server: REDACTED
  thumbprint: REDACTED
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereMachineTemplate
metadata:
  name: observability-cluster-poc
  namespace: cluster-api-system
spec:
  template:
    spec:
      cloneMode: linkedClone
      customVMXKeys:
        disk.EnableUUID: "true"
      datacenter: REDACTED
      datastore: REDACTED
      diskGiB: 25
      folder: cluster-api-vms
      memoryMiB: 8192
      network:
        devices:
          - dhcp4: true
            dhcp4Overrides:
              routeMetric: 1
            networkName: PLATFORM-PRODUCTION-OBSERVABILITY
          - dhcp4: true
            dhcp4Overrides:
              routeMetric: 10
            networkName: PRODUCTION
      numCPUs: 2
      os: Linux
      powerOffMode: hard
      resourcePool: Cluster-API-POC
      server: REDACTED
      storagePolicyName: ""
      tagIDs:
        - urn:vmomi:InventoryServiceTag:0fe8eb41-7a8f-47b3-a9fe-0d288ec787dd:GLOBAL
        - urn:vmomi:InventoryServiceTag:4495a9ce-727a-4814-b067-682b52130cad:GLOBAL
      template: talos-linux-1.5.2
      thumbprint: REDACTED
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereMachineTemplate
metadata:
  name: observability-cluster-poc-worker
  namespace: cluster-api-system
spec:
  template:
    spec:
      cloneMode: linkedClone
      customVMXKeys:
        disk.EnableUUID: "true"
      datacenter: REDACTED
      datastore: REDACTED
      diskGiB: 25
      folder: cluster-api-vms
      memoryMiB: 8192
      network:
        devices:
          - dhcp4: true
            dhcp4Overrides:
              routeMetric: 1
            networkName: PLATFORM-PRODUCTION-OBSERVABILITY
          - dhcp4: true
            dhcp4Overrides:
              routeMetric: 10
            networkName: PRODUCTION
      numCPUs: 2
      os: Linux
      powerOffMode: hard
      resourcePool: Cluster-API-POC
      server: REDACTED
      storagePolicyName: ""
      tagIDs:
        - urn:vmomi:InventoryServiceTag:0fe8eb41-7a8f-47b3-a9fe-0d288ec787dd:GLOBAL
        - urn:vmomi:InventoryServiceTag:4495a9ce-727a-4814-b067-682b52130cad:GLOBAL
      template: talos-linux-1.5.2
      thumbprint: REDACTED
```
Useful outputs/content
Talos console:
vSphere machine (no IP due to VMtools not being installable at this point in time):
CACPPT logs:
Thanks in advance for your help and insights.