siderolabs / cluster-api-control-plane-provider-talos

A control plane provider for CAPI + Talos
Mozilla Public License 2.0

Talos ControlPlane fails to Bootstrap due to verifying certs #175

Closed: mcbenjemaa closed this issue 11 months ago

mcbenjemaa commented 11 months ago

Hey,

I'm trying to use CACPPT as the control plane provider to bootstrap a cluster on Proxmox. I provided the nocloud image to the VM.

But when I apply my resources, CACPPT complains about certificates.

Logs

2023-08-16T14:09:49Z    INFO    controllers.TalosControlPlane   verifying etcd health on all nodes  {"node": "proxmox-test-control-plane-2lmlg"}
2023-08-16T14:09:49Z    INFO    controllers.TalosControlPlane   bootstrap failed, retrying in 20 seconds    {"namespace": "default", "talosControlPlane": "proxmox-test-control-plane", "error": "rpc error: code = Unavailable desc = connection error: desc = \"transport: authentication handshake failed: tls: failed to verify certificate: x509: certificate signed by unknown authority\""}
2023-08-16T14:09:49Z    INFO    controllers.TalosControlPlane   attempting to set control plane status
2023-08-16T14:09:49Z    INFO    controllers.TalosControlPlane   failed to get kubeconfig for the cluster    {"error": "failed to create cluster accessor: error creating dynamic rest mapper for remote cluster \"default/proxmox-test\": Get \"https://10.10.10.35:6443/api?timeout=10s\": dial tcp 10.10.10.35:6443: connect: connection refused", "errorVerbose": "Get \"https://10.10.10.35:6443/api?timeout=10s\": dial tcp 10.10.10.35:6443: connect: connection refused\nerror creating dynamic rest mapper for remote cluster \"default/proxmox-test\"\nsigs.k8s.io/cluster-api/controllers/remote.(*ClusterCacheTracker).createClient\n\t/.cache/mod/sigs.k8s.io/cluster-api@v1.4.1/controllers/remote/cluster_cache_tracker.go:384\nsigs.k8s.io/cluster-api/controllers/remote.(*ClusterCacheTracker).newClusterAccessor\n\t/.cache/mod/sigs.k8s.io/cluster-api@v1.4.1/controllers/remote/cluster_cache_tracker.go:254\nsigs.k8s.io/cluster-api/controllers/remote.(*ClusterCacheTracker).getClusterAccessor\n\t/.cache/mod/sigs.k8s.io/cluster-api@v1.4.1/controllers/remote/cluster_cache_tracker.go:233\nsigs.k8s.io/cluster-api/controllers/remote.(*ClusterCacheTracker).GetClient\n\t/.cache/mod/sigs.k8s.io/cluster-api@v1.4.1/controllers/remote/cluster_cache_tracker.go:151\ngithub.com/siderolabs/cluster-api-control-plane-provider-talos/controllers.(*TalosControlPlaneReconciler).updateStatus\n\t/src/controllers/taloscontrolplane_controller.go:563\ngithub.com/siderolabs/cluster-api-control-plane-provider-talos/controllers.(*TalosControlPlaneReconciler).Reconcile.func1\n\t/src/controllers/taloscontrolplane_controller.go:156\ngithub.com/siderolabs/cluster-api-control-plane-provider-talos/controllers.(*TalosControlPlaneReconciler).Reconcile\n\t/src/controllers/taloscontrolplane_controller.go:185\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/.cache/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:122\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/.cache/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:323\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/.cache/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:274\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/.cache/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:235\nruntime.goexit\n\t/toolchain/go/src/runtime/asm_amd64.s:1598\nfailed to create cluster accessor\nsigs.k8s.io/cluster-api/controllers/remote.(*ClusterCacheTracker).getClusterAccessor\n\t/.cache/mod/sigs.k8s.io/cluster-api@v1.4.1/controllers/remote/cluster_cache_tracker.go:235\nsigs.k8s.io/cluster-api/controllers/remote.(*ClusterCacheTracker).GetClient\n\t/.cache/mod/sigs.k8s.io/cluster-api@v1.4.1/controllers/remote/cluster_cache_tracker.go:151\ngithub.com/siderolabs/cluster-api-control-plane-provider-talos/controllers.(*TalosControlPlaneReconciler).updateStatus\n\t/src/controllers/taloscontrolplane_controller.go:563\ngithub.com/siderolabs/cluster-api-control-plane-provider-talos/controllers.(*TalosControlPlaneReconciler).Reconcile.func1\n\t/src/controllers/taloscontrolplane_controller.go:156\ngithub.com/siderolabs/cluster-api-control-plane-provider-talos/controllers.(*TalosControlPlaneReconciler).Reconcile\n\t/src/controllers/taloscontrolplane_controller.go:185\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/.cache/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:122\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/.cache/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:323\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/.cache/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:274\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/.cache/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:235\nruntime.goexit\n\t/toolchain/go/src/runtime/asm_amd64.s:1598"}

Note: I'm setting a static IP via the Proxmox API.

But it still fails.

smira commented 11 months ago
"https://10.10.10.35:6443/api?timeout=10s": dial tcp 10.10.10.35:6443: connect: connection refused

The control plane endpoint isn't working: nothing is accepting connections on 10.10.10.35:6443.

2023-08-16T14:09:49Z    INFO    controllers.TalosControlPlane   bootstrap failed, retrying in 20 seconds    {"namespace": "default", "talosControlPlane": "proxmox-test-control-plane", "error": "rpc error: code = Unavailable desc = connection error: desc = \"transport: authentication handshake failed: tls: failed to verify certificate: x509: certificate signed by unknown authority\""}

Your Talos instance most likely didn't receive its machine config via user-data, as it should in the CAPI world.

mcbenjemaa commented 11 months ago

Thanks for closing the issue.

smira commented 11 months ago

There is nothing for CACPPT to fix here: if there is a problem, it lies in either Talos or the infrastructure provider.

mcbenjemaa commented 11 months ago

I tried to use metal-iso to set talos-config, but it doesn't work.

I think the only way forward is for the control plane provider to allow insecure connections, via a flag or something similar.

smira commented 11 months ago

This is not how CAPI works: the infrastructure provider should pass the already prepared machine config to the instance as user-data. There's no other supported way with CAPI at the moment.

mcbenjemaa commented 11 months ago

Can you provide an example of user-data for Talos?

smira commented 11 months ago

It's the machine config. When CAPI is working correctly, it does this for you; there's nothing here related to this particular issue or repository.