siderolabs / sidero

Sidero Metal is a bare metal provisioning system with support for Kubernetes Cluster API.
https://www.sidero.dev
Mozilla Public License 2.0

Sidero fails to bootstrap machines #1280

Open rmvangun opened 8 months ago

rmvangun commented 8 months ago

I've gone through the bootstrap documentation using the most recent set of versions. I'm pretty familiar with the setup and got this working on earlier versions a couple of years back (2021/22).

My servers join Sidero and are assigned to a machine, but they fail to bootstrap. If I run talosctl bootstrap manually, they bootstrap just fine.
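
For reference, the manual workaround is just the plain bootstrap call against the node (IP taken from the dmesg output below; the talosconfig path is whatever your cluster template generated):

$ talosctl --talosconfig <talosconfig> --nodes 192.168.2.0 bootstrap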

$ talosctl dmesg
192.168.2.0: user: warning: [2024-01-06T05:21:41.708073187Z]: [talos] [192.168.2.0 fd0f:5fab:4685:1403:206f:1cfc:8fbd:94aa]
192.168.2.0: user: warning: [2024-01-06T05:21:50.716526187Z]: [talos] controller failed {"component": "controller-runtime", "controller": "k8s.KubeletStaticPodController", "error": "error refreshing pod status: error fetching pod status: an error on the server (\"Authorization error (user=apiserver-kubelet-client, verb=get, resource=nodes, subresource=proxy)\") has prevented the request from succeeding"}
192.168.2.0: user: warning: [2024-01-06T05:21:50.758157187Z]: [talos] controller failed {"component": "controller-runtime", "controller": "v1alpha1.EventsSinkController", "error": "error publishing event: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp [fd0f:5fab:4685:1403::1]:4002: i/o timeout\""}
192.168.2.0: user: warning: [2024-01-06T05:21:52.068440187Z]: [talos] task startAllServices (1/1): service "etcd" to be "up"
192.168.2.0: user: warning: [2024-01-06T05:22:06.059081187Z]: [talos] configuring siderolink connection {"component": "controller-runtime", "controller": "siderolink.ManagerController", "peer_endpoint": "10.5.10.1:51821", "next_peer_endpoint": ""}
192.168.2.0: user: warning: [2024-01-06T05:22:06.060807187Z]: [talos] siderolink connection configured {"component": "controller-runtime", "controller": "siderolink.ManagerController", "endpoint": "192.168.2.85:8081", "node_uuid": "c9fb49f7-1d99-9b85-31d8-1c697aaecb53", "node_address": "fd0f:5fab:4685:1403:206f:1cfc:8fbd:94aa/64"}
192.168.2.0: user: warning: [2024-01-06T05:22:06.260625187Z]: [talos] controller failed {"component": "controller-runtime", "controller": "k8s.KubeletStaticPodController", "error": "error refreshing pod status: error fetching pod status: an error on the server (\"Authorization error (user=apiserver-kubelet-client, verb=get, resource=nodes, subresource=proxy)\") has prevented the request from succeeding"}
192.168.2.0: user: warning: [2024-01-06T05:22:07.067765187Z]: [talos] task startAllServices (1/1): service "etcd" to be "up"
192.168.2.0: user: warning: [2024-01-06T05:22:15.032401187Z]: [talos] etcd is waiting to join the cluster, if this node is the first node in the cluster, please run `talosctl bootstrap` against one of the following IPs:
192.168.2.0: user: warning: [2024-01-06T05:22:15.033848187Z]: [talos] [192.168.2.0 fd0f:5fab:4685:1403:206f:1cfc:8fbd:94aa]
192.168.2.0: user: warning: [2024-01-06T05:22:21.536275187Z]: [talos] controller failed {"component": "controller-runtime", "controller": "k8s.KubeletStaticPodController", "error": "error refreshing pod status: error fetching pod status: an error on the server (\"Authorization error (user=apiserver-kubelet-client, verb=get, resource=nodes, subresource=proxy)\") has prevented the request from succeeding"}
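
As far as I understand, the Talos control-plane provider is what should issue that bootstrap call automatically. For reference, the management-cluster objects involved can be checked with something like this (resource names assume a default clusterctl/Sidero install):

$ kubectl get clusters,machines -A
$ kubectl get taloscontrolplane -A
$ kubectl get servers,serverbindings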

cabpt logs:

I0106 05:00:32.685029       1 leaderelection.go:250] attempting to acquire leader lease manager/controller-leader-election-cabpt...
I0106 05:00:32.687424       1 leaderelection.go:260] successfully acquired lease manager/controller-leader-election-cabpt
[controller-runtime] log.SetLogger(...) was never called; logs will not be displayed.
Detected at:
     >  goroutine 121 [running]:
     >  runtime/debug.Stack()
     >   /toolchain/go/src/runtime/debug/stack.go:24 +0x64
     >  sigs.k8s.io/controller-runtime/pkg/log.eventuallyFulfillRoot()
     >   /.cache/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/log/log.go:60 +0xf4
     >  sigs.k8s.io/controller-runtime/pkg/log.(*delegatingLogSink).WithValues(0x4000994300, {0x4000427c00, 0x2, 0x2})
     >   /.cache/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/log/deleg.go:168 +0x3c
     >  github.com/go-logr/logr.Logger.WithValues({{0x1d70f40, 0x4000994300}, 0x0}, {0x4000427c00?, 0x0?, 0x0?})
     >   /.cache/mod/github.com/go-logr/logr@v1.3.0/logr.go:336 +0x48
     >  sigs.k8s.io/cluster-api/util/predicates.ResourceNotPausedAndHasFilterLabel.All.func10({{0x1d8abd0?, 0x4000a9d380?}})
     >   /.cache/mod/sigs.k8s.io/cluster-api@v1.6.0/util/predicates/generic_predicates.go:46 +0xa4
     >  sigs.k8s.io/controller-runtime/pkg/predicate.Funcs.Create(...)
     >   /.cache/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/predicate/predicate.go:72
     >  sigs.k8s.io/controller-runtime/pkg/internal/source.(*EventHandler).OnAdd(0x40001f6460, {0x19dd980?, 0x4000a9d380})
     >   /.cache/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/source/event_handler.go:80 +0x1cc
     >  k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnAdd(...)
     >   /.cache/mod/k8s.io/client-go@v0.28.4/tools/cache/controller.go:239
     >  k8s.io/client-go/tools/cache.(*processorListener).run.func1()
     >   /.cache/mod/k8s.io/client-go@v0.28.4/tools/cache/shared_informer.go:974 +0x154
     >  k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x0?)
     >   /.cache/mod/k8s.io/apimachinery@v0.28.4/pkg/util/wait/backoff.go:226 +0x40
     >  k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x400010e728?, {0x1d4d3e0, 0x40009b8090}, 0x1, 0x40009c0120)
     >   /.cache/mod/k8s.io/apimachinery@v0.28.4/pkg/util/wait/backoff.go:227 +0x90
     >  k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0?, 0x3b9aca00, 0x0, 0x0?, 0x0?)
     >   /.cache/mod/k8s.io/apimachinery@v0.28.4/pkg/util/wait/backoff.go:204 +0x80
     >  k8s.io/apimachinery/pkg/util/wait.Until(...)
     >   /.cache/mod/k8s.io/apimachinery@v0.28.4/pkg/util/wait/backoff.go:161
     >  k8s.io/client-go/tools/cache.(*processorListener).run(0x40009e8000)
     >   /.cache/mod/k8s.io/client-go@v0.28.4/tools/cache/shared_informer.go:968 +0x68
     >  k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1()
     >   /.cache/mod/k8s.io/apimachinery@v0.28.4/pkg/util/wait/wait.go:72 +0x58
     >  created by k8s.io/apimachinery/pkg/util/wait.(*Group).Start in goroutine 223
     >   /.cache/mod/k8s.io/apimachinery@v0.28.4/pkg/util/wait/wait.go:70 +0x7c 
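
The log.SetLogger stack above appears to be just controller-runtime's warning that no logger was configured, which means the provider's own reconcile logs are being suppressed rather than reporting a real error. To watch for actual reconcile failures, the provider deployments can be tailed directly (namespace/deployment names assume a default clusterctl install and may differ):

$ kubectl logs -n cabpt-system deploy/cabpt-controller-manager -f
$ kubectl logs -n cacppt-system deploy/cacppt-controller-manager -f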

Provider: Docker on macOS Sonoma

Versions:

wim-de-groot commented 8 months ago

I have the same issue. Maybe this is more of a Cluster API issue?