siderolabs / cluster-api-bootstrap-provider-talos

A cluster-api bootstrap provider for deploying Talos clusters.
https://www.talos-systems.com
Mozilla Public License 2.0
103 stars 27 forks source link

Potential regression - CABPT & CACPPT broken logs #185

Closed julien-sugg closed 7 months ago

julien-sugg commented 7 months ago

Greetings,

We recently updated to CABPT & CACPPT to the latest version in order to support Talos 1.6 and appear to have faced a regression around logging.

Versions / Environment

Description

Since the update of CABPT from 0.6.2 to 0.6.3 & CACPPT from 0.5.3 to 0.5.4, we faced a regression in logs with startup errors. Note that those errors are non-blocking and the operator is still processing the underlying CRs; however, we are completely blind from a logging perspective.

CABPT Logs using version 0.6.3:

➜ k logs cabpt-controller-manager-56ccd6f4b6-5fxdb
I0118 15:16:44.627464       1 leaderelection.go:250] attempting to acquire leader lease cluster-api/controller-leader-election-cabpt...
I0118 15:17:00.697502       1 leaderelection.go:260] successfully acquired lease cluster-api/controller-leader-election-cabpt
[controller-runtime] log.SetLogger(...) was never called; logs will not be displayed.
Detected at:
    >  goroutine 221 [running]:
    >  runtime/debug.Stack()
    >   /toolchain/go/src/runtime/debug/stack.go:24 +0x5e
    >  sigs.k8s.io/controller-runtime/pkg/log.eventuallyFulfillRoot()
    >   /.cache/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/log/log.go:60 +0xcd
    >  sigs.k8s.io/controller-runtime/pkg/log.(*delegatingLogSink).WithValues(0xc0007d1100, {0xc00032d5a0, 0x2, 0x2})
    >   /.cache/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/log/deleg.go:168 +0x49
    >  github.com/go-logr/logr.Logger.WithValues({{0x2297380, 0xc0007d1100}, 0x0}, {0xc00032d5a0?, 0x2278d10?, 0xc0002e42d0?})
    >   /.cache/mod/github.com/go-logr/logr@v1.3.0/logr.go:336 +0x42
    >  sigs.k8s.io/cluster-api/util/predicates.ResourceNotPausedAndHasFilterLabel.All.func9({{0x22b16e0?, 0xc000250f00?}, {0x22b16e0?, 0xc0004f2780?}})
    >   /.cache/mod/sigs.k8s.io/cluster-api@v1.6.0/util/predicates/generic_predicates.go:35 +0xd6
    >  sigs.k8s.io/controller-runtime/pkg/predicate.Funcs.Update(...)
    >   /.cache/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/predicate/predicate.go:88
    >  sigs.k8s.io/controller-runtime/pkg/internal/source.(*EventHandler).OnUpdate(0xc0003223c0, {0x1efe3e0?, 0xc000250f00}, {0x1efe3e0?, 0xc0004f2780})
    >   /.cache/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/source/event_handler.go:113 +0x3d5
    >  k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnUpdate(...)
    >   /.cache/mod/k8s.io/client-go@v0.28.4/tools/cache/controller.go:246
    >  k8s.io/client-go/tools/cache.(*processorListener).run.func1()
    >   /.cache/mod/k8s.io/client-go@v0.28.4/tools/cache/shared_informer.go:972 +0xea
    >  k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)
    >   /.cache/mod/k8s.io/apimachinery@v0.28.4/pkg/util/wait/backoff.go:226 +0x33
    >  k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc00075ef38?, {0x2273900, 0xc000b0d5c0}, 0x1, 0xc0006b9aa0)
    >   /.cache/mod/k8s.io/apimachinery@v0.28.4/pkg/util/wait/backoff.go:227 +0xaf
    >  k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x44704593c5fcacce?, 0x3b9aca00, 0x0, 0xfd?, 0x3095bf6e157f56?)
    >   /.cache/mod/k8s.io/apimachinery@v0.28.4/pkg/util/wait/backoff.go:204 +0x7f
    >  k8s.io/apimachinery/pkg/util/wait.Until(...)
    >   /.cache/mod/k8s.io/apimachinery@v0.28.4/pkg/util/wait/backoff.go:161
    >  k8s.io/client-go/tools/cache.(*processorListener).run(0xc000ae3a70)
    >   /.cache/mod/k8s.io/client-go@v0.28.4/tools/cache/shared_informer.go:968 +0x69
    >  k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1()
    >   /.cache/mod/k8s.io/apimachinery@v0.28.4/pkg/util/wait/wait.go:72 +0x4f
    >  created by k8s.io/apimachinery/pkg/util/wait.(*Group).Start in goroutine 199
    >   /.cache/mod/k8s.io/apimachinery@v0.28.4/pkg/util/wait/wait.go:70 +0x73

Image: ghcr.io/siderolabs/cluster-api-talos-controller:v0.6.3

CACPPT Logs using version 0.5.4:

➜ k logs cacppt-controller-manager-54bc8795cb-pppc2
I0118 14:45:37.740051       1 leaderelection.go:250] attempting to acquire leader lease cluster-api/controller-leader-election-cacppt...
I0118 14:45:54.090740       1 leaderelection.go:260] successfully acquired lease cluster-api/controller-leader-election-cacppt
[controller-runtime] log.SetLogger(...) was never called; logs will not be displayed.
Detected at:
    >  goroutine 274 [running]:
    >  runtime/debug.Stack()
    >   /toolchain/go/src/runtime/debug/stack.go:24 +0x5e
    >  sigs.k8s.io/controller-runtime/pkg/log.eventuallyFulfillRoot()
    >   /.cache/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/log/log.go:60 +0xcd
    >  sigs.k8s.io/controller-runtime/pkg/log.(*delegatingLogSink).WithValues(0xc000879ec0, {0xc00131e6c0, 0x2, 0x2})
    >   /.cache/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/log/deleg.go:168 +0x49
    >  github.com/go-logr/logr.Logger.WithValues(...)
    >   /.cache/mod/github.com/go-logr/logr@v1.3.0/logr.go:336
    >  sigs.k8s.io/controller-runtime/pkg/builder.(*Builder).doController.func1(0xc00131e6a0)
    >   /.cache/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/builder/controller.go:400 +0x173
    >  sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc00090c820, {0x24a2c38, 0xc000908a00}, {0x1ef2ae0?, 0xc00131e680?})
    >   /.cache/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:306 +0x16a
    >  sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc00090c820, {0x24a2c38, 0xc000908a00})
    >   /.cache/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:266 +0x1c9
    >  sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
    >   /.cache/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:227 +0x79
    >  created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2 in goroutine 199
    >   /.cache/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:223 +0x565

Image: ghcr.io/siderolabs/cluster-api-control-plane-talos-controller:v0.5.4

Logs were fully operational using the previous patch version:

CACPPT Logs using version 0.5.3:

➜ k logs cacppt-controller-manager-7dd8ffb77b-kz5bb 
2024-01-18T13:16:49Z    INFO    controller-runtime.metrics  Metrics server is starting to listen    {"addr": "127.0.0.1:8080"}
2024-01-18T13:16:49Z    INFO    controller-runtime.builder  Registering a mutating webhook  {"GVK": "controlplane.cluster.x-k8s.io/v1alpha3, Kind=TalosControlPlane", "path": "/mutate-controlplane-cluster-x-k8s-io-v1alpha3-taloscontrolplane"}
2024-01-18T13:16:49Z    INFO    controller-runtime.webhook  Registering webhook {"path": "/mutate-controlplane-cluster-x-k8s-io-v1alpha3-taloscontrolplane"}
2024-01-18T13:16:49Z    INFO    controller-runtime.builder  Registering a validating webhook    {"GVK": "controlplane.cluster.x-k8s.io/v1alpha3, Kind=TalosControlPlane", "path": "/validate-controlplane-cluster-x-k8s-io-v1alpha3-taloscontrolplane"}
2024-01-18T13:16:49Z    INFO    controller-runtime.webhook  Registering webhook {"path": "/validate-controlplane-cluster-x-k8s-io-v1alpha3-taloscontrolplane"}
2024-01-18T13:16:49Z    INFO    setup   starting manager
2024-01-18T13:16:49Z    INFO    controller-runtime.webhook.webhooks Starting webhook server
2024-01-18T13:16:49Z    INFO    controller-runtime.certwatcher  Updated current TLS certificate
I0118 13:16:49.996684       1 leaderelection.go:245] attempting to acquire leader lease cluster-api/controller-leader-election-cacppt...
2024-01-18T13:16:49Z    INFO    controller-runtime.webhook  Serving webhook server  {"host": "", "port": 9443}
2024-01-18T13:16:49Z    INFO    starting server {"path": "/metrics", "kind": "metrics", "addr": "127.0.0.1:8080"}
2024-01-18T13:16:49Z    INFO    controller-runtime.certwatcher  Starting certificate watcher
I0118 13:16:50.006088       1 leaderelection.go:255] successfully acquired lease cluster-api/controller-leader-election-cacppt
2024-01-18T13:16:50Z    DEBUG   events  cacppt-controller-manager-7dd8ffb77b-kz5bb_36a2d901-7f16-4d2a-bb85-5ec5af4c6abc became leader   {"type": "Normal", "object": {"kind":"Lease","namespace":"cluster-api","name":"controller-leader-election-cacppt","uid":"0a9f03bd-e17a-4816-a0c1-e8e91d77bf40","apiVersion":"coordination.k8s.io/v1","resourceVersion":"468786"}, "reason": "LeaderElection"}
2024-01-18T13:16:50Z    INFO    Starting EventSource    {"controller": "remote/clustercache", "controllerGroup": "cluster.x-k8s.io", "controllerKind": "Cluster", "source": "kind source: *v1beta1.Cluster"}
2024-01-18T13:16:50Z    INFO    Starting Controller {"controller": "remote/clustercache", "controllerGroup": "cluster.x-k8s.io", "controllerKind": "Cluster"}
2024-01-18T13:16:50Z    INFO    Starting EventSource    {"controller": "taloscontrolplane", "controllerGroup": "controlplane.cluster.x-k8s.io", "controllerKind": "TalosControlPlane", "source": "kind source: *v1alpha3.TalosControlPlane"}
2024-01-18T13:16:50Z    INFO    Starting EventSource    {"controller": "taloscontrolplane", "controllerGroup": "controlplane.cluster.x-k8s.io", "controllerKind": "TalosControlPlane", "source": "kind source: *v1beta1.Machine"}
2024-01-18T13:16:50Z    INFO    Starting EventSource    {"controller": "taloscontrolplane", "controllerGroup": "controlplane.cluster.x-k8s.io", "controllerKind": "TalosControlPlane", "source": "kind source: *v1beta1.Cluster"}
2024-01-18T13:16:50Z    INFO    Starting Controller {"controller": "taloscontrolplane", "controllerGroup": "controlplane.cluster.x-k8s.io", "controllerKind": "TalosControlPlane"}
2024-01-18T13:16:50Z    INFO    Starting workers    {"controller": "remote/clustercache", "controllerGroup": "cluster.x-k8s.io", "controllerKind": "Cluster", "worker count": 10}
2024-01-18T13:16:50Z    INFO    Starting workers    {"controller": "taloscontrolplane", "controllerGroup": "controlplane.cluster.x-k8s.io", "controllerKind": "TalosControlPlane", "worker count": 10}

Image: ghcr.io/siderolabs/cluster-api-control-plane-talos-controller:v0.5.3

Reproduces Steps

The issue can be easily reproduced via the following steps:

  1. Create a transient cluster

The cluster can either be created directly on vSphere or kind/k3d/...

➜ k3d cluster create dbg -i rancher/k3s:v1.27.5-k3s1        
  1. Initialize Cluster API components on the transient cluster with clusterctl using CAPV, CABPT and CACPPT providers
➜ clusterctl init \
--infrastructure vsphere:v1.8.6 \
--bootstrap talos:v0.6.3 \
--control-plane talos:v0.5.4 \
--target-namespace cluster-api-system
  1. Apply basic CAPI manifests

  2. Check logs

➜ k logs deploy/cabpt-controller-manager
I0119 10:48:03.042598       1 leaderelection.go:250] attempting to acquire leader lease cluster-api-system/controller-leader-election-cabpt...
I0119 10:48:03.046279       1 leaderelection.go:260] successfully acquired lease cluster-api-system/controller-leader-election-cabpt
[controller-runtime] log.SetLogger(...) was never called; logs will not be displayed.
Detected at:
    >  goroutine 207 [running]:
    >  runtime/debug.Stack()
    >   /toolchain/go/src/runtime/debug/stack.go:24 +0x5e
    >  sigs.k8s.io/controller-runtime/pkg/log.eventuallyFulfillRoot()
    >   /.cache/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/log/log.go:60 +0xcd
    >  sigs.k8s.io/controller-runtime/pkg/log.(*delegatingLogSink).WithValues(0xc00005f400, {0xc0004db4e0, 0x2, 0x2})
    >   /.cache/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/log/deleg.go:168 +0x49
    >  github.com/go-logr/logr.Logger.WithValues({{0x2297380, 0xc00005f400}, 0x0}, {0xc0004db4e0?, 0x1f58710?, 0x7?})
    >   /.cache/mod/github.com/go-logr/logr@v1.3.0/logr.go:336 +0x42
    >  sigs.k8s.io/cluster-api/util/predicates.ResourceNotPausedAndHasFilterLabel.All.func10({{0x22b14b0?, 0xc00074a340?}})
    >   /.cache/mod/sigs.k8s.io/cluster-api@v1.6.0/util/predicates/generic_predicates.go:46 +0xb6
    >  sigs.k8s.io/controller-runtime/pkg/predicate.Funcs.Create(...)
    >   /.cache/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/predicate/predicate.go:72
    >  sigs.k8s.io/controller-runtime/pkg/internal/source.(*EventHandler).OnAdd(0xc000420140, {0x1f03cc0?, 0xc00074a340})
    >   /.cache/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/source/event_handler.go:80 +0x225
    >  k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnAdd(...)
    >   /.cache/mod/k8s.io/client-go@v0.28.4/tools/cache/controller.go:239
    >  k8s.io/client-go/tools/cache.(*processorListener).run.func1()
    >   /.cache/mod/k8s.io/client-go@v0.28.4/tools/cache/shared_informer.go:974 +0x13e
    >  k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)
    >   /.cache/mod/k8s.io/apimachinery@v0.28.4/pkg/util/wait/backoff.go:226 +0x33
    >  k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc00012af38?, {0x2273900, 0xc0006921b0}, 0x1, 0xc00042c300)
    >   /.cache/mod/k8s.io/apimachinery@v0.28.4/pkg/util/wait/backoff.go:227 +0xaf
    >  k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc000583d40?, 0x3b9aca00, 0x0, 0x28?, 0xc000445ea0?)
    >   /.cache/mod/k8s.io/apimachinery@v0.28.4/pkg/util/wait/backoff.go:204 +0x7f
    >  k8s.io/apimachinery/pkg/util/wait.Until(...)
    >   /.cache/mod/k8s.io/apimachinery@v0.28.4/pkg/util/wait/backoff.go:161
    >  k8s.io/client-go/tools/cache.(*processorListener).run(0xc00087e090)
    >   /.cache/mod/k8s.io/client-go@v0.28.4/tools/cache/shared_informer.go:968 +0x69
    >  k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1()
    >   /.cache/mod/k8s.io/apimachinery@v0.28.4/pkg/util/wait/wait.go:72 +0x4f
    >  created by k8s.io/apimachinery/pkg/util/wait.(*Group).Start in goroutine 189
    >   /.cache/mod/k8s.io/apimachinery@v0.28.4/pkg/util/wait/wait.go:70 +0x73

➜ k logs deploy/cacppt-controller-manager
I0119 10:48:06.525315       1 leaderelection.go:250] attempting to acquire leader lease cluster-api-system/controller-leader-election-cacppt...
I0119 10:48:06.529032       1 leaderelection.go:260] successfully acquired lease cluster-api-system/controller-leader-election-cacppt
[controller-runtime] log.SetLogger(...) was never called; logs will not be displayed.
Detected at:
    >  goroutine 282 [running]:
    >  runtime/debug.Stack()
    >   /toolchain/go/src/runtime/debug/stack.go:24 +0x5e
    >  sigs.k8s.io/controller-runtime/pkg/log.eventuallyFulfillRoot()
    >   /.cache/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/log/log.go:60 +0xcd
    >  sigs.k8s.io/controller-runtime/pkg/log.(*delegatingLogSink).WithValues(0xc0001c7b40, {0xc00042e520, 0x2, 0x2})
    >   /.cache/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/log/deleg.go:168 +0x49
    >  github.com/go-logr/logr.Logger.WithValues(...)
    >   /.cache/mod/github.com/go-logr/logr@v1.3.0/logr.go:336
    >  sigs.k8s.io/controller-runtime/pkg/builder.(*Builder).doController.func1(0xc00042e4c0)
    >   /.cache/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/builder/controller.go:400 +0x173
    >  sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0004d9900, {0x24a2c38, 0xc0003ce6e0}, {0x1ef2ae0?, 0xc00042e480?})
    >   /.cache/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:306 +0x16a
    >  sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0004d9900, {0x24a2c38, 0xc0003ce6e0})
    >   /.cache/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:266 +0x1c9
    >  sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
    >   /.cache/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:227 +0x79
    >  created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2 in goroutine 195
    >   /.cache/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:223 +0x565
smira commented 7 months ago

Thanks for reporting it, it's indeed a bug.

smira commented 7 months ago

New versions of CAPI components released fixing this bug.