neondatabase / autoscaling

Postgres vertical autoscaling in k8s
Apache License 2.0
142 stars 16 forks source link

Bug: neonvm-controller (still) has log sampling enabled #962

Closed sharnoff closed 3 weeks ago

sharnoff commented 3 weeks ago

Environment

Prod (and, local development with a bit of effort)

Steps to reproduce

Modify neonvm-controller so it produces a bunch of logs all at once. I used the following patch:

diff --git a/neonvm/controllers/vm_controller.go b/neonvm/controllers/vm_controller.go
index e9ebfcf..cfc81de 100644
--- a/neonvm/controllers/vm_controller.go
+++ b/neonvm/controllers/vm_controller.go
@@ -30,6 +30,7 @@ import (
    "os"
    "reflect"
    "strconv"
+   "sync"
    "time"

    nadapiv1 "github.com/k8snetworkplumbingwg/network-attachment-definition-client/pkg/apis/k8s.cni.cncf.io/v1"
@@ -102,6 +103,8 @@ type VMReconciler struct {
 //+kubebuilder:rbac:groups=vm.neon.tech,resources=ippools/finalizers,verbs=update
 //+kubebuilder:rbac:groups=k8s.cni.cncf.io,resources=network-attachment-definitions,verbs=get;list;watch

+var logABunch = sync.Once{}
+
 // Reconcile is part of the main kubernetes reconciliation loop which aims to
 // move the current state of the cluster closer to the desired state.

@@ -117,6 +120,12 @@ type VMReconciler struct {
 func (r *VMReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    log := log.FromContext(ctx)

+   logABunch.Do(func() {
+       for i := 0; i < 300; i += 1 {
+           log.Info("Test log sampling", "i", i)
+       }
+   })
+
    var vm vmv1.VirtualMachine
    if err := r.Get(ctx, req.NamespacedName, &vm); err != nil {
        // Error reading the object - requeue the request.

Expected result

There should be 300 log lines from the example above, with all values of i represented.

Actual result

We only get 102 log lines - i.e. 100 + (300-100)/100 from the sampling settings.

More specifically, we get the following sequence of log lines:

{"level":"info","ts":"2024-06-14T22:20:06.177Z","msg":"Test log sampling","controller":"virtualmachine","controllerGroup":"vm.neon.tech","controllerKind":"VirtualMachine","VirtualMachine":{"name":"postgres16-disk-test","namespace":"default"},"namespace":"default","name":"postgres16-disk-test","reconcileID":"5c8a2d5f-3e6a-4a98-9042-161cec15e23c","i":97}
{"level":"info","ts":"2024-06-14T22:20:06.177Z","msg":"Test log sampling","controller":"virtualmachine","controllerGroup":"vm.neon.tech","controllerKind":"VirtualMachine","VirtualMachine":{"name":"postgres16-disk-test","namespace":"default"},"namespace":"default","name":"postgres16-disk-test","reconcileID":"5c8a2d5f-3e6a-4a98-9042-161cec15e23c","i":98}
{"level":"info","ts":"2024-06-14T22:20:06.177Z","msg":"Test log sampling","controller":"virtualmachine","controllerGroup":"vm.neon.tech","controllerKind":"VirtualMachine","VirtualMachine":{"name":"postgres16-disk-test","namespace":"default"},"namespace":"default","name":"postgres16-disk-test","reconcileID":"5c8a2d5f-3e6a-4a98-9042-161cec15e23c","i":99}
{"level":"info","ts":"2024-06-14T22:20:06.177Z","msg":"Test log sampling","controller":"virtualmachine","controllerGroup":"vm.neon.tech","controllerKind":"VirtualMachine","VirtualMachine":{"name":"postgres16-disk-test","namespace":"default"},"namespace":"default","name":"postgres16-disk-test","reconcileID":"5c8a2d5f-3e6a-4a98-9042-161cec15e23c","i":199}
{"level":"info","ts":"2024-06-14T22:20:06.178Z","msg":"Test log sampling","controller":"virtualmachine","controllerGroup":"vm.neon.tech","controllerKind":"VirtualMachine","VirtualMachine":{"name":"postgres16-disk-test","namespace":"default"},"namespace":"default","name":"postgres16-disk-test","reconcileID":"5c8a2d5f-3e6a-4a98-9042-161cec15e23c","i":299}

Other logs, links

I though this would be fixed by

Turns out that's not the case. See that PR for more detail.