nforgeio / neonKUBE

Public NeonKUBE Kubernetes distribution related projects
https://neonkube.io
Apache License 2.0
76 stars 13 forks source link

Watcher anti-pattern (temporary watcher lives forever)? #1906

Open jefflill opened 2 months ago

jefflill commented 2 months ago

I'm seeing this pattern in a handful of places, where we're using a watcher to wait for a Kubernetes resource to be created:

// Start the watcher.

_ = K8s.WatchAsync<V1ConfigMap>(
    async (@event) =>
    {
        await SyncContext.Clear;

        ClusterInfo = Neon.K8s.TypedConfigMap<ClusterInfo>.From(@event.Value).Data;

        Logger.LogInformationEx("Updated cluster info");
    },
    namespaceParameter: KubeNamespace.NeonStatus,
    fieldSelector:      $"metadata.name={KubeConfigMapName.ClusterInfo}",
    retryDelay:         TimeSpan.FromSeconds(30),
    logger:             Logger);

// Wait for the watcher to discover the [ClusterInfo].

NeonHelper.WaitFor(() => ClusterInfo != null, timeout: TimeSpan.FromSeconds(60), timeoutMessage: "Timeout obtaining: cluster-info.");

The problem here is that the watcher is started but is never explicitly terminated, In theory, the watcher task will eventually be garbage collected sometime after the method containing this returns, but in real life GCs are delayed until the process experiences memory pressure which may take a while and might never happen if the process isn't doing much.

It doesn't make sense to keep connections open to the API Server unnecessarily, and I believe watchers will periodically reconnect, making this even worse.

NOTE: I'm not actually convinced that GC will actually stop the watcher though. Disposing a running Task actually throws an exception saying that only completed or failed tasks can be disposed, So I suspect this means that the watcher will live forever, even after GCs.

We should add an extension method that implements this logic and invokes a callback when the resource if discovered and then uses a CancellationToken to explicitly abort the watcher after the callback returns.

I've marked the places in the code where I've found this with a $todo(jefflill) and the link to this issue,