I'd like to work on this, so if there's no opposition to the idea itself, feel free to assign this to me.
Component(s)
processor/k8sattributes
Is your feature request related to a problem? Please describe.
The k8sattributes processor maintains a local cache of the K8s resources it uses to compute metadata. For example, it keeps a local copy of each Pod's data, so that when a log record from that Pod arrives, it can attach Pod metadata to the record. Depending on the configuration and use case, these caches can account for a significant share of the collector's overall memory consumption.
Currently, each processor instance maintains its own set of caches, even in situations where they could easily be shared, so every instance adds significant memory overhead. Even a common setup with three separate pipelines for metrics, logs, and traces, each containing a k8sattributes processor, triples the cache memory consumption.
Describe the solution you'd like
k8sattributes processor instances should share informers. An informer is a local cache that actively keeps itself in sync with the state of the Kubernetes API Server. The k8sattributes processor already uses informers; they lend themselves well to sharing, and client-go even ships tooling (shared informer factories) to facilitate exactly this kind of sharing.
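To make that tooling concrete, here is a minimal, self-contained sketch of client-go's shared informer factory; the wiring is illustrative rather than taken from the processor's code. Repeated requests for the same resource's informer return a single shared instance, so the cache and the API Server watch exist only once no matter how many consumers there are:

```go
package main

import (
	"time"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	// Connect to the API Server from inside the cluster.
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// One factory hands out at most one informer per resource type.
	factory := informers.NewSharedInformerFactory(client, 10*time.Minute)

	// Both calls return the same shared Pod informer.
	podInformer := factory.Core().V1().Pods().Informer()
	same := factory.Core().V1().Pods().Informer()
	_ = podInformer == same // true

	stopCh := make(chan struct{})
	factory.Start(stopCh)            // start every informer requested so far
	factory.WaitForCacheSync(stopCh) // block until the initial LIST completes
	close(stopCh)                    // stops all informers in the factory
}
```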
In terms of implementation, I think we should use a similar approach to the memorylimiter processor, where the processor factory holds the shared state. Because k8sattributes processors can set different filters on the resources they watch, we need a separate informer factory per distinct filter set, created on demand; see the sketch below.
The biggest problem with this approach is managing the lifecycle of these informers. The client-go tooling only lets us shut down all of a factory's informers collectively, which may leave informers running unnecessarily until every k8sattributes processor instance has stopped. In practice, it should be fine to keep a simple counter of processor instances per informer factory and clean up when it reaches 0, as in the sketch below.
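A hypothetical sketch of both pieces, assuming the processor factory owns the shared state (all names here are illustrative, not existing processor code): informer factories are created on demand per filter key and reference-counted, and the last release closes the stop channel their informers were started with:

```go
package k8sattributesprocessor

import (
	"sync"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
)

// sharedInformers would live on the processor factory. Keys are a
// canonical string encoding of a processor instance's resource filters.
type sharedInformers struct {
	mu        sync.Mutex
	factories map[string]*refCountedFactory
}

type refCountedFactory struct {
	factory informers.SharedInformerFactory
	refs    int
	stopCh  chan struct{}
}

// acquire returns the informer factory for a filter set, creating it on
// demand, and bumps its reference count. Filters such as a namespace or
// label selector would map onto options like informers.WithNamespace and
// informers.WithTweakListOptions.
func (s *sharedInformers) acquire(key string, client kubernetes.Interface, opts ...informers.SharedInformerOption) (informers.SharedInformerFactory, <-chan struct{}) {
	s.mu.Lock()
	defer s.mu.Unlock()
	f, ok := s.factories[key]
	if !ok {
		f = &refCountedFactory{
			factory: informers.NewSharedInformerFactoryWithOptions(client, 0, opts...),
			stopCh:  make(chan struct{}),
		}
		s.factories[key] = f
	}
	f.refs++
	return f.factory, f.stopCh
}

// release decrements the count on a processor instance's Shutdown and
// tears the factory down once nothing is using it anymore.
func (s *sharedInformers) release(key string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	f, ok := s.factories[key]
	if !ok {
		return
	}
	f.refs--
	if f.refs == 0 {
		close(f.stopCh) // stops every informer started with this channel
		delete(s.factories, key)
	}
}
```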
Describe alternatives you've considered
There isn't really an alternative to sharing informers if we want to solve this problem. We could build our own sharing mechanism instead, but I'd strongly prefer the one from client-go unless there's a very good reason to pass on it. We can revisit that decision if cleanup ends up becoming a major issue.
Additional context
I'm limiting myself to informer sharing here. The processor also holds a local cache of computed metadata. Instead of sharing that cache, I'd rather switch to computing the metadata lazily, on demand, rather than eagerly on event notifications; a rough sketch follows. That can happen in its own issue, though.
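For illustration only, the lazy version could look roughly like this: rather than recomputing attributes on every informer event, the processor reads the Pod from the shared informer's local cache when a record for it actually arrives. The helper and the abbreviated attribute set are assumptions, not the processor's actual code:

```go
package k8sattributesprocessor

import (
	listersv1 "k8s.io/client-go/listers/core/v1"
)

// podAttributes computes Pod metadata on demand from the shared informer's
// local cache; no API Server round-trip happens here.
func podAttributes(pods listersv1.PodLister, namespace, name string) (map[string]string, error) {
	pod, err := pods.Pods(namespace).Get(name)
	if err != nil {
		return nil, err
	}
	return map[string]string{
		"k8s.pod.name":       pod.Name,
		"k8s.namespace.name": pod.Namespace,
		"k8s.node.name":      pod.Spec.NodeName,
	}, nil
}
```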