open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

[pkg/stanza] Windows Input Operator falls behind reading from channel #36491

Open dpaasman00 opened 16 hours ago

dpaasman00 commented 16 hours ago

Component(s)

pkg/stanza, receiver/windowseventlog

Describe the issue you're reporting

The windowseventlog receiver has a configuration parameter, max_reads, which sets the maximum number of events read from the event channel in a poll interval. When more events are added to the channel in a poll interval than max_reads allows, the receiver can fall behind. In extreme cases the agent can fall severely behind, which was the case in #36472. When this happens, nothing indicates that the receiver is falling behind, or that this is why the newest events aren't being read from the channel.
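To make the failure mode concrete, here is a small sketch of how the backlog grows when arrivals outpace max_reads. It assumes a constant arrival rate and one read of at most max_reads events per poll interval. The function name and the numbers are made up for illustration and are not taken from the receiver.

```go
package main

import "fmt"

// backlogAfter returns how many events remain unread after the given number
// of poll intervals, assuming a constant number of arrivals per interval and
// a reader that consumes at most maxReads events per interval.
// Hypothetical helper for illustration only; not part of pkg/stanza.
func backlogAfter(polls, arrivalsPerPoll, maxReads int) int {
	backlog := 0
	for i := 0; i < polls; i++ {
		backlog += arrivalsPerPoll
		read := backlog
		if read > maxReads {
			read = maxReads // the reader is capped at max_reads per poll
		}
		backlog -= read
	}
	return backlog
}

func main() {
	// Hypothetical numbers: 150 events arrive per poll, max_reads is 100.
	fmt.Println(backlogAfter(60, 150, 100)) // prints 3000
}
```

With these assumed numbers the receiver falls 50 events further behind on every poll, and nothing in its output changes to show it.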

I'm proposing adding a mechanism for detecting when the receiver is reading the maximum number of events it can from the channel. One option is to emit a debug log every time the number of events returned by evtNext() is equal to max_reads in this section of code. Another is to define a monotonic cumulative sum metric that is incremented every time this occurs, instead of a debug log.
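A rough sketch of what either option could look like, to show the shape of the check rather than a real patch. The saturationTracker type and its methods are hypothetical and do not exist in pkg/stanza. The counter stands in for a monotonic sum metric, and the print call stands in for the debug log.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// saturationTracker counts the polls in which the reader hit its max_reads
// cap. Hypothetical type for illustration; not part of pkg/stanza.
type saturationTracker struct {
	maxReads  int
	saturated atomic.Int64 // only ever incremented, like a monotonic sum
}

// observe records the number of events returned by one read and reports
// whether that read reached the max_reads cap.
func (s *saturationTracker) observe(eventsRead int) bool {
	if eventsRead < s.maxReads {
		return false
	}
	s.saturated.Add(1)
	return true
}

// count returns how many reads have reached the cap so far.
func (s *saturationTracker) count() int64 { return s.saturated.Load() }

func main() {
	tracker := &saturationTracker{maxReads: 100}
	// Hypothetical per-poll read counts.
	for _, n := range []int{40, 100, 100, 7} {
		if tracker.observe(n) {
			fmt.Printf("debug: read %d events, equal to max_reads; receiver may be falling behind\n", n)
		}
	}
	fmt.Println("saturated polls:", tracker.count())
}
```

A single saturated poll is not proof of a backlog, since the channel may have held exactly max_reads events. A count that keeps climbing is the more useful signal.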

Regardless of the mechanism, a way for the receiver to indicate that it may be falling behind reading from an event log channel would go a long way in troubleshooting situations where it seems like the receiver is failing.

github-actions[bot] commented 16 hours ago

Pinging code owners: