[pkg/stanza] Prevent data loss due to `LogEmitter`'s buffer

andrzej-stencel commented 1 month ago

Component(s)

pkg/stanza

Is your feature request related to a problem? Please describe.

This issue is created as a result of discussion in https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/31074.

During a non-graceful shutdown of the collector, logs held in the LogEmitter's buffer (the `batch` field) are lost. They are not persisted anywhere, yet the File consumer has already marked them as sent, so they are not re-read after a restart.

Describe the solution you'd like

Remove the buffer in the LogEmitter and make LogEmitter synchronously emit every log/batch of logs received down the collector pipeline.

This will likely introduce a performance impact if implemented without other changes. This should only be done after:

- Measuring the possible performance impact: https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/35454
- Possibly implementing buffering earlier in the Stanza pipeline (in File consumer in case of Filelog receiver): https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/35455
- Considering buffering earlier in the Stanza pipeline for receivers other than those relying on File consumer (especially Windows Event Log receiver, Syslog receiver). Issues for this are to be created after validating the path in the two issues above.

Describe alternatives you've considered

- Instead of removing the buffering, have it persisted so that the data is not lost during non-graceful shutdown.
- Change the logic of marking logs as sent, so that logs are only marked as sent when they are actually successfully sent out to the next consumer in the collector pipeline.

Additional context

See discussion in https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/31074#issuecomment-2360284799 and further comments.

github-actions[bot] commented 1 month ago

Pinging code owners:

pkg/stanza: @djaglowski

See Adding Labels via Comments if you do not have permissions to add labels yourself.

andrzej-stencel commented 1 month ago

Removing needs triage label as this was discussed with the code owner.
