open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0
2.96k stars 2.3k forks source link

[pkg/stanza] Make Stanza adapter synchronuous #35453

Open andrzej-stencel opened 6 days ago

andrzej-stencel commented 6 days ago

Component(s)

pkg/stanza

Is your feature request related to a problem? Please describe.

This issue is created as a result of discussion in #31074. From https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/31074#issuecomment-2360284799:

In my view, the current implementation of the Stanza adapter with multiple channels and asynchronous workers if overly complex without delivering a lot of value, instead creating the possibility of data loss.

and:

The conversion [from Stanza entries to OTLP log records] is currently pretty complex, involving multiple asynchronous workers and a bunch of channels.

Describe the solution you'd like

Remove the channels and workers by having LogEmitter convert Stanza entries to log records and call ConsumeLogs synchronously. Leave the 100-log buffering in place for this change, to minimize the impact. Compare benchmarks.

Describe alternatives you've considered

Leaving the 100-log buffering in place leaves the issue of losing logs still not completely resolved. Ideally we want to get rid of this buffering, but we need to step cautiously, as this may have serious performance impact. We should measure this impact and possibly explore possibilities for Stanza receivers like Filelog receiver / File consumer to emit entries in batches, not one by one. Implementing this would alleviate the possible performance impact that removing of batching in LogEmitter may introduce.

Additional context

See https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/31074

github-actions[bot] commented 6 days ago

Pinging code owners:

andrzej-stencel commented 2 days ago

Removing needs triage label as this was discussed with the code owner.