open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0
3.07k stars 2.37k forks source link

Proposal to merge opentelemetry-log-collection #8850

Closed djaglowski closed 2 years ago

djaglowski commented 2 years ago

Background

opentelemetry-log-collection provides substantial functionality for several log receivers in the contrib collector. It exists as a separate repository based on the original proposal to contribute stanza to OpenTelemetry.

Since the contribution, the repository has been reshaped from a standalone agent into a simplified library containing only elements relevant to the OpenTelemetry Collector. Along the way, many enhancements have been made according to feedback from the community. As of this writing, the library's internal data model is in the final stages of an overhaul that will bring it into close alignment with the OpenTelemetry Log Data Model.

Motivation

As a maintainer of the log-collection repository, there have been some benefits to independent maintenance of the project. However, I believe we are nearly at the point where its existence as a standalone repository is both unnecessary and, from a development standpoint, inefficient.

Benefits of merging would include, at least:

Detailed Proposal

Development on the log-collection repository should be wound down. A final release, v0.29.0, should include all meaningful in-progress work. Therefore, any PRs that will not be included in v0.29.0 should be closed and captured as issues on the contrib repo. Likewise, any issues that will not be resolved in v0.29.0 should be moved to contrib. All log receivers using the log-collection library should be updated to v0.29.0, so that the switchover of the codebase will be an isolated action with no expected change to functionality.

A new module should be created at either pkg/stanza or pkg/logcollection. (I slightly prefer the former but defer to the community on whether it is appropriate to reflect the legacy of the codebase in this way.) All packages from the log-collection library should be copied into this new module and updated to reflect their new package paths. Log receivers dependent upon these packages should be updated accordingly. Likely later, internal/stanza should be colocated by moving it to an adapter package within the module.

Myself and @jsirianni should be listed as code owners of the new module.

Tooling dependency and CI workflow changes can be proposed on a case-by-case basis. I have performed an audit and found only a few very minor differences - nothing that cannot reasonably be ported or dropped.

The log-collection README should be updated to indicate that the standalone library has been deprecated and should provide a link to the new contrib module.

The log-collection repo should be archived in order to preserve development and release history.

jkowall commented 2 years ago

Parsing log data which is not generated from OpenTelemetry seems critical to me and to this contribution. I would like to see us get the schema incorporated before this happens: https://github.com/open-telemetry/oteps/pull/199 This makes the logging far more useful and relevant when it comes to log correlation.

djaglowski commented 2 years ago

Parsing log data which is not generated from OpenTelemetry seems critical to me and to this contribution.

This isn't proposing a contribution - only a change in code management. In any case, parsing log data which is not generated from OpenTelemetry is exactly what this code base does, aside from collecting log data. This capability is currently built into every log receiver that uses the log-collection library.

I would like to see us get the schema incorporated before this happens: open-telemetry/oteps#199 This makes the logging far more useful and relevant when it comes to log correlation.

As far as I can tell, your concern is orthogonal to this proposal. I 100% would like to see technology-specific parsing built into the collector, but there's no reason this should be a prerequisite for the generic capabilities that already exist. Can you elaborate on how this relates to whether the log collection code resides within the collector itself, vs being imported as a dependency that is separately managed within the OpenTelemetry project?

dmitryax commented 2 years ago

I support this proposal

bogdandrutu commented 2 years ago

+1

codeboten commented 2 years ago

This makes sense to me 👍

mx-psi commented 2 years ago

+1

Aneurysm9 commented 2 years ago

+1

Are there consumers outside of the collector contrib repo that we know of? Do we need to leave forwarding type aliases?

djaglowski commented 2 years ago

observIQ has some builds that use the log-collection repo, but pinned to older releases so no forwarding is necessary in this case. I'm fairly certain this is only outside consumer of the library.

jpkrohling commented 2 years ago

+1 also from me. Would you be open to demoing the log-collector during a SIG call?

JaredTan95 commented 2 years ago

+1 This makes sense to me

djaglowski commented 2 years ago

@jpkrohling Would you mind clarifying what you'd like to see in a demo? There's no longer a standalone executable to demo, so a user-facing demo could focus on configuring the existing log receivers, or I can explain how the log receivers work internally.

jpkrohling commented 2 years ago

or I can explain how the log receivers work internally

That's more like what I had in mind.

bogdandrutu commented 2 years ago

We missed the opportunity to do the demo (maybe later), but I think we should move forward with this. @djaglowski happy to review a PR.

djaglowski commented 2 years ago

@jpkrohling I'm happy to give a brief overview in next week's SIG.

djaglowski commented 2 years ago

@bogdandrutu I'm wrapping up a few more changes for the final release of log-collection, then will have a couple PRs to execute the merge.

djaglowski commented 2 years ago

The initial code transfer has been completed in #9922. Further steps will be tracked in #9960.