open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

RemoteTap Extension: a new start #34096

Open wildum opened 1 month ago

wildum commented 1 month ago

Component(s)

extension/remotetap

Is your feature request related to a problem? Please describe.

The RemoteTap extension was abandoned because the first implementation was not merged. The extension has been left in a skeleton state since its beginning 9 months ago.

The big advantage of this component compared to the remotetap processor is that the user does not need to modify the pipeline to see the data at any stage.

In Alloy (Grafana's OpenTelemetry Collector distribution), we implemented a similar feature called live debugging. Observability is not an easy topic, and we believe this feature is a big step toward making it more accessible.

Describe the solution you'd like

The extension could maintain a registry where components can register themselves. Additionally, it could expose an endpoint that allows users to select from these registered components.

Processors would register with the extension on start and publish data to it after each processing step (some could also publish the data before processing, if relevant). Receivers could also register with it and publish data before sending it to the next consumer.

Components would only publish data while a remotetap stream is open, to avoid any unnecessary computation.
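As a rough illustration of the registry idea described above, here is a minimal, stdlib-only sketch. Every name in it (Registry, Open, Publish, the string payload) is hypothetical and chosen for this example; it is not the extension's actual API. It only shows the two properties the proposal asks for: components register themselves, and publishing does nothing while no stream is open.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// ComponentID is a hypothetical identifier type used only in this sketch.
type ComponentID string

// Registry tracks registered components and the streams open on each one.
type Registry struct {
	mu      sync.RWMutex
	streams map[ComponentID][]chan string
}

func NewRegistry() *Registry {
	return &Registry{streams: make(map[ComponentID][]chan string)}
}

// Register makes a component selectable; it starts with no open streams.
func (r *Registry) Register(id ComponentID) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, ok := r.streams[id]; !ok {
		r.streams[id] = nil
	}
}

// Components returns the registered IDs in sorted order, i.e. the list an
// endpoint could offer to users.
func (r *Registry) Components() []ComponentID {
	r.mu.RLock()
	defer r.mu.RUnlock()
	ids := make([]ComponentID, 0, len(r.streams))
	for id := range r.streams {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })
	return ids
}

// Open attaches a new stream to a registered component.
func (r *Registry) Open(id ComponentID) (<-chan string, error) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, ok := r.streams[id]; !ok {
		return nil, fmt.Errorf("component %q is not registered", id)
	}
	ch := make(chan string, 16)
	r.streams[id] = append(r.streams[id], ch)
	return ch, nil
}

// IsActive reports whether at least one stream is open for the component.
func (r *Registry) IsActive(id ComponentID) bool {
	r.mu.RLock()
	defer r.mu.RUnlock()
	return len(r.streams[id]) > 0
}

// Publish fans data out to the open streams and returns how many received it.
// A full stream buffer drops the item instead of blocking the pipeline.
func (r *Registry) Publish(id ComponentID, data string) int {
	r.mu.RLock()
	defer r.mu.RUnlock()
	delivered := 0
	for _, ch := range r.streams[id] {
		select {
		case ch <- data:
			delivered++
		default:
		}
	}
	return delivered
}

func main() {
	reg := NewRegistry()
	reg.Register("metricsgeneration")
	fmt.Println(reg.IsActive("metricsgeneration")) // false: no stream open yet

	stream, err := reg.Open("metricsgeneration")
	if err != nil {
		panic(err)
	}
	reg.Publish("metricsgeneration", "sample")
	fmt.Println(<-stream) // sample
}
```

Dropping data on a full buffer is one possible design choice here: it favors pipeline throughput over completeness of the debug stream.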

Describe alternatives you've considered

No response

Additional context

I would be happy to use my experience with live debugging in Alloy to contribute to the implementation of this feature.

@atoulme

github-actions[bot] commented 1 month ago

Pinging code owners:

atoulme commented 1 month ago

The extension actually requires the remotetap processor to be deployed, and lists the processors in its configuration.

Whatever you can bring forward that helps is most welcome. Please feel free to expand on your proposal, given the component model of the collector. In particular, can you please describe the config yaml for this development?

wildum commented 1 month ago

I would like the extension not to depend on the remotetap processor, with the idea of deprecating the remotetap processor once the extension is mature enough.

The config for the extension would only contain the embedded field confighttp.ServerConfig `mapstructure:",squash"` and would look like this in YAML:

extensions:
  remotetap:
    endpoint: localhost:12001

If you need to modify the config of a faulty collector to debug it, you run the risk of breaking it completely or of making the problem go away by reloading the config. Being able to remote tap any processor/receiver of a running collector without any disruption is a massive plus.

I believe that with the following design, users might be happy to keep the extension even in prod environments, because it is not directly part of their pipelines and it does not impact performance:

On start:

Let's say that the user wants to remote tap the component "metricsgeneration" processor:

The interface for the components to interact with the extension would be the following:

type RemoteTapPublisher interface {
    // Register registers the component with the RemoteTap extension.
    Register(componentID ComponentID)
    // IsActive returns true when at least one connection is open for the given componentID.
    IsActive(componentID ComponentID) bool
    // PublishMetrics sends metrics for a given componentID to the RemoteTap extension.
    PublishMetrics(componentID ComponentID, md pmetric.Metrics)
    // PublishTraces sends traces for a given componentID to the RemoteTap extension.
    PublishTraces(componentID ComponentID, td ptrace.Traces)
    // PublishLogs sends logs for a given componentID to the RemoteTap extension.
    PublishLogs(componentID ComponentID, ld plog.Logs)
    // PublishData sends data for a given componentID to the RemoteTap extension.
    PublishData(componentID ComponentID, data string)
}
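To make the intended call pattern concrete, here is a small sketch of how a component might use such an interface. It is only an illustration under stated assumptions: Publisher is a trimmed-down, stdlib-only subset of the interface above (the pdata-typed methods are left out so the example runs on its own), and upperProcessor and fakePublisher are hypothetical names invented for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// ComponentID is a hypothetical identifier type used only in this sketch.
type ComponentID string

// Publisher is an assumed, reduced subset of RemoteTapPublisher.
type Publisher interface {
	Register(componentID ComponentID)
	IsActive(componentID ComponentID) bool
	PublishData(componentID ComponentID, data string)
}

// fakePublisher stands in for the extension and records what it receives.
type fakePublisher struct {
	active    bool
	published []string
}

func (f *fakePublisher) Register(ComponentID)      {}
func (f *fakePublisher) IsActive(ComponentID) bool { return f.active }
func (f *fakePublisher) PublishData(_ ComponentID, data string) {
	f.published = append(f.published, data)
}

// upperProcessor is a toy processor that upper-cases log lines.
type upperProcessor struct {
	id  ComponentID
	tap Publisher
}

func newUpperProcessor(id ComponentID, tap Publisher) *upperProcessor {
	tap.Register(id) // register once, on start
	return &upperProcessor{id: id, tap: tap}
}

// Process transforms the lines and publishes the result only if a stream is open.
func (p *upperProcessor) Process(lines []string) []string {
	out := make([]string, len(lines))
	for i, l := range lines {
		out[i] = strings.ToUpper(l)
	}
	// The IsActive check comes first so that serialization (here, the Join)
	// is skipped entirely when nobody is watching.
	if p.tap.IsActive(p.id) {
		p.tap.PublishData(p.id, strings.Join(out, "\n"))
	}
	return out
}

func main() {
	tap := &fakePublisher{}
	p := newUpperProcessor("metricsgeneration", tap)

	p.Process([]string{"a"}) // no stream open: nothing is published
	tap.active = true        // simulate a user opening a stream
	p.Process([]string{"b", "c"})

	fmt.Println(len(tap.published)) // 1
}
```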

The UI should contain some basic controls to make debugging easier:

What do you think? I could try a POC with support for one processor and a very basic UI. If people are happy with it, we could gradually extend it to more components and improve the UI.

jaronoff97 commented 4 weeks ago

@wildum I'd be happy to assist in reviewing and working on this. I am currently using the remotetap processor for exactly this:

In my tails project

djaglowski commented 4 weeks ago

In my opinion the ideal solution for this would be more deeply integrated into the collector so that individual component developers do not need to be concerned with managing it, and so that performance and correctness concerns are handled in a uniform way.

Roughly the following:

I would also point out that if we ever land https://github.com/open-telemetry/opentelemetry-collector/issues/9077, then this becomes a trivial problem where the solution is just adding one more exporter that subscribes to all data producers.
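One way to picture the deeper integration suggested in this comment is a wrapper that the framework, rather than each component, places around the next consumer. The sketch below is purely an assumption about what that could look like: Consumer, Tap, WithTap, and memTap are hypothetical, stdlib-only stand-ins and do not correspond to existing collector APIs.

```go
package main

import "fmt"

// Consumer is a hypothetical stand-in for the collector's consumer interfaces.
type Consumer interface {
	Consume(data string) error
}

// ConsumerFunc adapts a plain function to the Consumer interface.
type ConsumerFunc func(data string) error

func (f ConsumerFunc) Consume(data string) error { return f(data) }

// Tap is a hypothetical sink for tapped data.
type Tap interface {
	IsActive(id string) bool
	Publish(id string, data string)
}

// WithTap wraps next so that data flowing into it is also published to the
// tap while a stream is open. The wrapped component needs no tap-specific code.
func WithTap(id string, tap Tap, next Consumer) Consumer {
	return ConsumerFunc(func(data string) error {
		if tap.IsActive(id) {
			tap.Publish(id, data)
		}
		return next.Consume(data)
	})
}

// memTap is an in-memory Tap used only for this demonstration.
type memTap struct {
	active map[string]bool
	seen   []string
}

func (m *memTap) IsActive(id string) bool { return m.active[id] }
func (m *memTap) Publish(id string, data string) {
	m.seen = append(m.seen, id+": "+data)
}

func main() {
	tap := &memTap{active: map[string]bool{"batch": true}}
	var delivered []string
	next := ConsumerFunc(func(data string) error {
		delivered = append(delivered, data)
		return nil
	})

	wrapped := WithTap("batch", tap, next)
	if err := wrapped.Consume("span-1"); err != nil {
		panic(err)
	}
	fmt.Println(tap.seen, delivered)
}
```

Because the gating and publishing live in one wrapper, performance and correctness concerns would be handled in a single place instead of once per component.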

wildum commented 3 days ago

Hey @atoulme @djaglowski @jaronoff97, following the SIG meeting (and a week off), I worked on a second POC. As discussed, this time I implemented the concept in the core repository using the processorhelper package. You can find the new POC here: https://github.com/open-telemetry/opentelemetry-collector/pull/10962 and a clean, lightweight version of the concept here: https://github.com/open-telemetry/opentelemetry-collector/pull/10963

Please have a look at the concept branch when you have time and let me know what you think :)

wildum commented 2 days ago

Following @atoulme's comment, I moved the component back to contrib and kept only the changes related to the processor helper in core. @djaglowski @jaronoff97 Here are the new relevant links: