open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0
2.9k stars 2.27k forks source link

[extension/healthcheckv2] Ability to access the `status.Aggregator` from another extension #34692

Open blakerouse opened 1 month ago

blakerouse commented 1 month ago

Component(s)

extension/healthcheckv2

Is your feature request related to a problem? Please describe.

Currently the opampextension uses the original healthcheckextension and it determines if the collector is healthy by performing an HTTP request to its own process to determine if the collector is health. https://github.com/open-telemetry/opamp-go/blob/ed38d5f4bf930b57e04581919fbfb676aaa5a5af/internal/examples/supervisor/supervisor/supervisor.go#L435

This is not great from the standpoint of requiring the healthcheck endpoint to be exposed on the host just to get health of the current process that the opampextension is also running in. If you don't want any endpoint exposed then opampextension is unable to determine health. It is also very inefficient to determine status as it requires the a HTTP connection from its own process to the same process to just get status information. Another possible issue is IP table rules that prevent localhost port access prevent the ability to get this information (rare, but I have see it happen). It would be much better if this could be done in process.

It would be better if an extension could access the healthcheckv2extension internal status.Aggregator so it can determine the status of the collector without having to perform HTTP/GRPC requests and expose an endpoint on the host.

Describe the solution you'd like

It would be great if something like the following could be done from inside another extension to access the aggregator:

func (hc *exampleExtension) Start(ctx context.Context, host component.Host) error {
    extensions := host.GetExtensions()
    component, ok := extensions[component.MustNewID("healthcheckv2")]
    if !ok {
        return fmt.Errorf("failed to find healthcheckv2 extension; required dependent")
    }
    aggregatorAccess, ok := component.(healthcheckv2extension.AggregatorAccess)
    if !ok {
        return fmt.Errorf("failed to find healthcheckv2 extension; unable to convert to AggregatorAccess interface")
    }
    aggregator := aggregatorAccess.GetAggregator()
    // can now call aggregator.AggregateStatus or aggregator.Subscribe
    return nil
}

Another option would just to expose AggregateStatus and Subscribe directly on the healthcheckv2extension through an interface without exposing the entire function interface to the caller because RecordStatus should not really be called externally.

Describe alternatives you've considered

Another alternative would allow an extension to register a sub-component inside of the healthcheckv2extension. This would allow another extension to expose a different way of getting the health check information versus only HTTP and/or GRPC. This could be done using the same approach:

func (hc *exampleExtension) Start(ctx context.Context, host component.Host) error {
    extensions := host.GetExtensions()
    component, ok := extensions[component.MustNewID("healthcheckv2")]
    if !ok {
        return fmt.Errorf("failed to find healthcheckv2 extension; required dependent")
    }
    addComponent, ok := component.(healthcheckv2extension.AddSubcomponentInterface)
    if !ok {
        return fmt.Errorf("failed to find healthcheckv2 extension; unable to convert to AddSubcomponentInterface interface")
    }
        component := createNewSubComponent(ctx)
    return addComponent.AddComponent(component)
}

Downside of this approach is if the extension sets the healthcheckv2extension as a Dependent() then it will already be started (I believe) before Start(...) is called on the extension. The AddComponent function would then need to start the sub-component at the time of registering. I also believe this will require exposing more internal interfaces and types to extensions that might not be desired.

Additional context

No response

github-actions[bot] commented 1 month ago

Pinging code owners:

blakerouse commented 1 month ago

I would be interested in working on this once an agreement is made on the design and if this is something that is wanted.

jpkrohling commented 4 weeks ago

cc @mwear