open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0
3.07k stars 2.37k forks source link

[SplunkHECExporter] Add ability to heartbeating destination #19220

Closed splunkericl closed 1 year ago

splunkericl commented 1 year ago

Is your feature request related to a problem? Please describe. In Edge Processor(EP), it is possible the EP of a customer would have very low traffic for some periods of time. EP adds the ability to heartbeat the destination periodically to check the validity of the destination. For example, the S3 exporter in EP periodically calls listObjects to validate the credentials are still valid.

One of the requirement to incorporate SplunkHECExporter into Edge Processor(EP) is to adds this heart-beating ability to SplunkHECExporter and adds observability data on top of it.

Describe the solution you'd like Add an configuration option to allow heartbeating in SplunkHECExporter. If enabled the exporter would spawn a go routine that periodically sends dummy info to the destination. The dummy info can be similar to how a splunk forwarder is sending:

Describe alternatives you've considered We can leverage the heartbeat information by checking how many events were sent successfully. However, this doesn't address the users that won't have data sent for a while.

splunkericl commented 1 year ago

I am picking up this work and hoping to get some guidance. I am thinking of two approaches for the part to emit observability data for the heartbeat:

@atoulme @dmitryax do you have any advices on the approaches?

dmitryax commented 1 year ago

The target audience of the contrib components is the end users. Let's keep it that way end expose only user-configurable options, not programmatic. So let's add the observability data right in the exporter.

The exporter already has health_check_enabled config option. It's currently triggered only during the start time, but I believe it can be enhanced to make such periodic calls and emit additional metrics based on that. If the option is disabled, which should be by default, no additional heartbeat observability metrics are emitted.

cc @wojtekzyla who introduced that option

atoulme commented 1 year ago

The health_check_enabled performs a HTTP check against /collector/event/health, and has a different function as it captures the health of the remote counterpart. Here, we want to send the status of the collector to the remote counterpart on a regular basis.

Agree on starting from the user config first.

I would recommend we look at two additional config entries under a heartbeat key with the following defaults:

  heartbeat:
    enabled: false
    interval: 30s

We generate the log message to send based off component.BuildInfo which is passed in when creating the exporter.

dmitryax commented 1 year ago

Here, we want to send the status of the collector to the remote counterpart on a regular basis.

Is that the case? IFAIU, the goal is still to validate the destination, according to @splunkericl, no?

EP adds the ability to heartbeat the destination periodically to check the validity of the destination

I'd like to reuse the configuration field for something serving the same purpose. It's better to avoid adding new fields if they are not strictly necessary to reduce the mental load on the end user

splunkericl commented 1 year ago

Yeah the destination still needs to be validated. So if I understand correctly, we can:

atoulme commented 1 year ago

You're using interchangeably healthcheck and heartbeat. This is creating confusion. These are not the same thing.

if it is 0 or not set, the health check will happen once during start up No, this would be a change of the behavior of the exporter. The exporter already has a health check configuration option on start.

splunkericl commented 1 year ago

There are some similarities between healthcheck and heartbeat. But if we want to define them clearly:

So if it makes sense, we can put the heartbeat functionality into the healthcheck to reduce the number of functionalities end user needs to understand.

atoulme commented 1 year ago

These are distinct concepts, and we'd want to toggle them on and off separately in my opinion. If anything, we will confuse users if we conflate those.

It doesn't look like heartbeat data is dummy data to me. It looks like diagnostic data sent to upstream so that operations can check that the collector is reporting in properly, even when no data is being sent.

With that, my definition of heartbeat is:

splunkericl commented 1 year ago

That is fair. I think this definition makes sense and we should separate the configuration between healthcheck and heartbeat then.

github-actions[bot] commented 1 year ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

dmitryax commented 1 year ago

Closing as done