opensearch-project / data-prepper

OpenSearch Data Prepper is a component of the OpenSearch project that accepts, filters, transforms, enriches, and routes data at scale.
https://opensearch.org/docs/latest/clients/data-prepper/index/
Apache License 2.0

Consistent metric naming convention #3051

Open dlvenable opened 1 year ago

dlvenable commented 1 year ago

Background

Data Prepper has an existing metric naming convention of:

{pipelineName}.{pluginType}.{metricName}

The {metricName} part is custom to each plugin. The {pipelineName}.{pluginType} is determined by the PluginMetrics class and is standard to all Data Prepper metrics.

Problems

There are a few problems with this convention:

  1. Some plugin types share a name across different component types. There is already an s3 source and an s3 sink, and there will be a kafka source and sink as well. Their metrics end up under the same {pipelineName}.{pluginType} prefix.
  2. When a pipeline has multiple plugins of the same type, their metrics cannot be distinguished from each other.
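
A minimal sketch of both problems, assuming a hypothetical pipeline with one s3 source, one s3 sink, and two opensearch sinks. The function `legacy_metric_name` and the metric name `exampleMetric` are illustrative only and are not taken from the Data Prepper code base.

```python
from collections import Counter


def legacy_metric_name(pipeline_name: str, plugin_type: str, metric_name: str) -> str:
    """Compose a name in the existing {pipelineName}.{pluginType}.{metricName} form."""
    return f"{pipeline_name}.{plugin_type}.{metric_name}"


# Hypothetical plugins in one pipeline: (component type, plugin type).
plugins = [
    ("source", "s3"),
    ("sink", "s3"),
    ("sink", "opensearch"),
    ("sink", "opensearch"),
]

# The component type is not part of the legacy name, so it is ignored here.
names = Counter(
    legacy_metric_name("my-pipeline", plugin_type, "exampleMetric")
    for _component_type, plugin_type in plugins
)

# Any name produced more than once is shared by several plugins.
collisions = sorted(name for name, count in names.items() if count > 1)
print(collisions)
```

Four plugins produce only two distinct names here: the s3 source and sink share one, and the two opensearch sinks share the other.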

Proposal

Update our plugin metric naming. I'd like to suggest that we use consistent names for any plugin type. And then we would use tags/dimensions to disambiguate pipelines and pluginIds.

New metric name:

{componentType}.{metricName}

New tags per metric:

pipelineName={pipelineName}
pluginId={pluginId}

The componentType is the type of pipeline component represented. This would be source, sink, processor, or buffer.

The pluginId is the plugin Id which would be added by #1025.
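
As a rough sketch of the proposal, a metric could be identified by its name plus its tags rather than by its name alone. The `MetricId` type and `proposed_metric_id` function are hypothetical, and the pluginId values are placeholders, since the real values depend on #1025.

```python
from typing import NamedTuple, Tuple

# Component types listed in the proposal.
COMPONENT_TYPES = ("source", "sink", "processor", "buffer")


class MetricId(NamedTuple):
    """A metric is identified by its name together with its tags."""
    name: str
    tags: Tuple[Tuple[str, str], ...]


def proposed_metric_id(component_type: str, metric_name: str,
                       pipeline_name: str, plugin_id: str) -> MetricId:
    """Build {componentType}.{metricName} with pipelineName and pluginId tags."""
    if component_type not in COMPONENT_TYPES:
        raise ValueError(f"unknown component type: {component_type}")
    tags = (("pipelineName", pipeline_name), ("pluginId", plugin_id))
    return MetricId(f"{component_type}.{metric_name}", tags)


# Two sinks in the same pipeline (placeholder pluginId values).
first = proposed_metric_id("sink", "documentsSuccess", "my-pipeline1", "opensearch")
second = proposed_metric_id("sink", "documentsSuccess", "my-pipeline1", "opensearch2")

print(first.name == second.name)  # the metric name is shared
print(first == second)            # the tags keep the two series apart
```

The name stays stable across pipelines and plugins, so dashboards can aggregate on it, while the tags carry everything needed to tell individual plugins apart.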

Expanded Metric Proposal

Also, we currently disallow pipelines from being named core or data-prepper in order to reserve those names.

Thus, all Data Prepper metrics will have the following form.

{scopeIdentifier}.{metrics}

The scopeIdentifier can be one of the following:

If the scopeIdentifier is a pipeline component, then the plugin metric convention above applies. For core and data-prepper, the plugin metric convention does not apply and it will depend on the specific metrics.

Migration

This new plugin metric definition is a breaking change. Thus, we can offer a flag to enable these metrics and remove the flag in a major version bump.

In data-prepper-core.yaml, provide a new property named metric_naming. It will have two options:

metric_naming: v2
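
One way the flag could be applied is sketched below. This assumes the configuration has already been parsed into a dictionary, and it treats any value other than v2 (including a missing property) as a request for the existing convention. That fallback is an assumption of this sketch, as is the `metric_identity` function itself.

```python
from typing import Dict, Tuple


def metric_identity(config: Dict[str, str], pipeline_name: str, component_type: str,
                    plugin_type: str, plugin_id: str,
                    metric_name: str) -> Tuple[str, Dict[str, str]]:
    """Return (metric name, tags) according to the metric_naming setting."""
    if config.get("metric_naming") == "v2":
        # Proposed convention: short name, identity carried by tags.
        tags = {"pipelineName": pipeline_name, "pluginId": plugin_id}
        return f"{component_type}.{metric_name}", tags
    # Assumed fallback: the existing convention, with no tags.
    return f"{pipeline_name}.{plugin_type}.{metric_name}", {}


args = ("my-pipeline1", "sink", "opensearch", "opensearch2", "documentsSuccess")
new_name, new_tags = metric_identity({"metric_naming": "v2"}, *args)
old_name, old_tags = metric_identity({}, *args)

print(new_name, new_tags)
print(old_name, old_tags)
```

Keeping both conventions behind one switch lets existing dashboards keep working until the flag is removed.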

Dependencies

Tasks

dlvenable commented 10 months ago

Rather than continuing to include the pipelineName in the metric name, I think making use of Micrometer tags would allow for better metrics. This would map to Amazon CloudWatch metrics and dimensions.

Metric name:

{componentType}.{metricName}

New tags:

pipelineName={pipelineName}
pluginId={pluginId}

Example:

my-pipeline1:
  ...
  sink:
    - opensearch:
    - opensearch:

...

my-pipeline2:
  ...
  sink:
    - opensearch:

Here are some of the metrics:

Metric Name                    tag: pipelineName   tag: pluginId
opensearch.documentsSuccess    my-pipeline1        opensearch
opensearch.documentsSuccess    my-pipeline1        opensearch2
opensearch.bulkRequestFailed   my-pipeline1        opensearch
opensearch.bulkRequestFailed   my-pipeline1        opensearch2
opensearch.bulkRequestErrors   my-pipeline1        opensearch
opensearch.bulkRequestErrors   my-pipeline1        opensearch2
opensearch.documentsSuccess    my-pipeline2        opensearch
opensearch.bulkRequestFailed   my-pipeline2        opensearch
opensearch.bulkRequestErrors   my-pipeline2        opensearch

dlvenable commented 6 days ago

We should also make changes to the metrics to differentiate between the plugin types. This may be done well through the use of a tag to indicate the type.
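
To illustrate what such a tag could enable, here is a small sketch that totals a metric per plugin type. The tag key `pluginType`, the metric name `sink.exampleCount`, and all sample values are made up for illustration. The issue does not settle on a tag name.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Sample = Tuple[str, Dict[str, str], int]  # (metric name, tags, value)

# Made-up samples; "pluginType" is a hypothetical tag key.
samples: List[Sample] = [
    ("sink.exampleCount",
     {"pipelineName": "my-pipeline1", "pluginId": "opensearch", "pluginType": "opensearch"}, 120),
    ("sink.exampleCount",
     {"pipelineName": "my-pipeline1", "pluginId": "opensearch2", "pluginType": "opensearch"}, 80),
    ("sink.exampleCount",
     {"pipelineName": "my-pipeline1", "pluginId": "s3", "pluginType": "s3"}, 50),
]


def total_by_tag(data: List[Sample], metric_name: str, tag_key: str) -> Dict[str, int]:
    """Sum the values of one metric, grouped by the value of a single tag."""
    totals: Dict[str, int] = defaultdict(int)
    for name, tags, value in data:
        if name == metric_name:
            totals[tags[tag_key]] += value
    return dict(totals)


by_type = total_by_tag(samples, "sink.exampleCount", "pluginType")
print(by_type)
```

With a type tag, all opensearch sinks can be aggregated together without having to know each pluginId in advance.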