[Schema] Align OpenTelemetry metrics index templates with Data Prepper

KarstenSchnitter commented 1 month ago

Which domain protocol is relevant for this schema?

The catalog describes a schema for OpenTelemetry metrics data at https://github.com/opensearch-project/opensearch-catalog/tree/main/docs/schema/observability/metrics. Unfortunately, this schema is not compatible with the schema generated by Data Prepper. The incompatibility can be reproduced with this example.

What is the schema resource?

The Data Prepper schema for OpenTelemetry metrics closely follows the schema used for spans (and logs). Sharing this structure across all three schemas allows filters on resource attributes and instrumentation scopes to be applied to all signals. data-prepper#3929 introduces a mapping template for the metrics index. The same PR also contains mappings for traces and logs.
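To illustrate what a filter shared across signals could look like, here is a minimal sketch that builds an OpenSearch query DSL body as a Python dict. It assumes that spans, logs and metrics all carry the Data Prepper field names resource.attributes.service@name (as in the metrics sample later in this thread) and instrumentationScope.name, and that both are mapped as keyword fields so that term queries match exact values. The helper shared_filter_query is hypothetical.

```python
from typing import Any


def shared_filter_query(service_name: str, scope_name: str) -> dict[str, Any]:
    """Build a bool/filter query on fields assumed to be shared by all signals.

    Hypothetical helper: it assumes spans, logs and metrics all use the
    Data Prepper names resource.attributes.service@name and
    instrumentationScope.name, mapped as keyword fields.
    """
    return {
        "query": {
            "bool": {
                "filter": [
                    # Exact match on the (dedotted) service.name resource attribute.
                    {"term": {"resource.attributes.service@name": service_name}},
                    # Exact match on the instrumentation scope name.
                    {"term": {"instrumentationScope.name": scope_name}},
                ]
            }
        }
    }
```

Because the body only references shared field names, the same dict could in principle be sent to the metrics, span and log indices alike; that only holds if all index templates agree on these names and types, which is the point of this issue.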

Source Schema - Add necessary repository

OTEL

Do you have any additional context?

To be added on request.

juergen-walter commented 1 month ago

Aligning the schema is a foundation for further investment in visualizations on top of it. What do you think, @YANG-DB? Can you help push this topic forward?

YANG-DB commented 1 month ago

@juergen-walter thanks for your review. I'm not sure what exactly the difference between the two is. Is it only the index_type?

KarstenSchnitter commented 1 month ago

I ran the example linked above to extract a JSON sample and ordered the fields alphabetically.

{
  "_index": "otel_metrics",
  "_id": "i7eJe5IBPqA3feadeJBE",
  "_score": 1,
  "_source": {
    "aggregationTemporality": "AGGREGATION_TEMPORALITY_CUMULATIVE",
    "description": "Total seconds each logical CPU spent on each mode.",
    "exemplars": [],
    "flags": 0,
    "instrumentationScope.name": "otelcol/hostmetricsreceiver/cpu",
    "instrumentationScope.version": "0.97.0",
    "isMonotonic": true,
    "kind": "SUM",
    "metric.attributes.cpu": "cpu0",
    "metric.attributes.state": "system",
    "name": "system.cpu.time",
    "resource.attributes.service@name": "otel-collector",
    "schemaUrl": "https://opentelemetry.io/schemas/1.9.0",
    "serviceName": "otel-collector",
    "startTime": "2024-10-11T12:22:24Z",
    "time": "2024-10-11T12:23:01.611880002Z",
    "unit": "s",
    "value": 0.28
  }
}

Compared with the sum.json sample, there are the following differences:

metric attributes are prefixed with metric.attributes. by Data Prepper instead of just attributes.;
monotonicity is called isMonotonic by Data Prepper instead of just monotonic;
resource attributes are prefixed with resource.attributes. by Data Prepper instead of just resource.;
the data point timestamp is called time by Data Prepper instead of @timestamp;
the value is always written by Data Prepper as a double in a single value field, without distinguishing between value.int and value.double. Because the catalog's naming scheme makes value an object, this causes a field type conflict if Data Prepper were to write into the catalog index.
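The renames in the list above can be sketched as a small translation function. This is a minimal illustration, not a proposed ingestion step: it assumes flat (dotted) field names as in the Data Prepper sample, covers only the differences listed above, and leaves dedotted attribute names such as service@name untouched. Because Data Prepper always emits a double, the sketch cannot recover the original int/double distinction and maps every value to value.double. The names PREFIX_RENAMES, FIELD_RENAMES and to_catalog_fields are hypothetical.

```python
from typing import Any

# Data Prepper prefix -> catalog prefix, taken from the differences listed above.
PREFIX_RENAMES = {
    "metric.attributes.": "attributes.",
    "resource.attributes.": "resource.",
}

# Data Prepper field name -> catalog field name.
FIELD_RENAMES = {
    "isMonotonic": "monotonic",
    "time": "@timestamp",
}


def to_catalog_fields(doc: dict[str, Any]) -> dict[str, Any]:
    """Rename flat Data Prepper metric fields to the catalog naming (sketch)."""
    out: dict[str, Any] = {}
    for key, val in doc.items():
        if key in FIELD_RENAMES:
            out[FIELD_RENAMES[key]] = val
            continue
        if key == "value":
            # The int/double distinction is already lost, so use value.double.
            out["value.double"] = float(val)
            continue
        for old, new in PREFIX_RENAMES.items():
            if key.startswith(old):
                out[new + key[len(old):]] = val
                break
        else:
            # Fields without a known difference are passed through unchanged.
            out[key] = val
    return out
```

Applied to part of the sample, {"isMonotonic": True, "metric.attributes.cpu": "cpu0", "value": 0.28} becomes {"monotonic": True, "attributes.cpu": "cpu0", "value.double": 0.28}.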

I briefly checked the gauge and histogram examples as well. There might be similar issues if the data points get richer, e.g., by containing exemplars. I also found that the HTTP histogram samples contain attribute names that are not dedotted (network.protocol.name). That will not happen with Data Prepper, which replaces the dots in attribute keys with @, as the resource.attributes.service@name field in the sample above shows.
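The dedotting can be sketched as follows. The rule used here (every dot inside an attribute key becomes @, then the signal-specific prefix is added) is inferred from the resource.attributes.service@name field in the sample above and is an assumption, not a description of Data Prepper's actual code; dedot and flatten_attributes are hypothetical helpers.

```python
def dedot(key: str) -> str:
    """Replace every dot in an attribute key with '@' (assumed Data Prepper rule)."""
    return key.replace(".", "@")


def flatten_attributes(prefix: str, attributes: dict[str, object]) -> dict[str, object]:
    """Prefix each dedotted attribute key, e.g. 'metric.attributes.' + 'network@protocol@name'."""
    return {prefix + dedot(key): value for key, value in attributes.items()}
```

Under this assumption, flatten_attributes("metric.attributes.", {"network.protocol.name": "http"}) yields {"metric.attributes.network@protocol@name": "http"}, so the dotted name from the catalog's HTTP histogram samples would never appear in a Data Prepper document.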

These differences should be resolved in a way that leads to compatible index templates for all OpenTelemetry signals. This enables filtering by resource attributes or timestamps across different signal types in the same dashboard.

opensearch-project / opensearch-catalog

[Schema] Align OpenTelemetry metrics index templates with Data Prepper #197
