opensearch-project / opensearch-catalog

The OpenSearch Catalog is designed to make it easier for developers and the community to contribute, search for, and install artifacts such as plugins, visualization dashboards, and ingestion-to-visualization content packs (data pipeline configurations, normalization, ingestion, dashboards).
Apache License 2.0

[Schema] Align OpenTelemetry metrics index templates with Data Prepper #197

Open KarstenSchnitter opened 1 month ago

KarstenSchnitter commented 1 month ago

Which domain protocol is relevant for this schema?

The catalog describes a schema to be used with OpenTelemetry metrics data at https://github.com/opensearch-project/opensearch-catalog/tree/main/docs/schema/observability/metrics. Unfortunately, this schema is not compatible with the schema generated by Data Prepper. This can be explored using this example.

What is the schema resource?

The Data Prepper schema for OpenTelemetry metrics closely follows the schema used for spans (and logs). All three schemas allow filters on resource attributes and instrumentation scopes to be applied to all signals. data-prepper#3929 introduces a mapping template for the metrics index. The same PR also contains mappings for traces and logs.

Source Schema - Add necessary repository


Do you have any additional context?

To be added on request.

juergen-walter commented 1 month ago

Aligning the schema is a foundation for further investment in visualizations on top of it. What do you think, @YANG-DB? Can you help push this topic forward?

YANG-DB commented 1 month ago

@juergen-walter thanks for your review. I'm not sure what the exact diff between the two is. Is it only the index_type?

KarstenSchnitter commented 1 month ago

I ran the example linked above to extract a JSON sample and ordered the fields alphabetically.

{
  "_index": "otel_metrics",
  "_id": "i7eJe5IBPqA3feadeJBE",
  "_score": 1,
  "_source": {
    "aggregationTemporality": "AGGREGATION_TEMPORALITY_CUMULATIVE",
    "description": "Total seconds each logical CPU spent on each mode.",
    "exemplars": [],
    "flags": 0,
    "instrumentationScope.name": "otelcol/hostmetricsreceiver/cpu",
    "instrumentationScope.version": "0.97.0",
    "isMonotonic": true,
    "kind": "SUM",
    "metric.attributes.cpu": "cpu0",
    "metric.attributes.state": "system",
    "name": "system.cpu.time",
    "resource.attributes.service@name": "otel-collector",
    "schemaUrl": "https://opentelemetry.io/schemas/1.9.0",
    "serviceName": "otel-collector",
    "startTime": "2024-10-11T12:22:24Z",
    "time": "2024-10-11T12:23:01.611880002Z",
    "unit": "s",
    "value": 0.28
  }
}

Compared with the sum.json sample, there are the following differences:

I briefly checked the gauge and histogram examples as well. There might be similar issues if the data points get richer, e.g., by containing exemplars. In the HTTP histogram samples, I found attributes whose names are not de-dotted (network.protocol.name). That will not happen with Data Prepper, which replaces the dots in attribute names, as in service@name in the sample above.
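To illustrate the de-dotting mentioned here, below is a minimal Python sketch. It assumes that every dot in an attribute name is replaced by "@", which is what the service@name key in the sample suggests. The helper names dedot_key and flatten_attributes are hypothetical and not part of Data Prepper.

```python
def dedot_key(key: str, replacement: str = "@") -> str:
    """Replace every dot in an attribute name (assumed rule, see lead-in)."""
    return key.replace(".", replacement)


def flatten_attributes(attributes: dict, prefix: str) -> dict:
    """Prefix each de-dotted attribute name, e.g. with 'resource.attributes'."""
    return {f"{prefix}.{dedot_key(name)}": value for name, value in attributes.items()}


# The resource attribute from the sample document above.
resource = flatten_attributes({"service.name": "otel-collector"}, "resource.attributes")

# The attribute name found in the catalog's HTTP histogram samples;
# the value "http" is made up for this illustration.
metric = flatten_attributes({"network.protocol.name": "http"}, "metric.attributes")

print(resource)
print(metric)
```

With this rule, the dots that remain in a field name only separate the fixed prefix from the attribute name, so an attribute name can no longer be mistaken for a nested object path.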

These differences should be resolved in a way that leads to compatible index templates for all OpenTelemetry signals. This enables filtering by resource attributes or timestamps across different signal types in the same dashboard.
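As a rough sketch of what such a shared filter could look like, the Python snippet below builds an OpenSearch query body that combines a resource-attribute filter with a time range. The field names resource.attributes.service@name and time come from the sample above. The function name build_service_filter is hypothetical, the timestamps are made up, and the sketch assumes the attribute is mapped as a keyword so that a term query matches it exactly. Other signals may use a different timestamp field, so it is a parameter.

```python
import json


def build_service_filter(service_name: str, start: str, end: str,
                         time_field: str = "time") -> dict:
    """Build a query body filtering by service name and a time range."""
    return {
        "query": {
            "bool": {
                "filter": [
                    # Assumes a keyword mapping for the de-dotted attribute.
                    {"term": {"resource.attributes.service@name": service_name}},
                    {"range": {time_field: {"gte": start, "lte": end}}},
                ]
            }
        }
    }


body = build_service_filter(
    "otel-collector",
    "2024-10-11T12:00:00Z",
    "2024-10-11T13:00:00Z",
)
print(json.dumps(body, indent=2))
```

If the index templates for metrics, traces, and logs agree on these field names, the same body could be sent to each index, which is the compatibility the issue asks for.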