Open KarstenSchnitter opened 1 month ago
Aligning the schema is a foundation for further investment in visualizations built on top of it. What do you think, @YANG-DB? Can you help push this topic forward?
@juergen-walter thanks for your review
I'm not sure what the exact diff between the two is. Is it only the `index_type`?
I ran the example linked above, to extract a JSON sample. I ordered the fields alphabetically.
{
"_index": "otel_metrics",
"_id": "i7eJe5IBPqA3feadeJBE",
"_score": 1,
"_source": {
"aggregationTemporality": "AGGREGATION_TEMPORALITY_CUMULATIVE",
"description": "Total seconds each logical CPU spent on each mode.",
"exemplars": [],
"flags": 0,
"instrumentationScope.name": "otelcol/hostmetricsreceiver/cpu",
"instrumentationScope.version": "0.97.0",
"isMonotonic": true,
"kind": "SUM",
"metric.attributes.cpu": "cpu0",
"metric.attributes.state": "system",
"name": "system.cpu.time",
"resource.attributes.service@name": "otel-collector",
"schemaUrl": "https://opentelemetry.io/schemas/1.9.0",
"serviceName": "otel-collector",
"startTime": "2024-10-11T12:22:24Z",
"time": "2024-10-11T12:23:01.611880002Z",
"unit": "s",
"value": 0.28
}
}
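The sample shows the resource attribute `service.name` stored under the key `resource.attributes.service@name`, i.e. with the dots inside the attribute name replaced. A minimal sketch of that dedotting step follows, assuming the replacement character is `@` as seen in the sample. The helper names `dedot_key` and `flatten_attributes` are hypothetical and not part of Data Prepper.

```python
def dedot_key(key: str, replacement: str = "@") -> str:
    """Replace dots in an attribute key so it is not read as a nested path."""
    return key.replace(".", replacement)


def flatten_attributes(prefix: str, attributes: dict) -> dict:
    """Prefix each dedotted attribute key, e.g. with 'resource.attributes.'."""
    return {prefix + dedot_key(key): value for key, value in attributes.items()}


# Hypothetical input: resource attributes as they arrive over OTLP.
resource = {"service.name": "otel-collector"}
print(flatten_attributes("resource.attributes.", resource))
# prints {'resource.attributes.service@name': 'otel-collector'}
```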
Compared with the sum.json sample, there are the following differences:
- `metric.attributes.` is used by Data Prepper and not just `attributes.`
- `isMonotonic` is used by Data Prepper, not just `monotonic`
- `resource.attributes.` is used by Data Prepper and not just `resource.`
- `time` is used by Data Prepper, not `@timestamp`
- `value` is used by Data Prepper without distinction into `value.int` or `value.double`. Due to the naming scheme, this causes a field type conflict if Data Prepper were to write into the catalogue index.

I briefly checked the gauge and histogram examples as well. There might be similar issues if the data points get richer, e.g., by containing exemplars. I found that the http histogram samples contain attributes without dedotted names (`network.protocol.name`). That will not happen with Data Prepper.
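To make the listed differences concrete, here is a rough sketch that renames the fields of a Data Prepper metric document to the names mentioned for the catalog sample. It covers only the five differences named above. The function name `to_catalog_names` is hypothetical, and the target names are taken from this comparison, not from the full catalog schema, so treat it as an illustration and not as a migration tool.

```python
def to_catalog_names(source: dict) -> dict:
    """Rename Data Prepper metric fields following the differences listed above."""
    renamed = {}
    for key, value in source.items():
        if key.startswith("metric.attributes."):
            renamed["attributes." + key[len("metric.attributes."):]] = value
        elif key.startswith("resource.attributes."):
            renamed["resource." + key[len("resource.attributes."):]] = value
        elif key == "isMonotonic":
            renamed["monotonic"] = value
        elif key == "time":
            renamed["@timestamp"] = value
        elif key == "value":
            # bool is a subclass of int in Python, so exclude it explicitly.
            is_int = isinstance(value, int) and not isinstance(value, bool)
            renamed["value.int" if is_int else "value.double"] = value
        else:
            renamed[key] = value
    return renamed


# A subset of the "_source" object from the sample document.
doc = {
    "isMonotonic": True,
    "metric.attributes.cpu": "cpu0",
    "resource.attributes.service@name": "otel-collector",
    "time": "2024-10-11T12:23:01.611880002Z",
    "value": 0.28,
}
print(to_catalog_names(doc))
```

The `value` branch shows where the type conflict comes from: a single `value` field cannot coexist in one index with `value.int` and `value.double`, because the latter two require `value` to be an object.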
These differences should be resolved in a way that leads to compatible index templates for all OpenTelemetry signals. This enables filtering by resource attributes or timestamps for different signal types in the same dashboard.
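As an illustration of such a shared filter, the sketch below builds an OpenSearch query body that restricts results to one service and one time window. It assumes the field names from the metrics sample (`resource.attributes.service@name` and `time`) and assumes the service name is mapped as a keyword so that a `term` filter matches. Whether the trace and log indices expose the same fields is exactly what compatible templates would have to guarantee. The function name `service_time_filter` is hypothetical.

```python
import json


def service_time_filter(service_name: str, start: str, end: str) -> dict:
    """Build a query body filtering on a resource attribute and a time range."""
    return {
        "query": {
            "bool": {
                "filter": [
                    # Assumes a keyword mapping for the dedotted attribute field.
                    {"term": {"resource.attributes.service@name": service_name}},
                    # Assumes the timestamp field is called "time" in every index.
                    {"range": {"time": {"gte": start, "lte": end}}},
                ]
            }
        }
    }


body = service_time_filter(
    "otel-collector", "2024-10-11T12:00:00Z", "2024-10-11T13:00:00Z"
)
print(json.dumps(body, indent=2))
```

If all signal indices shared these field names, the same body could be sent to each of them without per-signal variants.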
Which domain protocol is relevant for this schema?
The catalog describes a schema to be used with OpenTelemetry metrics data at https://github.com/opensearch-project/opensearch-catalog/tree/main/docs/schema/observability/metrics. Unfortunately, this schema is not compatible with the schema generated by Data Prepper. This can be explored using this example.
What is the schema resource?
The Data Prepper schema for OpenTelemetry metrics closely follows the schema used for spans (and logs). All three schemas allow filters on resource attributes and instrumentation scopes to be applied to all signals. data-prepper#3929 introduces a mapping template for the metrics index. The same PR also contains mappings for traces and logs.
Source Schema - Add necessary repository
Do you have any additional context?
To be added on request.