opensearch-project / data-prepper

Data Prepper is a component of the OpenSearch project that accepts, filters, transforms, enriches, and routes data at scale.
https://opensearch.org/docs/latest/clients/data-prepper/index/
Apache License 2.0

[FEATURE] generate service_maps from otel_metrics_source #5055

Open AdaptiveStep opened 2 days ago

AdaptiveStep commented 2 days ago

Is your feature request related to a problem? Please describe. The Data Prepper processor "service_maps" only works with the source "otel_traces_source".

Describe the solution you'd like Make the "service_maps" processor work with "otel_metrics_source" as well.

Describe alternatives you've considered (Optional) First, a little about the config I worked with: 1: gRPC OTel data is sent to an OpenTelemetry Collector. 2: The OpenTelemetry Collector then sends the data to Data Prepper.

I've tried moving the service-graph data from metrics to traces in the OpenTelemetry Collector, but the collector doesn't seem to be able to do this. It can only move service-graph data from traces to metrics, not the reverse. The component that does this is called a "Service graph exporter", and it sends the service graph via metrics. The application in question only sends metrics, so it's not possible to get traces (and it shouldn't be necessary to get traces). Therefore I cannot get the service maps into OpenSearch as things stand.

Those experienced with OTel know that it is considered something of a standard for the service graph to be visualized from metrics (this is the reason why the OpenTelemetry Collector lets you "transport" service_graph data to metrics exporters). Tempo, for instance, assumes all service-graph data is sent via the metrics channel.

I've also tried to manually reconstruct the "force diagram" (the default one found on the observability page) using only metrics data, but it took too much time to figure out how to do it with Vega and OpenSearch.

As of now, I am forced to use alternative applications to get the desired service graphs, such as Prometheus + Tempo + Grafana. However, with this solution I cannot use anomaly detection in OpenSearch, since OpenSearch anomaly detection only works with indexes, not with external Prometheus sources (which sucks too!). Therefore, the service maps must be in OpenSearch indexes, given the way OpenSearch works now. The only way to fix this, as I see it, is for Data Prepper to handle service_maps in metrics (as is done in industry standards). In industry-standard deployments, connecting services is usually done by the OpenTelemetry Collector itself. Therefore it might be better if we start expecting potential service_graphs to be in the metrics channel.

Additional context In other words: at the moment, Data Prepper can only "connect" services if their "service map data" can be linked through the traces, and only traces can be used to "connect the services". This service-graph data cannot be "pre-configured" in the metrics object the way Data Prepper works now. So the processor "service_maps" only works for traces. It might be desirable to make this work with metrics as well.

Letting the OpenTelemetry Collector connect the services (instead of Data Prepper doing it) might improve the performance and stability of Data Prepper and of the overall architecture.

Links: service_maps processor otel_metrics_source

Data Prepper can't create service maps from metrics alone as of now, but it's probably possible to implement this. Here is an example of what it looks like when Tempo visualizes service maps from metrics alone, showing that it's possible: tempo metrics servicegraphs

KarstenSchnitter commented 1 day ago

Data Prepper is currently constructing the service map from the span data. It determines which services are connected by following the spans within a trace. From this data, it constructs its own representation of the service map.
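To illustrate the span-following approach described above, here is a minimal Python sketch. It is not Data Prepper's implementation: the span fields (`trace_id`, `span_id`, `parent_span_id`, `service`) and the `service_edges` helper are hypothetical simplifications, and real OpenTelemetry spans carry many more attributes.

```python
# Hypothetical, simplified span records (assumption: one service name per span).
spans = [
    {"trace_id": "t1", "span_id": "a", "parent_span_id": None, "service": "frontend"},
    {"trace_id": "t1", "span_id": "b", "parent_span_id": "a", "service": "checkout"},
    {"trace_id": "t1", "span_id": "c", "parent_span_id": "b", "service": "db"},
    {"trace_id": "t1", "span_id": "d", "parent_span_id": "b", "service": "checkout"},
]


def service_edges(spans):
    """Return (caller, callee) service pairs found by following parent links within each trace."""
    # Index spans by (trace, span id) so a parent lookup never crosses traces.
    by_id = {(s["trace_id"], s["span_id"]): s for s in spans}
    edges = set()
    for span in spans:
        parent = by_id.get((span["trace_id"], span["parent_span_id"]))
        # Only a parent/child pair that crosses a service boundary forms an edge.
        if parent is not None and parent["service"] != span["service"]:
            edges.add((parent["service"], span["service"]))
    return edges


edges = service_edges(spans)
print(sorted(edges))
```

The point of the sketch is that every edge needs both ends of a parent/child span pair, which is why an application that emits only metrics gives this approach nothing to connect.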

@AdaptiveStep Do I understand you correctly that you want a processor that recognises the metrics emitted by the OpenTelemetry Collector Service Graph Connector and transforms them into the service graph used by the OpenSearch observability plugin?

AdaptiveStep commented 15 hours ago

@KarstenSchnitter That is a very good way of re-phrasing my request. I think you might be right.

The "serviceGraphConnector" looks like it's doing the right thing. It seems to be creating the desired derived metrics from the traces, though I am not sure.

Tempo expects a service graph generated by the "Metrics Generator", which seems to me to do the same thing as the ServiceGraphConnector (?). I am not sure whether Tempo will accept the service graph generated by OTel's "ServiceGraphConnector", as I have not tested the "ServiceGraphConnector" yet, although it seems like it should.

Either way, once the properly generated metrics are sent to Tempo, it successfully generates the visualization without the need for traces. It is common practice for some apps to send pre-generated service maps in their metrics, and often they fit the Tempo visualizer like this:

(1) traces_service_graph_request_total{client="app", server="db", connection_type="database"} 20

PS: Comparing the resulting service graphs from the "MetricsGenerator" with those from the "ServiceGraphConnector", it seems like they might be doing the same thing. If so, how do we handle data shaped like in point (1)?
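One possible way to handle a sample shaped like point (1), sketched in Python under stated assumptions: the metric arrives as a Prometheus-style text line (inside Data Prepper it would more likely arrive as an OTLP metric with attributes), and the output fields `source`, `target`, and `requests` are hypothetical names, not the document format the OpenSearch observability plugin expects.

```python
import re

# Matches `name{labels} value`; handles only labelled samples, no timestamps.
SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)\{(?P<labels>.*)\}\s+(?P<value>\S+)$'
)
# Matches one `key="value"` pair, allowing backslash escapes inside the value.
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Split a labelled sample line into (metric name, label dict, float value)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a labelled sample: {line!r}")
    labels = dict(LABEL_RE.findall(match.group("labels")))
    return match.group("name"), labels, float(match.group("value"))


def to_edge(line):
    """Map a service-graph request counter to a hypothetical edge record."""
    name, labels, value = parse_sample(line)
    if name != "traces_service_graph_request_total":
        return None  # Not a service-graph request counter; ignore it.
    return {
        "source": labels["client"],
        "target": labels["server"],
        "connection_type": labels.get("connection_type"),
        "requests": value,
    }


line = 'traces_service_graph_request_total{client="app", server="db", connection_type="database"} 20'
edge = to_edge(line)
print(edge)
```

The `client` and `server` labels already name both ends of the connection, so no trace correlation is needed to build the edge.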