Allow trace group for partial traces

KarstenSchnitter commented 5 months ago

Is your feature request related to a problem? Please describe. DataPrepper does an aggregation of all spans with the same trace id. If encounters a span with a null parent span id, it assigns its name as trace group to all spans. This allows classification in the OpenSearch Dashboards observability plugin. However, this approach fails, if the global parent span does not arrive in time or at all. Consider the following situation:

otel-partial-trace drawio

In this picture, DataPrepper receives all coloured spans. It does not receive the gray spans. This might be because they are created in another system outside of the reach of the observability infrastructure the DataPrepper instance belongs to. This can be another vendor or a client system where the coloured spans are generated within a SaaS solution.

Currently, DataPrepper will not create a trace group entry for the spans, since the global trace parent is never received.

Describe the solution you'd like It would be great, if in that case DataPrepper would follow the connection along the parent span ids until it can no longer resolve the parent. If this leads to a unique span, this span should be used as the trace parent instead of the original global trace parent.

The picture shows a conflict situation, where no unique parent can be determined. In that case, no trace group should be issued, keeping the current behaviour.

Additional context For the implementation, this feature could be an option in the OTelTraceRawProcessor where the detection of a parent span needs to be changed.

Alternatively, in the OTelTraceGroupProcessor the search query could be changed.

It would also be possible to create a new processor or action in the aggregate processor, that fills in empty trace groups if possible.

feldentm-SAP commented 2 weeks ago

We have a service that runs as a backing service for other services. As such, we are, fundamentally, confronted with this situation if the service above uses tracing as we would not share a common trace backend.

From our perspective, it does not really matter which headless span is picked as root span or if an artificial root span is introduced. However, all spans should be accessible, visible and the reported data should be somewhat plausible.

Furthermore, I want to stress that whatever algorithm you come up with should be able to work with essentially arbitrary structures. The rationale is that for somewhat detailed tracing the collision rate for random numbers isn't simply 1/2**32 (see https://en.wikipedia.org/wiki/Birthday_problem). For a service with sufficient throughput and usage, this would be observable eventually.

KarstenSchnitter commented 2 weeks ago

@feldentm-SAP: If I understand you correctly, you are suggesting for the picture above, where there is one trace part below the local trace parent and another part under the conflicting trace parent reaching your service, that Data Prepper should create two trace groups. One based on the local trace parent and the other based on the conflicting trace parent. Is this what you had in mind?

In this scenario, it could happen, that the same trace group is created for both parts (if the two spans have the same name). It needs to be evaluated first, how the Opensearch observability plugin visualizes this scenario. This will aid the development of an algorithm for the problem. I can try to create such a scenario.

feldentm-SAP commented 2 weeks ago

@KarstenSchnitter I have no deeper insights into your terminology. I'll try to make an example that can be turned into a test.

Let us assume that we have a somewhat broken application sitting above us that makes a request that takes 3min and retries after 1min. The result is two partially overlapping identical requests that share TraceID but have different server SpanIDs.

An admittedly harder case is batch processing at the end of the day where two work packages can serve the same TraceID that produced a synchronous server span/SpanID hours earlier. We even have "do in next upgrade window" use cases, but I would accept that this is a really hard use case for trace backend, especially, since the distance could be more than the retention time.

As a user of a trace backend/UI, I do not care how this is really presented, as long as the complete information is shown and presented in a somewhat plausible manner. The situation isn't an error. It is the nature of connecting complex services that each support distributed tracing.

From my perspective, I'd use union-find merging because that also avoids the cycles issue with reasonable consequences. Can't tell how much architecture changes are caused by having multiple trace groups. If this isn't affordable, I'd either use random or heaviest group if an artificial ghost parent is also not viable.

KarstenSchnitter commented 2 weeks ago

Having long running traces is a problem for most OpenTelemetry backends. Data Prepper offers a configurable timeout trace_flush_interval in the Trace Raw Processor, that defines, how long it will wait for a root span (default: 180s). From the root span data is collected into the trace group: the overall duration, the end time, and the status code. If the root span does not arrive in the flush interval, the trace group will be empty. This will lead to degraded presentation in the OpenSearch Dashboards observability plugin.

If the root span is sent to another OpenSearch instance, Data Prepper can fetch the root span using the OTel Trace Group Processor. @feldentm-SAP , this could help in your scenario if

your customers were collecting the spans in OpenSearch and
your customers provided access to their OpenSearch cluster for the Trace Group plugin.

Still, all of this only works, if the root span arrives within the trace_flush_interval. By design, the root span will always arrive last, since it encompasses all its children. There are exceptions, e g., network delays. Increasing the flush interval might improve the chance to catch the root span, but it will cost more memory to buffer the spans longer and also prolong the time, when spans become available in OpenSearch.

To accommodate late root spans, one could run queries at scheduled intervals, that try to repair missing trace groups with root spans, that have finally been indexed. But this is another issue.

The problem that I and my colleague are describing is when the root span never reaches Data Prepper or any reachable system. In that case, Data Prepper should provide an option to improve on the missing trace groups for better visualization.

opensearch-project / data-prepper

Allow trace group for partial traces #4517