Visualizing follows_from references with horizontal stacking

richard-fine commented 5 years ago

Requirement - what kind of business use case are you trying to solve?

It is sometimes necessary to trace processes that involve long chains of sequential spans (perhaps linked with the follows_from reference). For example, imagine a trace of a system that is given a DAG of thousands of short-lived work items and spawns N threads, each of which executes work items in dependency order. To trace what actually happened, we would like to record a span for each individual work item, and we would like to be able to visualize the parallelism across the threads, so we can identify sync points in the DAG.

Problem - what in Jaeger blocks you from solving the requirement?

Jaeger-UI does not currently support visualizing long sequences of spans (e.g. linked with the follows_from ref) efficiently. Each row in the chart only ever displays a single span, so a long sequence of short spans shows up as a large amount of wasted space:

This is pretty low information density. It makes it harder to see that the spans are sequential, and makes it harder to spot when threads are work-starved due to a sync point.

Proposal - what do you suggest to solve the problem or improve the existing situation?

Ideally, find some way to horizontally stack spans in a disjoint follows_from sequence. For example, here is a trace of a similar process in the Chrome tracing UI:

This is much more compact, and makes it very easy to see how successfully the process is parallelized. Detail about an individual span can be seen by clicking it, in a similar way to Jaeger.
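As a rough illustration (not Jaeger code), horizontal stacking amounts to packing spans into rows ("lanes") so that spans sharing a row never overlap in time. The sketch below assumes a simplified, hypothetical span shape (`spanID`, `startTime`, `duration` in microseconds) and uses greedy interval partitioning: process spans in start order and reuse the first row that is already free. This yields the minimum number of rows, which for the DAG example would roughly correspond to the number of busy threads at the busiest moment.

```typescript
// Hypothetical, simplified span shape; not Jaeger's actual types.
interface SpanLike {
  spanID: string;
  startTime: number; // microseconds
  duration: number; // microseconds
}

/**
 * Packs spans into rows so that spans in the same row never overlap.
 * Processing spans by start time and reusing the first free row is the
 * classic greedy interval-partitioning algorithm: a new row is only opened
 * when every existing row is busy, so the row count equals the peak overlap.
 */
function packIntoLanes(spans: SpanLike[]): SpanLike[][] {
  const sorted = [...spans].sort((a, b) => a.startTime - b.startTime);
  const lanes: SpanLike[][] = [];
  const laneEnds: number[] = []; // end time of the last span placed in each row

  for (const span of sorted) {
    // First row whose last span finished by the time this span starts.
    let lane = laneEnds.findIndex((laneEnd) => laneEnd <= span.startTime);
    if (lane === -1) {
      lane = lanes.length;
      lanes.push([]);
      laneEnds.push(0);
    }
    lanes[lane].push(span);
    laneEnds[lane] = span.startTime + span.duration;
  }
  return lanes;
}
```

For example, work items a (0 to 10), b (0 to 5), d (5 to 15) and c (10 to 20) pack into two rows, `[a, c]` and `[b, d]`, instead of four.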

Any open questions to address

What would happen to the duration markers, currently displayed to the right of a span, when they would overlap a horizontally stacked neighboring span? (Maybe it's OK to just drop durations in that context - you can see it in the detail view when you click on the span anyway).
What happens when there is more than one operation name in the sequence? Right now the operation name is shown once for the whole row.
What happens when the trace contains spans that are child_of the spans in the sequence?

yurishkuro commented 5 years ago

Ironically, one of the oldest, still unresolved, issues in OpenTracing, https://github.com/opentracing/opentracing-go/issues/3, is about how to model sibling spans. Your use case is a lot better / clearer scenario, assuming your individual work units make other downstream calls (otherwise it's all in process and distributed tracing may not be the best tool).

I've seen some commercial APM vendors having a different type of display, resembling more a flame graph than a Gantt chart, where non-overlapping children of the same span are displayed side by side, with their own children below. They don't even have to be produced by the same service. But I can't see how that view applies to a general trace shape.

One thing you can do now is to create the next span on the thread as a child of the previous one, which would at least separate the threads visually, but information density would still be low (long staircases).

Ultimately, we're missing a reference type that would mark a sibling, something that starts after the previous span finishes. Follows-from does not imply that relationship; it only implies that parent.start happened-before child.start, with no info about parent.end.
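To make the distinction concrete, here is a minimal sketch that expresses the two constraints as timestamp checks. The function names and the span shape are hypothetical, and no "sibling" reference type exists in OpenTracing; the second check only encodes what such a reference would promise. The point is that two spans can satisfy the follows_from constraint while still overlapping, so a UI cannot safely place them in the same row on the strength of follows_from alone.

```typescript
// Hypothetical minimal span shape (microsecond timestamps).
interface TimedSpan {
  startTime: number;
  duration: number;
}

// FOLLOWS_FROM only promises ref.start happened-before span.start.
// (Timestamps have finite resolution, so equal values are accepted.)
function consistentWithFollowsFrom(ref: TimedSpan, span: TimedSpan): boolean {
  return ref.startTime <= span.startTime;
}

// A hypothetical "sibling"/"next" reference would additionally promise that
// ref has finished before span starts, i.e. the two never overlap in time.
function consistentWithSibling(ref: TimedSpan, span: TimedSpan): boolean {
  return ref.startTime + ref.duration <= span.startTime;
}
```

A span running from 0 to 10 and another starting at 5 satisfy the first check but not the second, so they could not share a row.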

richard-fine commented 5 years ago

Your use case is a lot better / clearer scenario, assuming your individual work units make other downstream calls (otherwise it's all in process and distributed tracing may not be the best tool).

Yes, or that the whole process is downstream of some other services. That happens to be true in my concrete use case: I'm trying to trace build/test farm activity, so I am tracing a number of microservices around triggers, work scheduling, VM/container orchestration, etc., all the way down to the build process itself, which is what I'm really describing in this ticket. It's conceivable that future farm workloads will then make further downstream calls.

One thing you can do now is to create the next span on the thread as a child of the previous one, which would at least separate the threads visually, but information density would still be low (long staircases).

Right. I tried something similar by creating a span for each worker thread, and then making the individual build tasks be children of their thread's span. As you say, it grouped all the work on a thread together, but it was still one long staircase per thread.

Ultimately, we're missing a reference type that would mark a sibling, something that starts after the previous span finishes. Follows-from does not imply that relationship; it only implies that parent.start happened-before child.start, with no info about parent.end.

Ah, interesting. Yeah, the OpenTracing spec is quite vague about this relationship, though it does say that "the child Span FollowsFrom the parent Span in a causal sense", which is maybe not appropriate in this case, as a thread's work item N+1 is not really 'caused' by work item N.

I was wondering whether we could just decide that multiple children (child_of) of the same span with disjoint start/end times could be stacked... but reading the issue you linked, I see that doesn't work when we have things like clock skew. So it does indeed feel like we need a new relationship here. I'll join the discussion on the other ticket and see if I can help out there.
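For illustration, here is roughly what the timestamp-inference approach would look like, and why clock skew makes it fragile. This is a hypothetical sketch, not anything in Jaeger: `skewTolerance` is an assumed bound on clock error between hosts, and any gap between consecutive children that falls within that bound is treated as ambiguous, so the children are only considered stackable into a single row when every gap is clearly positive.

```typescript
// Hypothetical span shape; timestamps in microseconds.
interface ChildSpan {
  spanID: string;
  parentID: string;
  startTime: number;
  duration: number;
}

type Gap = "disjoint" | "overlapping" | "ambiguous";

// Classifies the gap between two spans given an assumed clock-skew bound.
function classifyGap(prev: ChildSpan, next: ChildSpan, skewTolerance: number): Gap {
  const gap = next.startTime - (prev.startTime + prev.duration);
  if (gap > skewTolerance) return "disjoint";
  if (gap < -skewTolerance) return "overlapping";
  return "ambiguous"; // too close to call once clock error is considered
}

// True only if all children of `parentID` can share one row even after
// allowing for `skewTolerance`. After sorting by start time, checking
// adjacent pairs is enough: if each child starts after the previous one
// ends, no two children overlap.
function childrenStackable(spans: ChildSpan[], parentID: string, skewTolerance: number): boolean {
  const children = spans
    .filter((s) => s.parentID === parentID)
    .sort((a, b) => a.startTime - b.startTime);
  for (let i = 1; i < children.length; i++) {
    if (classifyGap(children[i - 1], children[i], skewTolerance) !== "disjoint") {
      return false;
    }
  }
  return true;
}
```

The weakness is visible in the tolerance: short work items with small gaps, which is exactly the DAG scenario, land in the ambiguous band and cannot be stacked, while an explicit reference type would not depend on timestamps at all.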

yurishkuro commented 4 years ago

Related to flame graph view #525

Rperry2174 commented 2 years ago

Hi @yurishkuro, we use Jaeger for some of our examples, and we ship our flamegraph component as a standalone npm package so that it can visualize anything structured the right way (including traces).

Here's a proof-of-concept of what it looks like to visualize a Jaeger trace as a flamegraph:

https://user-images.githubusercontent.com/23323466/179668011-b2091db3-3d22-49f3-9f31-a457e61b9275.mp4

We currently just use this in our fork of JaegerUI where it's only ~30 lines to add this feature, but we'd be happy to contribute upstream if this is something you're interested in.
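As a rough idea of the data transformation involved, the sketch below turns a trace's parent/child tree into "folded stacks" (one `root;child;grandchild value` line per path), the text format popularized by Brendan Gregg's FlameGraph scripts. This is not the fork's code or the Pyroscope package's input format; the span shape is hypothetical, only parent links are followed (follows_from references are ignored), and spans whose parent is missing from the trace are not visited. Each line is weighted by the span's self time, i.e. its duration minus its children's durations, clamped at zero because parallel or async children can add up to more than the parent.

```typescript
// Hypothetical span shape; durations in microseconds.
interface TraceSpan {
  spanID: string;
  parentSpanID?: string; // undefined for the root span
  operationName: string;
  duration: number;
}

// Converts a trace into folded-stack lines weighted by self time.
function toFoldedStacks(spans: TraceSpan[]): string[] {
  // Index children by parent span ID.
  const children = new Map<string, TraceSpan[]>();
  for (const s of spans) {
    if (s.parentSpanID === undefined) continue;
    const list = children.get(s.parentSpanID) ?? [];
    list.push(s);
    children.set(s.parentSpanID, list);
  }

  const lines: string[] = [];
  const visit = (span: TraceSpan, path: string[]): void => {
    const stack = [...path, span.operationName];
    const kids = children.get(span.spanID) ?? [];
    const childTotal = kids.reduce((sum, k) => sum + k.duration, 0);
    // Clamp: overlapping children can sum to more than the parent's duration.
    const self = Math.max(0, span.duration - childTotal);
    if (self > 0) lines.push(`${stack.join(";")} ${self}`);
    for (const k of kids) visit(k, stack);
  };

  for (const root of spans.filter((s) => s.parentSpanID === undefined)) {
    visit(root, []);
  }
  return lines;
}
```

For an illustrative trace where `frontend` (100) calls `customer` (30) and `route` (50), and `route` calls `redis` (20), this yields `frontend 20`, `frontend;customer 30`, `frontend;route 30` and `frontend;route;redis 20`.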

If you'd like to play with it yourself, you can run Pyroscope's Jaeger example and then visit the Jaeger UI on port :4000.

We use our own version of the hotrod demo with its structure explained here.

We would also love to get feedback or more test cases to try!

yurishkuro commented 2 years ago

@Rperry2174 I think it would be great to have the viz built-in natively in Jaeger. Once the core is in, we can fiddle with the exact viz based on users feedback.

yurishkuro commented 2 years ago

By "built-in natively" I mean integrated with - it's fine to use your npm module for that.

jaegertracing / jaeger-ui