Open hamiltont opened 3 years ago
A span represents a unit of work in a single process. If the work happens in different processes, you need two spans, linked as parent and child.
I've got parent/child linkage working. What I'm hoping to also have is a top-level "span" (or something span-like) that encompasses the entire flow, so I can easily see the end-to-end latency and get a nice graphical breakdown of which stages took the most processing time.
My goal is to marry distributed tracing with a data pipeline for visibility. There is not one "top-level" process where the pipeline both starts and ends. Data comes into an initial node and flows through a processing pipeline, taking different routes depending on what is contained therein. My current "parent" is the span that first received the data, but the end result is a graphic that looks like:
|------|
       |--------|
                |------|
i.e. there is no all-encompassing span. Perhaps this is a visualization issue. I'm using DataDog, and it would be interesting to know whether other tools automatically provide some 'wrapper' span in scenarios like this that can then be used for statistics.
If there's no concept for this then that's OK, I was just hoping I had missed something. It seemed to me the closest thing to ask for was the ability to terminate a span on a remote system, so I could do this:
|---------------------------------|
|------|
       |--------|
                |------|
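To make the diagram above concrete, here is a rough, library-free sketch of what "terminating a span on a remote system" could look like: the origin node only reserves a root span id and a start timestamp, both travel in the propagated context, and the final node reports the root span with that explicit start time. Everything here (`SpanRecord`, `start_transaction`, `run_stage`, `finish_transaction`, the carrier keys, the `REPORTED` list) is hypothetical and not any tracer's real API. It also assumes node clocks are reasonably synchronized and that the backend accepts a parent span reported after its children, which I have not verified for DataDog.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SpanRecord:
    """Hypothetical finished-span record, as it might be sent to a backend."""
    trace_id: str
    span_id: str
    parent_id: Optional[str]
    name: str
    start: float
    end: float

    @property
    def duration(self) -> float:
        return self.end - self.start


# Stand-in for the tracing backend; a real system would export these instead.
REPORTED: List[SpanRecord] = []


def start_transaction(trace_id: str, now: float) -> Dict[str, str]:
    """Origin node: reserve the root span id and start time, report nothing yet."""
    return {"trace_id": trace_id, "root_span_id": "root", "root_start": repr(now)}


def run_stage(carrier: Dict[str, str], name: str, start: float, end: float) -> None:
    """Any node: report a stage span as a child of the not-yet-reported root."""
    REPORTED.append(
        SpanRecord(carrier["trace_id"], name, carrier["root_span_id"], name, start, end)
    )


def finish_transaction(carrier: Dict[str, str], now: float) -> SpanRecord:
    """Final node: report the root span using the start time carried from the origin."""
    root = SpanRecord(
        carrier["trace_id"],
        carrier["root_span_id"],
        None,
        "pipeline",
        float(carrier["root_start"]),
        now,
    )
    REPORTED.append(root)
    return root


# The carrier is what would be injected/extracted between nodes.
carrier = start_transaction("t1", now=0.0)
run_stage(carrier, "ingest", 0.0, 2.0)
run_stage(carrier, "transform", 2.0, 5.0)
run_stage(carrier, "store", 5.0, 7.5)
root = finish_transaction(carrier, now=7.5)
print(root.name, root.duration)
```

The point of the sketch is only that nothing has to stay open on the origin node: whichever node finishes the flow has enough information to report the wrapper span.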
I've got a data processing pipeline with spans set up for each task, and I'm properly injecting/extracting context as execution moves from node to node to run different task types. Unfortunately I'm struggling to get one span across the entire "transaction." The issue seems to be that I'm starting my desired top-level parent span on one node, but I need to finish it on a different node. Is there an API for this?
I have been able to set up a timer on the origin node that queries the DB repeatedly to check whether the overall "transaction" is done, and then finishes the top-level parent span. This half works, although 1) it loads the database, and 2) total span time is always a multiple of my polling interval, which means point (1) gets really bad if I want decent visibility into total e2e transaction latency at a high-ish transaction volume.
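To illustrate point (2), here is a small sketch of the rounding effect. It assumes the first poll fires one interval after the span starts and that polls never drift; `observed_latency` is a name I made up for illustration.

```python
import math


def observed_latency(true_latency: float, poll_interval: float) -> float:
    """Latency recorded by a poller that only notices completion on its next tick."""
    return math.ceil(true_latency / poll_interval) * poll_interval


# With a 1 s poll interval, every measurement is rounded up to a whole second.
for true_latency in (0.3, 1.0, 1.2, 4.9):
    print(true_latency, observed_latency(true_latency, poll_interval=1.0))
```

Shrinking the interval reduces the rounding error but raises the query rate proportionally for every in-flight transaction, which is the trade-off described above.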
Any tips you could offer?