Summary
Bring streaming analytics support directly into the Jaeger backend, instead of requiring separate Spark/Flink data pipelines.
Background
One of the challenges of distributed tracing is that spans can arrive from many different places in the architecture, at different times. If your only job is to store them (which is primarily what the Jaeger collector does), that is not a big problem, since the storage backends take care of partitioning and indexing the spans by trace ID. But the most interesting applications of traces require looking at a whole trace in one place, to make decisions based on the overall call graph rather than on individual spans.
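To make the whole-trace view concrete, here is a minimal sketch of deriving service-to-service dependency links from spans grouped by trace ID. The `Span` and `DependencyLink` types are simplified stand-ins for Jaeger's actual domain model (field names are illustrative, not the real API):

```go
package main

import "fmt"

// Span is a minimal stand-in for Jaeger's span model
// (hypothetical field names, for illustration only).
type Span struct {
	TraceID  string
	SpanID   string
	ParentID string // empty for the root span
	Service  string
}

// DependencyLink identifies a caller/callee pair of services,
// similar in spirit to Jaeger's dependency graph output.
type DependencyLink struct {
	Parent, Child string
}

// buildDependencies groups spans by trace ID, then resolves each
// span's parent within the same trace to count cross-service calls.
// This is the kind of decision that requires the whole trace in one
// place: an individual span does not know its parent's service.
func buildDependencies(spans []Span) map[DependencyLink]int {
	byTrace := map[string][]Span{}
	for _, s := range spans {
		byTrace[s.TraceID] = append(byTrace[s.TraceID], s)
	}
	links := map[DependencyLink]int{}
	for _, trace := range byTrace {
		byID := map[string]Span{}
		for _, s := range trace {
			byID[s.SpanID] = s
		}
		for _, s := range trace {
			if parent, ok := byID[s.ParentID]; ok && parent.Service != s.Service {
				links[DependencyLink{parent.Service, s.Service}]++
			}
		}
	}
	return links
}

func main() {
	spans := []Span{
		{TraceID: "t1", SpanID: "a", Service: "frontend"},
		{TraceID: "t1", SpanID: "b", ParentID: "a", Service: "backend"},
		{TraceID: "t1", SpanID: "c", ParentID: "b", Service: "db"},
	}
	for link, n := range buildDependencies(spans) {
		fmt.Printf("%s -> %s: %d\n", link.Parent, link.Child, n)
	}
}
```

In a batch job this grouping is trivial; in a streaming setting the same grouping has to wait for late-arriving spans, which is what makes the pipeline stateful.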
Data streaming is well suited to this. Historically, Jaeger supported a couple of Java-based data pipelines (for the basic dependency graph and for the transitive dependency graph), which were implemented independently on top of the Spark and Flink frameworks. There were problems with that approach:
The business logic had to be written in Java, meaning we could not reuse the domain model capabilities we already had in the primary Go code.
We had to duplicate some of the logic; e.g., the all-in-one binary supported constructing a dependency graph on the fly, which was implemented completely independently from the Java Spark job.
Proposal
We should bring streaming capabilities into the main Jaeger repo using Go code. This will address many of the problems mentioned above. The main challenge with data streaming is that it is a stateful activity, which requires checkpointing capabilities to avoid data loss and inconsistent results when Jaeger instances are restarted. This is where the well-known streaming frameworks like Spark and Flink come in: they provide the needed orchestration and statefulness. In the past we could not use them with Go, but today there are projects like Apache Beam that provide a unified programming model via well-supported SDKs (including Go), allowing the pipeline logic to be implemented in Go and executed on a number of runtimes.