jaegertracing / jaeger

CNCF Jaeger, a Distributed Tracing Platform
https://www.jaegertracing.io/
Apache License 2.0
20.54k stars, 2.44k forks

Implement in-memory Service Dependency Graph using Apache Beam #5911

Status: Open. yurishkuro opened this issue 2 months ago.

yurishkuro commented 2 months ago

For background, see https://github.com/jaegertracing/jaeger/issues/5910

Jaeger all-in-one typically runs with in-memory or Badger storage, both of which have a special implementation of the Dependencies Storage API. Instead of pre-computing and storing the dependencies, they brute-force recalculate them on demand on every request:
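To make the brute-force approach concrete, here is a minimal Go sketch of on-demand dependency computation. It is not Jaeger's actual code: the `Span` and `DependencyLink` types and the `getDependencies` function are hypothetical simplifications (a single parent ID per span, no time-window filtering). Every call rescans all stored spans, so query cost grows with the amount of data held in storage.

```go
package main

import (
	"fmt"
	"sort"
)

// Span is a hypothetical, simplified span model for this sketch only;
// it is not Jaeger's model type.
type Span struct {
	TraceID      string
	SpanID       string
	ParentSpanID string // empty for root spans
	Service      string
}

// DependencyLink is a parent->child service edge with a call count.
type DependencyLink struct {
	Parent    string
	Child     string
	CallCount uint64
}

// getDependencies rescans every stored span on each call: the
// brute-force, on-demand approach.
func getDependencies(spans []Span) []DependencyLink {
	// Index spans by (traceID, spanID) so parents can be looked up.
	type key struct{ trace, span string }
	byID := make(map[key]Span, len(spans))
	for _, s := range spans {
		byID[key{s.TraceID, s.SpanID}] = s
	}

	// Count parent-service -> child-service calls.
	counts := make(map[[2]string]uint64)
	for _, s := range spans {
		if s.ParentSpanID == "" {
			continue // root span has no caller
		}
		parent, ok := byID[key{s.TraceID, s.ParentSpanID}]
		if !ok {
			continue // parent span not stored (dropped or not yet arrived)
		}
		counts[[2]string{parent.Service, s.Service}]++
	}

	links := make([]DependencyLink, 0, len(counts))
	for k, n := range counts {
		links = append(links, DependencyLink{Parent: k[0], Child: k[1], CallCount: n})
	}
	// Sort for deterministic output.
	sort.Slice(links, func(i, j int) bool {
		if links[i].Parent != links[j].Parent {
			return links[i].Parent < links[j].Parent
		}
		return links[i].Child < links[j].Child
	})
	return links
}

func main() {
	spans := []Span{
		{"t1", "a", "", "frontend"},
		{"t1", "b", "a", "api"},
		{"t1", "c", "b", "db"},
		{"t2", "d", "", "frontend"},
		{"t2", "e", "d", "api"},
	}
	for _, l := range getDependencies(spans) {
		fmt.Printf("%s -> %s: %d\n", l.Parent, l.Child, l.CallCount)
	}
}
```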

This is fine for small demos, but:

Following up on the proposal in RFC #5910, we could re-implement this logic as an in-process streaming component using Apache Beam with the direct runner. This would let us consolidate the graph-building logic across the memory and Badger storages (in fact, extract it from them into an independent component), and in the future we could find a way to run it in a distributed manner on big-data runners without changing the business logic.
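For contrast, a streaming component can update the graph incrementally as each span arrives, so a query only reads pre-aggregated counts. The sketch below does not use Apache Beam; it is a plain-Go illustration, with hypothetical names (`DependencyAggregator`, `Add`, `Counts`), of the kind of order-independent, per-trace logic such a pipeline would need. It assumes each span carries a single parent ID and it never evicts per-trace state; a real pipeline would have to bound that state, for example with Beam windows.

```go
package main

import "fmt"

// Span is a hypothetical simplified span model for this sketch.
type Span struct {
	TraceID      string
	SpanID       string
	ParentSpanID string // empty for root spans
	Service      string
}

// Edge is a parent->child service pair.
type Edge struct{ Parent, Child string }

// traceState buffers what has been seen so far for one trace.
type traceState struct {
	services map[string]string   // spanID -> service
	orphans  map[string][]string // missing parent spanID -> services of children waiting for it
}

// DependencyAggregator (hypothetical) updates edge counts as spans arrive,
// instead of rescanning storage on every query.
type DependencyAggregator struct {
	traces map[string]*traceState
	counts map[Edge]uint64
}

func NewDependencyAggregator() *DependencyAggregator {
	return &DependencyAggregator{
		traces: map[string]*traceState{},
		counts: map[Edge]uint64{},
	}
}

// Add processes one span; spans within a trace may arrive in any order.
func (a *DependencyAggregator) Add(s Span) {
	ts, ok := a.traces[s.TraceID]
	if !ok {
		ts = &traceState{services: map[string]string{}, orphans: map[string][]string{}}
		a.traces[s.TraceID] = ts
	}
	ts.services[s.SpanID] = s.Service

	// Link to the parent if it already arrived; otherwise wait for it.
	if s.ParentSpanID != "" {
		if parentSvc, ok := ts.services[s.ParentSpanID]; ok {
			a.counts[Edge{parentSvc, s.Service}]++
		} else {
			ts.orphans[s.ParentSpanID] = append(ts.orphans[s.ParentSpanID], s.Service)
		}
	}

	// Resolve children that arrived before this span.
	for _, childSvc := range ts.orphans[s.SpanID] {
		a.counts[Edge{s.Service, childSvc}]++
	}
	delete(ts.orphans, s.SpanID)
}

// Counts returns a copy of the current edge counts: a cheap read at query time.
func (a *DependencyAggregator) Counts() map[Edge]uint64 {
	out := make(map[Edge]uint64, len(a.counts))
	for k, v := range a.counts {
		out[k] = v
	}
	return out
}

func main() {
	agg := NewDependencyAggregator()
	// The db span arrives before its parent api span.
	agg.Add(Span{"t1", "c", "b", "db"})
	agg.Add(Span{"t1", "a", "", "frontend"})
	agg.Add(Span{"t1", "b", "a", "api"})
	fmt.Println(agg.Counts())
}
```

Keeping the per-span logic free of storage calls is what would, in principle, let the same code run on an in-process runner today and a distributed runner later.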

Some implementation details:

Steps:

NavinShrinivas commented 2 months ago

Hey Yuri, this seems interesting. Are you thinking this will be a separate service that the collector forwards the data to? I'm just trying to make sense of this.

Are you in the process of splitting it into smaller tasks?

yurishkuro commented 2 months ago

It's not a separate service.

tronda commented 2 months ago

The OpenTelemetry Collector Contrib repository includes the ServiceGraphConnector, which generates metrics from trace data that can be used to draw a dependency graph. Having struggled with the Jaeger Spark dependencies job, we found the service graph connector appealing because deployment would be much easier, since we already have Prometheus available. Are there any architectural issues that make the service graph connector a poor fit for Jaeger?

yurishkuro commented 2 months ago

@tronda First, Jaeger is not a metrics database, so using the ServiceGraphConnector requires the user to run another backend. Second, the transitive dependency graph simply cannot be represented in the metrics format, yet it is much more useful than the p2p graph that the ServiceGraphConnector can produce.
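A small illustration of why pairwise edges lose information: given two traces, frontend->api->db and mobile->api->cache, the p2p edges (frontend->api, mobile->api, api->db, api->cache) also admit the path frontend->api->cache, which no trace contains. The sketch below records root-to-leaf service paths per trace as one possible way to keep transitive information. The types and helper names are hypothetical, this is not Jaeger's data model, and it assumes acyclic span trees.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Span is a hypothetical simplified span model for this sketch.
type Span struct {
	TraceID, SpanID, ParentSpanID, Service string
}

func spanKey(traceID, spanID string) string { return traceID + "/" + spanID }

// p2pEdges returns the distinct parent->child service pairs: all that a
// pairwise graph retains.
func p2pEdges(spans []Span) map[[2]string]bool {
	svc := map[string]string{}
	for _, s := range spans {
		svc[spanKey(s.TraceID, s.SpanID)] = s.Service
	}
	edges := map[[2]string]bool{}
	for _, s := range spans {
		if s.ParentSpanID == "" {
			continue
		}
		if p, ok := svc[spanKey(s.TraceID, s.ParentSpanID)]; ok {
			edges[[2]string{p, s.Service}] = true
		}
	}
	return edges
}

// servicePaths returns the distinct root-to-leaf service call chains
// observed within individual traces.
func servicePaths(spans []Span) []string {
	children := map[string][]Span{}
	var roots []Span
	for _, s := range spans {
		if s.ParentSpanID == "" {
			roots = append(roots, s)
		} else {
			k := spanKey(s.TraceID, s.ParentSpanID)
			children[k] = append(children[k], s)
		}
	}
	seen := map[string]bool{}
	var walk func(s Span, prefix []string)
	walk = func(s Span, prefix []string) {
		// Copy the prefix so sibling branches do not share a backing array.
		path := append(append([]string{}, prefix...), s.Service)
		kids := children[spanKey(s.TraceID, s.SpanID)]
		if len(kids) == 0 {
			seen[strings.Join(path, "->")] = true
		}
		for _, k := range kids {
			walk(k, path)
		}
	}
	for _, r := range roots {
		walk(r, nil)
	}
	out := make([]string, 0, len(seen))
	for p := range seen {
		out = append(out, p)
	}
	sort.Strings(out)
	return out
}

// impliedByEdges reports whether every hop of path exists as a pairwise edge.
func impliedByEdges(edges map[[2]string]bool, path []string) bool {
	for i := 0; i+1 < len(path); i++ {
		if !edges[[2]string{path[i], path[i+1]}] {
			return false
		}
	}
	return true
}

func main() {
	spans := []Span{
		{"t1", "1", "", "frontend"}, {"t1", "2", "1", "api"}, {"t1", "3", "2", "db"},
		{"t2", "4", "", "mobile"}, {"t2", "5", "4", "api"}, {"t2", "6", "5", "cache"},
	}
	fmt.Println(servicePaths(spans))
	// The pairwise edges make frontend->api->cache look possible, yet no trace contains it.
	fmt.Println(impliedByEdges(p2pEdges(spans), []string{"frontend", "api", "cache"}))
}
```

Here `servicePaths` returns only the two observed chains, while `impliedByEdges` accepts the phantom chain: the pairwise view alone cannot tell them apart.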