jaegertracing / spark-dependencies

Spark job for dependency links
http://jaegertracing.io/
Apache License 2.0

Did not discover dependencies for Zipkin v1 and v2 spans posted to Jaeger #84

Open albert5190 opened 4 years ago

albert5190 commented 4 years ago

Requirement - what kind of business use case are you trying to solve?

Posting Zipkin v1 and v2 spans to the Jaeger collector should result in spark-dependencies discovering the dependencies contained in the incoming spans.

Problem - what in Jaeger blocks you from solving the requirement?

Steps to reproduce the problem:

  1. Set up an environment with the Jaeger collector, spark-dependencies, Jaeger query and Cassandra.
  2. Post the Zipkin v1 JSON from the https://zipkin.io/pages/data_model.html page to the collector.
  3. After spark-dependencies has run, log into Cassandra and check the contents of the dependency table.
  4. An entry with null data was added to the dependency table, i.e. no dependency was discovered.
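For step 2, here is a minimal Java 11+ sketch of posting a Zipkin v1 JSON payload to the collector. Jaeger's collector accepts Zipkin JSON v1 on POST /api/v1/spans (and Zipkin JSON v2 on /api/v2/spans) once its Zipkin HTTP server is enabled. The host name jaeger-collector, port 9411, the class name and the tiny payload are assumptions for illustration; the payload is not the example from the zipkin.io page.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class PostZipkinV1 {
    // Hypothetical collector address; assumes the collector's Zipkin HTTP server listens on 9411.
    static final String COLLECTOR = "http://jaeger-collector:9411";

    // Hypothetical minimal Zipkin v1 trace: one shared span carrying client (cs/cr)
    // annotations from "frontend" and server (sr/ss) annotations from "backend".
    static final String V1_JSON = "[{"
        + "\"traceId\":\"5af7183fb1d4cf5f\",\"id\":\"352bff9a74ca9ad2\",\"name\":\"get\","
        + "\"timestamp\":1583150000000000,\"duration\":207000,"
        + "\"annotations\":["
        + "{\"timestamp\":1583150000000000,\"value\":\"cs\","
        + "\"endpoint\":{\"serviceName\":\"frontend\",\"ipv4\":\"10.0.0.1\"}},"
        + "{\"timestamp\":1583150000010000,\"value\":\"sr\","
        + "\"endpoint\":{\"serviceName\":\"backend\",\"ipv4\":\"10.0.0.2\"}},"
        + "{\"timestamp\":1583150000200000,\"value\":\"ss\","
        + "\"endpoint\":{\"serviceName\":\"backend\",\"ipv4\":\"10.0.0.2\"}},"
        + "{\"timestamp\":1583150000207000,\"value\":\"cr\","
        + "\"endpoint\":{\"serviceName\":\"frontend\",\"ipv4\":\"10.0.0.1\"}}"
        + "]}]";

    // Zipkin v1 JSON goes to /api/v1/spans; Zipkin v2 JSON would go to /api/v2/spans instead.
    static HttpRequest buildRequest(String baseUrl, String json) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/v1/spans"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(json))
            .build();
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response =
            client.send(buildRequest(COLLECTOR, V1_JSON), HttpResponse.BodyHandlers.ofString());
        // A 2xx status only means the collector accepted the payload, not that links will be derived.
        System.out.println(response.statusCode());
    }
}
```

The timestamps fall on 2020-03-02 UTC, the day processed by the job in the logs below, since the job only aggregates spans inside its date window.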

The same problem occurred for this Zipkin v2 json. zipkin-v2-bookinfo.txt

Note: Sending native Jaeger spans via the Jaeger agent to the collector does not have this problem.

I could be wrong, but I suspect issue https://github.com/jaegertracing/jaeger/issues/2067 may be related, since the spark-dependencies Java code relies on the span.kind value to determine the client/server relationship.

Proposal - what do you suggest to solve the problem or improve the existing situation?

Any open questions to address

pavolloffay commented 4 years ago

Hi @albert5190, could you please paste the logs from the spark-dependencies job that didn't produce any results?

I could be wrong, but I suspect issue jaegertracing/jaeger#2067 may be related, since the spark-dependencies Java code relies on the span.kind value to determine the client/server relationship.

The referenced ticket does not show any issue, and the conversion to the Jaeger model is correct. This repository contains an e2e test that uses the Zipkin data model: https://github.com/jaegertracing/spark-dependencies/blob/master/jaeger-spark-dependencies-test/src/main/java/io/jaegertracing/spark/dependencies/test/DependenciesTest.java

The same problem occurred for this Zipkin v2 json. zipkin-v2-bookinfo.txt

Is it the bookinfo app from the Istio project? Which version?

apm-opentt commented 4 years ago

Hi @pavolloffay, here is the spark-dependencies job log for the Zipkin v1 JSON. It didn't show anything interesting:

Starting spark job.
Cassandra contact points = ibmcloudappmgmt-cassandra:9042
+ '[' '!' -z '' ']'
+ echo 'Starting spark job.'
+ echo 'Cassandra contact points = ibmcloudappmgmt-cassandra:9042'
+ CASSANDRA_KEYSPACE=jaeger_v1_opentt
+ CASSANDRA_USERNAME=admin
+ CASSANDRA_PASSWORD=***
+ STORAGE=cassandra
+ CASSANDRA_CONTACT_POINTS=cloudappmgmt-cassandra:9042
+ java -jar /opt/jaeger-spark-dependencies.jar
20/03/02 14:22:15 INFO CassandraDependenciesJob: Running Dependencies job for 2020-03-02T00:00Z: 1583107200000000 ≤ Span.timestamp 1583193599999999
20/03/02 14:22:16 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/03/02 14:22:55 INFO CassandraDependenciesJob: Storing dependencies into dependencies
20/03/02 14:22:56 INFO CassandraDependenciesJob: Done, 0 dependency objects created
+ rc=0
+ echo 'Spark job completed 0.'
+ exit 0
Spark job completed 0.

The referenced ticket shows the span.kind value being reversed after converting from Zipkin v1 to the native Jaeger span format, i.e. the caller is tagged as server and the callee as client. In the spark-dependencies Java code, isServerSpan() and isClientSpan() rely on span.kind marking the caller as client and the callee as server.
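To make this hypothesis concrete, here is a minimal Java sketch of kind-based link derivation under a simplified rule assumed purely for illustration: a link parent.service -> child.service is emitted only when the parent span is tagged client and the child span is tagged server. The SimpleSpan and Link types and the links() method are hypothetical; this is not the actual SpansToDependencyLinks implementation, which also handles shared Zipkin spans and other cases.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class KindBasedLinks {
    // Hypothetical, simplified span model (not the spark-dependencies Span class).
    record SimpleSpan(long spanId, long parentId, String service, String kind) {}

    // A caller -> callee edge between two services.
    record Link(String parent, String child) {}

    // Assumed rule: emit parent.service -> child.service only when the parent is
    // tagged "client" and the child is tagged "server".
    static List<Link> links(List<SimpleSpan> trace) {
        Map<Long, SimpleSpan> byId = new HashMap<>();
        for (SimpleSpan s : trace) {
            byId.put(s.spanId(), s);
        }
        List<Link> result = new ArrayList<>();
        for (SimpleSpan child : trace) {
            SimpleSpan parent = byId.get(child.parentId());
            if (parent == null) {
                continue; // root span, or parent not present in this trace
            }
            if ("client".equals(parent.kind()) && "server".equals(child.kind())) {
                result.add(new Link(parent.service(), child.service()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // frontend calls backend: client span in the caller, server span in the callee
        List<SimpleSpan> correct = List.of(
            new SimpleSpan(1, 0, "frontend", "client"),
            new SimpleSpan(2, 1, "backend", "server"));
        // Same trace with span.kind swapped, as suspected after the Zipkin v1 conversion
        List<SimpleSpan> swapped = List.of(
            new SimpleSpan(1, 0, "frontend", "server"),
            new SimpleSpan(2, 1, "backend", "client"));
        System.out.println(links(correct)); // [Link[parent=frontend, child=backend]]
        System.out.println(links(swapped)); // []
    }
}
```

Under this assumed rule the correctly tagged trace yields one frontend -> backend link and the swapped one yields none, which would be consistent with the "Done, 0 dependency objects created" line in the log above; whether the real job behaves the same way depends on how it handles shared spans.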

Yes, bookinfo is from the Istio project (https://istio.io/docs/examples/bookinfo/) and uses the Zipkin v2 data format. Here is the spark-dependencies log for the Zipkin v2 JSON post:

Starting spark job.
Cassandra contact points = ibmcloudappmgmt-cassandra:9042
+ '[' '!' -z '' ']'
+ echo 'Starting spark job.'
+ echo 'Cassandra contact points = ibmcloudappmgmt-cassandra:9042'
+ CASSANDRA_KEYSPACE=jaeger_v1_opentt
+ CASSANDRA_USERNAME=admin
+ CASSANDRA_PASSWORD=***
+ STORAGE=cassandra
+ CASSANDRA_CONTACT_POINTS=cloudappmgmt-cassandra:9042
+ java -jar /opt/jaeger-spark-dependencies.jar
20/03/02 14:42:15 INFO CassandraDependenciesJob: Running Dependencies job for 2020-03-02T00:00Z: 1583107200000000 ≤ Span.timestamp 1583193599999999
20/03/02 14:42:16 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/03/02 14:42:46 INFO CassandraDependenciesJob: Storing dependencies into dependencies
20/03/02 14:42:47 INFO CassandraDependenciesJob: Done, 0 dependency objects created
+ rc=0
+ echo 'Spark job completed 0.'
+ exit 0
Spark job completed 0.

pavolloffay commented 4 years ago

I don't think it's reversed in either case. See my comment in the thread.

apm-opentt commented 4 years ago

@pavolloffay I edited my previous comment to add:

The referenced ticket shows the span.kind value being reversed after converting from Zipkin v1 to the native Jaeger span format, i.e. the caller is tagged as server and the callee as client. In the spark-dependencies Java code, isServerSpan() and isClientSpan() rely on span.kind marking the caller as client and the callee as server.

https://github.com/jaegertracing/spark-dependencies/blob/5828af5d7ec8804c67b2291a6d3db28b716316c7/jaeger-spark-dependencies-common/src/main/java/io/jaegertracing/spark/dependencies/SpansToDependencyLinks.java#L131