openzipkin / zipkin-support

repository for support questions raised as issues
4 stars 2 forks source link

Lens UI: Not displaying inner CLIENT spans correctly #19

Closed dstepanov closed 4 years ago

dstepanov commented 4 years ago

Version: 2.19.2

Attached trace is displayed as:

+ my-service (with dropdown button)
   + my-service (with dropdown button)
      - remote-service

The dropdown button always drops the main root. In the discovery panel, it's displayed as 3x my-service and 2x remote-service

Expected:

[
   {
      "traceId":"73ace58cca1e3f85",
      "parentId":"73ace58cca1e3f85",
      "id":"036f42fcb6661c79",
      "kind":"CLIENT",
      "name":"get",
      "timestamp":1574947566084564,
      "duration":9126,
      "localEndpoint":{
         "serviceName":"my-service"
      },
      "remoteEndpoint":{
         "serviceName":"remote-service"
      }
   },
   {
      "traceId":"73ace58cca1e3f85",
      "parentId":"036f42fcb6661c79",
      "id":"2059bd44ae854a73",
      "kind":"CLIENT",
      "name":"get",
      "timestamp":1574947566095573,
      "duration":19880,
      "localEndpoint":{
         "serviceName":"my-service"
      },
      "remoteEndpoint":{
         "serviceName":"remote-service"
      }
   },
   {
      "traceId":"73ace58cca1e3f85",
      "id":"73ace58cca1e3f85",
      "kind":"SERVER",
      "name":"get",
      "timestamp":1574947566012997,
      "duration":103426,
      "localEndpoint":{
         "serviceName":"my-service"
      },
      "remoteEndpoint":{
         "ipv4":"127.0.0.1"
      }
   }
]
jcchavezs commented 4 years ago

Hi @dstepanov I am curious about this trace. As fas as I see, the three of them have the same localEndpoint.serviceName="my-service" and two of them remoteEndpoint.serviceName="remote-service". That would explain the 3x my-service and 2x remote-service

On the other hand, the span 036f42fcb6661c79 is a CLIENT and its child 2059bd44ae854a73 is also a CLIENT which does not seem to be correct.

I think this is not a bug but a misleading trace.

dstepanov commented 4 years ago

Well, that happens when you have reactive clients and operations that are sequential: do one operating and based on the result do another. In non reactive world you would have blocking operations and multiple spans with the server parent span.

dstepanov commented 4 years ago

Regarding spans counting, yes it makes sense but in the real scenario there is one main call and two client calls which are projected to spans.

anuraaga commented 4 years ago

@dstepanov What instrumentation are you using to generate traces? We've discussed tracing asynchronous requests at a meetup before and we felt that the trace should look similar to a blocking server - a server request with two client span siblings (so they are both leaves). There isn't a parent child dependency between the two client spans regardless of whether an event loop is running the requests (or many requests, possibly in parallel) or a normal blocking thread, they're two requests that happen in some order. So I don't think 2059bd44ae854a73 is supposed to have it's parent set to 036f42fcb6661c79. It would be good to know what instrumentation you're using to understand whether this relationship was even intended, and if so what the reasoning is.

anuraaga commented 4 years ago

For reference, in Armeria, an asynchronous framework which uses Brave for instrumentation, we expect users to ensure callbacks are run on an executor that remounts the server's context (meaning server's span) to ensure client requests are children of it.

https://github.com/line/armeria/blob/master/brave/src/test/java/com/linecorp/armeria/it/brave/BraveIntegrationTest.java#L183

In RxJava, this usually means something like .observeOn(Schedulers.from(ServiceRequestContext.current().contextAwareExecutor()). For reactive applications, it's currently generally difficult to get correct traces without explicitly specifying executors / context propagation.

dstepanov commented 4 years ago

@anuraaga I'm using Brave 5.9 some fo the calls are wrapped with:

try (Tracer.SpanInScope ignored = tracer.withSpanInScope(span)) {
}

And CurrentTraceContext::wrap for extra ExecutorService and ThreadFactory

If what are you saying is correct then Brave should put all CLIENT spans to the SERVER root span and not allow this kind of trace.

codefromthecrypt commented 4 years ago

brave is an instrumentation library. the key here is the word reactive.. that's a programming model that may be closer to where the problem lies, and likely where the fix will go.

a client of a client is a bug, and I'm certain this has been mentioned many times. if a client is called on the callback of a client, that response callback should be the original parent. the original parent should be the server/consumer or an intermediate response span.

This particular issue isn't actionable as the model is incorrect. To fix the modeling problem will require being much more specific about what the instrumentation is, as brave is too vague. Possibly it is some reactive framework and that's where the solution will end up.