openzipkin / zipkin-support

repository for support questions raised as issues
4 stars 2 forks source link

gRPC context not propaged #40

Closed asarkar closed 4 years ago

asarkar commented 4 years ago
Copy of https://github.com/spring-cloud/spring-cloud-sleuth/issues/1749, since I'm told the issue is in brave, and not in slueth.

I've this application with a gRPC server (A) that calls a gRPC client (B) that calls an external gRPC service. It all starts with a ServerInterceptor injecting a user id in the Context that is required by the service A; since A doesn't have access to the headers, context is the only way to propagate metadata from the client. The ServerInterceptor ends with:

return Contexts.interceptCall(ctx.withValue(AppGlobal.USER_ID_CONTEXT_KEY, userId), call, headers, next)

It is ordered LOWEST_PREFERENCE such that TracingServerInterceptor runs first, and it does. I can see the brave gRPC headers while in the ServerInterceptor. However, once in the service A, a log statement doesn't print any trace or span, and obviously neither does any downstream code. The log is correctly configured, and logs trace and span in the ServerInterceptor.

I've read Sleuth and Brave's documentation on gRPC instrumentation, but didn't see anything that discusses modification of context in a ServerInterceptor. I also didn't find any documentation on manually propagating Sleuth context for gRPC.

It's not clear where Sleuth looks for trace and span ids. If in the headers, then those are not modified and the trace should be visible in service A, but it isn't.

FWIW, the ServerInterceptor and service A are running on different threads. I didn't configure them to do so, so must be the gRPC library.

Using grpc-kotlin v0.1.5, Spring Cloud Hoxton.SR6 and Brave 5.12.4.

codefromthecrypt commented 4 years ago

Armeria integrates its context into Brave's via this: https://github.com/line/armeria/blob/master/brave/src/main/java/com/linecorp/armeria/common/brave/RequestContextCurrentTraceContext.java This works regardless of gRPC or not.

Brave's implementation of grpc does not use grpc context to propagate the trace context into listener callbacks. It uses explicit propagation and the CurrentTraceContext api. This means that your callbacks that are managed elsewhere should be instrumented with CurrentTraceContext somehow. It may be tempting to say "let grpc be the underlying context", but doing that would also effect other things such as reactor context.

My guess is that your context is breaking as a side effect of kotlin coroutine. If so, check out https://github.com/openzipkin/brave/issues/820#issuecomment-447614394 and related comments as a work around.

codefromthecrypt commented 4 years ago

One other thing to keep in mind is that the propagation component (BaggagePropagation) integrates with headers (grpc metadata). In other words headers can be passed through using this.

Context integration is currently limited to write down. Meaning that while you can push propagated fields such as trace IDs using brave into MDC, it does not aim to replace Log4J2 which has a means to look at multiple sources for propagated fields. In other words, it does not pull data in from any context such as grpc, reactor or otherwise.

asarkar commented 4 years ago

@adriancole I reviewed https://github.com/openzipkin/brave/issues/820, but it's a 2-year old ticket, and various separate concerns are mixed in. There are some discussions about a Coroutine context element TracingContextElement, and someone claims to have gotten it working. However, when I looked in their code, I couldn't find TracingContextElement referenced anywhere. There's also no mention of gRPC, other than someone linking to kroto-plus.

I gathered that launching coroutines by adding the TracingContextElement is what people seem to be doing. e.g. GlobalScope.async(TracingContextElement(Tracing.current().currentTraceContext())). However, this only applies to user code, and not code in the third-party libraries used by the user code.

Is there a clear outcome for that ticket as to what the problem is, and what, if anything, can be done about that?

codefromthecrypt commented 4 years ago

Your topic is mixing quite a few things: Kotlin and also gRPC (both of which have different thread execution models). Also, using grpc.context.

I mentioned kotlin not as a solution, but as one part of your composite that you could look at.

Regardless, my last comment is something you can re-read. Brave does not read properties from gRPC context, so it will not synchronize them. Things like user ID are often propagated in headers (Metadata), and we do have tests literally for user_id that propagate this way.

https://github.com/openzipkin/zipkin-support/issues/40#issuecomment-707482694

In other words, if you are expecting brave to carry around grpc's context into MDC, that won't work. MDC sync only works for properties managed with BaggagePropagation apis or that uses it (ex sleuth's baggage properties).

codefromthecrypt commented 4 years ago

if you want to follow-up https://gitter.im/openzipkin/zipkin

regardless, I will ping back here with kotlin and grpc examples. However, they will not result in brave pulling data from grpc.Context into its BaggagePropagation component (and therefore managing MDC). It will only do normal Headers/Metadata automatic extraction.

pulling data from grpc.Context or otherwise synchronizing it would require a lot of work and more demand for it, most of all hands to help with it. There are no paid staff working here. I'm personally choosing to respond instead of eating breakfast. Generally, we don't implement something the first time it is asked for.

asarkar commented 4 years ago

Hi @adriancole

Of course, this is OSS, and you're not getting paid to maintain it, so I, and I'm sure other users of this lib, appreciate that. Your comment came across as if I'm demanding a fix, whereas I'm not, because there's no ground for doing that. However, I do want to understand the problem, and if it so happens, contribute to fixing it. I'm discussing a solution with OpenTelemetry folks too, so as not to put all my eggs in one basket.

asarkar commented 4 years ago

I didn't understand why was this ticket closed. Whether this is doable in the short term or not, this is a legit problem with the library and hence, a valid ticket.

codefromthecrypt commented 4 years ago

It is closed because we don't use tickets for support. What we do when people ignore our issue templates is move the ticket here first before closing.

For now, I have explained to you what does and doesn't work. you can disagree with my explanation and I mentioned you can follow up on gitter.