Standard(s) for in-process propagation

yurishkuro commented 7 years ago

OpenTracing 1.x leaves the question of in-process context propagation entirely in the hands of the applications themselves, which substantially reduces the usefulness of the standard since different frameworks instrumented with OT do not have a way to exchange context.

Standards for in-process context propagation inevitably have to be specific to individual languages. This is more of a placeholder issue. It is also somewhat similar to #22 (possible part of it), but can be done independently.

wu-sheng commented 7 years ago

In-process propagation, and support different frameworks instrumented, seems very hard, as I known. And different platforms have different characteristics.

In Java, same Context in ThreadLocal, and provide inter-thread propagation spec, maybe a effective way.

Keep me in the loop. :)

codefromthecrypt commented 7 years ago

Last july @Xylus and @eirslett did a couple day spike on context. Assets from those meetings are here; https://drive.google.com/drive/u/1/folders/0B0tSnQT3uGdAfndwS3RKVnMtM0ZKMWU4R1dicTNEXy1NSlBJaTlJQ05abGVneTlucW5HdFk

Eirik ended up spiking an effort towards that later, https://github.com/DistributedTracing/continuation-local-storage-jvm/pull/1

Ironically, last december, iirc, distributed context propagation was the first name of the OpenTracing group, so not too surprising it is here. Since then, I think uber had some work towards it naming the group https://github.com/openctx

More recently, recently google are changing their own context to decouple storage from it. https://github.com/grpc/grpc-java/pull/2461 Also, other tools like Log4J2 have added hooks to interop with other's context storage engines.

As mentioned in another issue, there's some layering concern, for example if tracing is the right place to define context (as they are often multi-purpose and below tracing). That said, whatever hooks there are tracing is at least a primary consumer, and interesting to see a tracking issue arise.

yurishkuro commented 7 years ago

@adriancole what I find peculiar is that several similar concepts of context storage/provider I've seen (including Jaeger's) pretend that they are just an interface, but in fact they cannot work with anything but a thread-local based implementation. For example, TChannel (which is a form of RPC framework) can be configured with a TracingContext, as a way of abstracting the actual mechanism of in-process propagation, but the mere fact that a context object is not passed to RPC methods but instead extracted via that abstraction essentially means the abstraction must be implemented via thread-locals.

NB: the TracingContext in Jaeger and TChannel is to a Span what gRPC Storage is to a Context. I.e. the same separation of how something is propagated vs. what is being propagated. They are just smaller in scope and only propagate a single Span.

codefromthecrypt commented 7 years ago

@adriancole what I find peculiar is that several similar concepts of context storage/provider I've seen (including Jaeger's) pretend that they are just an interface, but in fact they cannot work with anything but a thread-local based implementation.

interesting point for JVM tech. yeah different types of thread locals, wrappers of thread locals, or thread-local references to something else.. all have thread locals in there somewhere. Maybe there's some subtle (or native) option I'm also unaware of. @raphw @tmontgomery you know magic ( or at least where bodies are often buried ) .. do you know something available on JVM that can be used to store incidental trace state which isn't directly or indirectly a thread local?

raphw commented 7 years ago

As a matter of fact, I have written a small thread-local-ish utility for this exact purpose. The problem with thread locals is that it only allows access from within a Thread. It is not possible to set an object from without a Thread even if the latter is known.

Also, I have started migrating away from the above DetachedThreadLocal and rather use the WeakConcurrentMap. The latter allows to define a form of key object that represents a state. This state is automatically garbage collected once the key is eligible for collection. Propagation works by sharing such keys where the context is removed once a context has become obsolete.

All in all, the problem always boils down to managing a form of global-state what is always tedious.

wu-sheng commented 7 years ago

@adriancole what is incidental trace state? Why should be stored in something like threadlocal?

wu-sheng commented 7 years ago

To all, threadlocal can't solve all problems, somthing like async process model will bring us more.

For example, Disruptor is a very useful in-memory async lib. If application use this, all context base on threadlocal will fail without doubt.

This is very important for my exp.

codefromthecrypt commented 7 years ago

@adriancole https://github.com/adriancole what is incidental trace state? Why should be stored in something like threadlocal?

sorry poor choice of words. Most users aren't aware of trace state as it is implicitly passed around on their behalf (of course some are). The usual implicit mechanism in Java is a thread local, though it is problematic in async libraries. For example, usually you have to manually re-attach the state when some accident of thread scheduling loses it. Ex the below from Brave + RxJava

public Action0 onSchedule(final Action0 action) { final ServerSpanThreadBinder binder = brave.serverSpanThreadBinder(); final ServerSpan span = binder.getCurrentServerSpan(); return new Action0() { @Override public void call() { binder.setCurrentSpan(span); action.call(); } }; } https://github.com/smoketurner/dropwizard-zipkin/blob/master/zipkin-rx/src/main/java/com/smoketurner/dropwizard/zipkin/rx/BraveRxJavaSchedulersHook.java

One of the reasons I asked about alternatives to thread locals was to figure out what current practices exist, as it is helpful to scope what might end up in a spec or library here or elsewhere.

make sense?

wu-sheng commented 7 years ago

@adriancole, yes, that is my concern.

wu-sheng commented 7 years ago

But I think only change threadlocal to something similar is not explicit enough to someone, who is not a tracer developer.

Provide some APIs, like carrier, inject, extract for rpc, are better choice for me. They will be easy to understand.

cwe1ss commented 7 years ago

In terms of the specification, the more interesting question for me is the contract of this API:

Should it be possible to store just one span in the context?
- This would probably be e.g. the RPC server span and all lower-level components would be child spans of this one span without any nesting.
Should it be implemented as a stack?
- How do you know on POP that you got "your" span? (in case a child component failed to close/pop its span or called pop too often)
- How does a component know that it is the outermost component and that it should close any open spans. Example: The RPC server creates a "root span" and adds it to the stack, a lower-level component (e.g. a DB access library) opens a child span but the code failed with an exception. The request exception handler now needs to know that the DB span must be closed.
Should it be implemented as a list or a tree to allow for more reference types?

It might be easier to decide which language-specific storage construct might be the right one, when we know which features we want it to support.

sjoerdtalsma commented 7 years ago

I am creating a library called 'context-propagation' meant purely to propagate any (threadlocal) context implementation from a parent-thread to any background thread in a stack-like fashion, using the Autocloseable try-with-resources mechanism to allow verified stack unrolling (and automatic skipping if a pop is missed). It revolves around a Context and ContextManager interface (each with only two methods) and a utility to create a snapshot from all known ContextManager implementations at once. If you're interested, please take a look here and let me know what you think: https://github.com/talsma-ict/context-propagation

(we needed it for security propagation, but are now looking into adding opentracing and would like to re-use the mechanism that is currently in-place that serves us well).

yurishkuro commented 7 years ago

@cwe1ss I think the only requirement is "I need to get the current span". The fact that Java implementation often uses stack is a side-effect of thread-local implementation, since you're using the same API to retrieve different spans at different times, and stack is the only thing that makes sense for that.

cwe1ss commented 7 years ago

@yurishkuro there's also the logical requirement to "set the current span". Do you think this is implicit - meaning that every span that is created is automatically "the current span" or do you see this as an explicit feature/step?

In other words, do you think that the context should be part of the tracer interface in which a call to StartSpan would automatically store the span in the context and something new like tracer.CurrentSpan would return it from the context? Or would you prefer a new interface like traceContext that must be called explicitly by the user after a span has been created - e.g. traceContext.SetCurrentSpan and traceContext.GetCurrentSpan?

The former would make for a nice user-friendly API and the latter seems to be closer to what we have now in jaeger-context and my C# contrib library. But both have advantages and disadvantages that need further discussions of course.

yurishkuro commented 7 years ago

Sorry, you're right, it needs get and set fort current span.

bhs commented 7 years ago

@sjoerdtalsma thanks for sharing the context propagation lib... nice to see something that's already in use and not pre-coupled to a distributed tracing system per se.

@yurishkuro @cwe1ss I am prototyping a few approaches to make this pluggable... I would like OT to support in-process context managers without getting into the business of doing all of the work (since there are so many approaches and each has its advantages/disadvantages).

@cwe1ss as far as the stack concept is concerned: in Dapper we basically had each in-process Context object keep a reference to its parent Context; when finish() was called (or its moral equivalent... the naming was different), the Context would re-install its parent Context in the thread-local storage. So there was essentially a linked list of Context objects within the process.

sjoerdtalsma commented 7 years ago

@bensigelman For what it's worth, I heavily depend on the the java.util.ServiceLoader to make my library pluggable. The abstract threadlocal implementation has the same 'linked stack' structure, except it tries to correct out-of-order closing as well.

Details on the out-of-order closing: If the closed context isn't 'current', current isn't touched. While the parent is already-closed, it is 'skipped' in the search for the restored 'current'. I am aware this should never happen in properly nested try-with-resources Autocloseable contexts.

Later today I'll be committing a 'java-globaltracer' project to github and planning to offer it to opentracing-contrib. There will be three main classes with only a single public class GlobalTracer (currently with 3 methods) following this design: http://bit.ly/2gkrRuo

wu-sheng commented 7 years ago

@sjoerdtalsma , can you post the site of java-globaltracer project. I have suggest earlier in https://github.com/opentracing/opentracing-java/issues/63. ServiceLoader to get a tracer maybe a good way. Even though, some tracer need arguments to init, like Jaeger Tracer by @yurishkuro told, they are not suitable for the module.

But I like this type of java tracer factory/manager very much.

sjoerdtalsma commented 7 years ago

@wu-sheng @adriancole , I pushed an initial attempt (with a working unit test using the MockTracer) to github yesterday evening: https://github.com/talsma-ict/java-globaltracer

I would love to donate it to the 'contrib' git repository and have it published in Maven central under 'io.opentracing.contrib' but only after a thorough review process. Since I'm relatively new to the opentracing community I have no experience on the best way to go forward from here...

wu-sheng commented 7 years ago

@sjoerdtalsma @adriancole , maybe we should create a new issue to discuss this?

I review the project generally, let's leave the detail discuss later.

But at first, jdk level should be fixed. Using Optional<T> led to needing jdk 1.8+, but many applications based on jdk 1.6+. I think you should implement your design on jdk 1.6, maybe the code is less grace, but it will easier to use.

codefromthecrypt commented 7 years ago

@sjoerdtalsma https://github.com/sjoerdtalsma @adriancole https://github.com/adriancole , maybe we should create a new issue to discuss this?

One of the reasons we discussed thread local was yuri's question of whether we can assume thread locals are used or not. as far as I can tell, we cannot make that assumption. If that steers the api in a particular way, then the examples were fruitful.

I agree that it is probably helpful to keep weeds or investigations on different threads as doing so here might be overwhelming. probably best in the actual repo of the code in question?

sjoerdtalsma commented 7 years ago

@wu-sheng @adriancole I welcome feedback on the design and implementation choices by github issues. https://github.com/talsma-ict/java-globaltracer I'm also willing to transfer this repository to https://github.com/opentracing-contrib if you think this contribution is worthwhile.

wu-sheng commented 7 years ago

@sjoerdtalsma , transfer to https://github.com/opentracing-contrib, LGTM. @yurishkuro @adriancole @bensigelman , what's your opinion?

Maybe accept it, discuss on the repo, and let's see what is going on later?

bhs commented 7 years ago

@sjoerdtalsma my preference would be to

create (empty, ASF2.0) repos on opentracing-contrib for the pieces you'd like to move over
do a PR for the initial contents where we discuss the finer points of the code itself (and I would def rather not try to do that all here in this thread!)
merge those PRs as soon as they're ready, etc.

wu-sheng commented 7 years ago

@bensigelman , can you add me to the https://github.com/opentracing-contrib org. I am interested in @sjoerdtalsma's project.

wu-sheng commented 7 years ago

@sjoerdtalsma , let's me know, after you create a repo, and do a PR

sjoerdtalsma commented 7 years ago

@bensigelman I agree on all three of your points. re. 1: I cannot create this repo in opentracing-contrib, so it would be great if anyone with the power to do so could create a new ASF2.0 repo called 'java-globaltracer' or something similar. I'll take care of step 2 from there.

wu-sheng commented 7 years ago

@bensigelman , maybe you should create a repo for @sjoerdtalsma

bhs commented 7 years ago

@sjoerdtalsma: https://github.com/opentracing-contrib/java-globaltracer/invitations gives you admin permissions... maybe we can start with a PR of the existing code?

bhs commented 7 years ago

mabn commented 7 years ago

@cwe1ss I think the only requirement is "I need to get the current span"

@yurishkuro there's also the logical requirement to "set the current span"

One thing I don't want is having to create a second instrumentation for non-tracing context propagation. For example - imagine an http service which authenticates a user. It obtains a User instance. It's natural to want to access this instace from within the request processing "flow" and so it has to propagate (implicitly or explicitly) across various calls, just like tracing context.

Another good example is having per-request cache.

Right now opentracing api does not support this - tags do not propagate and baggage values are strings (and propagate to other services).

So: I'd like a way to to store objects in some process-scoped baggage / request-context

Second thing that I believe is important is playing nicely with MDC from logging frameworks. Explicit support is not needed, but it should be convenient.

I know that one could say that tracer implementations should handle MDC integration and "request context" could be just a Map with trace-id as the key, but it should be considered.

JonathanMace commented 7 years ago

Hi all,

Just dropping in here to post a message that I also posted to the Google Group. I've recently been working on "Baggage Buffers" with Rodrigo at Brown. Baggage Buffers, and more broadly the Tracing Plane, is the culmination of our ideas related to distributed tracing, metadata propagation, and Baggage.

I've just finished the end-to-end implementation and I'm working on testing, tutorials, examples, and documentation. However, I thought I'd just jump the gun and let you all know about this now, because some of the concepts are pretty cool (baggage buffers itself, as well as the atom layer and baggage protocol).

https://github.com/JonathanMace/tracingplane

An actual release is going to follow in February, along with a whitepaper and research paper (plus much more extensive documentation and examples).

============

On a separate note, I'm annoyed that I missed this conversation and I will read through it and provide some comments. Baggage Buffers supports process-scoped baggage with 'transient' datatypes and more generally, scoping for when variables are propagated and not propagated.

wu-sheng commented 7 years ago

@JonathanMace , seem several interesting things. But a little hard for me to understand all (My English may be not good enough). Still, look forward your Getting started, Simple example and Tutorial.

Thank you for your sharing.

bhs commented 7 years ago

I'd like to move this conversation along... I think this issue is the most pressing technical concern for OpenTracing as a project right now and I'm hoping to move the discussion forward a bit in the coming week or two.

So, since the last activity in this thread, @sjoerdtalsma has merged in his https://github.com/opentracing-contrib/java-spanmanager work. Per the README,

Warning: This library is still a work in progress!

That said, he put a bunch of thought into it, and it makes an interesting reference point for the discussion here. Separately, I've been thinking about how we addressed this problem at Google and remembering what I liked and did not like about it.

TL;DR, for OpenTracing to achieve its goals, it's nearly a requirement for a programmer to start a new Span without a local var that identifies its parent. Otherwise there is this annoying handoff problem between API users and API implementors if said API is going to support OpenTracing. (There are approaches like https://github.com/opentracing-contrib/java-jdbi/blob/master/src/main/java/io/opentracing/contrib/jdbi/OpenTracingCollector.java#L185, but they're heavyhanded and still don't really solve the problem except for trivial situations where thread-local storage is sufficient)

Now, in order to satisfy the above (near-)requirement, OpenTracing needs to provide:

access to an appropriate Tracer
some way of accessing a parent SpanContext to reference in any newly-started Span

I would note that many existing solutions in this arena provide access to the active Span, not just the SpanContext... IMO that's more than is necessary and also creates complex corner-case lifetime issues around Span.finish() (remember that SpanContext objects are by-definition good-for-life and don't expire on Span.finish()).

Now, we could make an "official" OpenTracing Futures library, an "official" OpenTracing executor service, etc, etc... and maybe we will someday. But I would love to keep those things out of the core API and make them opt-in. That suggests some way to make the active-SpanContext-management code pluggable somehow.

Remembering that we're in the specification repo rather than the java repo, I don't want to overfit to Java... but it is helpful to make this concrete. So here's a rough sketch of a suggestion here:

Introduce something like an ActiveSpanContextProvider interface. It would have one method, get(), which would return the active SpanContext, or null if none is known
Introduce a SpanLifetimeObserver interface. It would have callback methods to inform about new Span start and finish events, but might only find out about the SpanContext
Tracer gets an addActiveSpanContextProvider method, which requires an ActiveSpanContextProvider and a SpanLifetimeObserver implementation
things like startSpan would have some sort of behavior where they would look through the SpanContext providers for a possible parent (not sure how to handle multiple parents if at all?)

Now, a Futures framework or executor wrapper can bind to any OpenTracing impl via these two interfaces. Users would have to make that binding at init-time, but then in their application code they could basically just write something like globalTracer.buildSpan("foo").start() and OpenTracing could fall back on the ActiveSpanContextProvider to figure out the parent/child relationship.

Thoughts?

PS: If the above is hard to understand I can write a gist or something. Mainly just doing a braindump at this point.

wu-sheng commented 7 years ago

First, on most sides, I support you about this. ActiveSpanContextProvider and SpanLifetimeObserver are good interface. 👍 +1 on adding these to spec 2.0

And I have to say, these interfaces are very google-naming-style. 😜

We can have more discussion about this feature.

First of all, Tracer have methods to add ActiveSpanContextProvider and SpanLifetimeObserver. And SpanBuilder has asChildOf(SpanContext spanContext), I am not sure you want to remove this method, because you mention

But I would love to keep those things out of the core API and make them opt-in. That suggests some way to make the active-SpanContext-management code pluggable somehow

We will face two types of context, if the context-provider mechanism is active.

If we do like activespan-repo did, such asglobalTracer.buildSpan("foo").start() , we will need a set of new APIs. NewTracer's methods are static. This is not like a plugin, but like an extend-API.

At this point, providing these two interfaces just as plugin, seems have a little problem. Hopefully, you already have some solutions on these.

mabn commented 7 years ago

1 SpanLifetimeObserver should also have callbacks for activation and deactivation of a span. It's required for example to fill thread-locals when a new threads starts execution in the context of existing span and for subsequent cleanup.

2 I'd still want support for "request context" (like baggage, but does not propagate outside of a single process and values can be arbitrary objects). I made some attempts to implement it and at least following is required:

lifecycle callbacks for span start and activation
at least "start" callback must have access to the parent span (or parent span context)

Then it can be achieved with a separate "manager" which tracks the hierarchy of spans and keeps a mapping between spans and "request contexts"

But it would be great to add "request context" interface to the API.

3 Java-specific: I played a bit with MDC integration. The tricky scenario is: P and C denote parent and child threads, p and c parent and child span

P: activate(p)
P: MDC.put("x", 1)
P: spawn C
P: MDC.put("y", 2)
C: c = child_span_of(p)
C: activate(c)
C: assert MDC.get("x") == 1
C: assert MDC.get("y") == 2

Here both the start and activation happen in the Child thread and it's too late - there's no way to access MDC from the Parent thread. I'm not sure how to solve it without replacing underlying MDC implementation (MDCAdapter in case of Slf4j). It's easy if child span starts in Parent thread (it's possible to create a copy of the current MDC and later set it in Child thread)

bhs commented 7 years ago

@mabn a few thoughts:

SpanLifetimeObserver should also have callbacks for activation and deactivation of a span

I agree that such an observer needs to know about those events, but I don't think the Tracer would be the one delivering them... e.g., the Executor wrapper itself would intrinsically know about activation and deactivation.

... it would be great to add "request context" interface to the API.

This one is hard for me. On the one hand, I agree with you about its elegance and so on. On the other hand, OpenTracing gets dragged over the coals for solving (or trying to solve) too many problems at once. (E.g., @adriancole's refrain about whether logging needs to be so general-purpose, or this issue from OT-Go) Then you have things like @JonathanMace's https://github.com/JonathanMace/tracingplane , which is kind of like what you're describing and much more.

My personal preference would be to remain aware of this sort of functionality as we design APIs and attempt to be roughly forwards-compatible (if not from a strict compilation standpoint, at least conceptually / holistically).

... MDC integration.

Yeah, MDC is one of those almost-but-not-quite things for me. I also don't know how to make it work in the general case without reimplementing.

sjoerdtalsma commented 7 years ago

I think the API should limit itself to Tracing and not try to include 'foreign' features such as RequestContext or MDC etc. That creates a huge (and expanding) scope for this. Ideally, the API should easily integrate with other context propagation mechanisms.

Purely as example, I added Span propagation to my existing library, unfortunately I didn't have time for documentation or examples yet, but if you're open to looking at code here's an example with propagation of Context[ServletRequest] and Context[Span], the latter delegating to any SpanManager it can find on the classpath.

sjoerdtalsma commented 7 years ago

Then you have things like @JonathanMace's https://github.com/JonathanMace/tracingplane , which is kind of like what you're describing and much more.

Very interesting concept! I'd love to integrate with something as generic as that. However, maybe OpenTracing should merely provide the API's that allow this integration and leave the integration up to the chosen solution?

bhs commented 7 years ago

@sjoerdtalsma

I think the API should limit itself to Tracing and not try to include 'foreign' features such as RequestContext or MDC etc. That creates a huge (and expanding) scope for this. Ideally, the API should easily integrate with other context propagation mechanisms.

That's essentially what I'm trying to get at here. OpenTracing should try to stay out of the business of "picking winners" for context propagation mechanisms. There are too many, and they have too many users. However, it would make instrumentation a lot easier to write if that binding could be decoupled from the actual line-by-line instrumentation code... i.e., if the binding could happen at main() or ServiceLoader init time. The proposed provider interface is a very rough attempt to get at something like this.

This week I hope to spell out a complete working example of how the binding would actually look in Java... maybe one for MDC, one for vanilla thread-local storage, and one for an executer wrapper?

pauldraper commented 7 years ago

maybe one for MDC, one for vanilla thread-local storage, and one for an executer wrapper

If you do MDC, realize there are two flavors: inherited (Log4j and old Logback) and non-inherited (new Logback). The latter is the most sensical.

I'm very happy with java-spanmanager. My only recommendation would be to be able to set the global SpanManager, like with java-globaltracer.

sjoerdtalsma commented 7 years ago

@pauldraper

I'm very happy with java-spanmanager. My only recommendation would be to be able to set the global SpanManager, like with java-globaltracer.

Glad it's useful to you! I have something set up in my own repo like that: GlobalSpanManager. Feel free to create an issue or pull request on java-spanmanager to discuss this in more detail.

tedsuo commented 7 years ago

Hi y’all! I’ve been thinking about this issue with @bensigelman. I think it might be helpful to break this problem down into two pieces – User APIs and Implementor APIs – and to sum up some conversations that have been happening in various threads. My main point is to try and draw a line between APIs that are used by applications and libraries (call them User APIs for now), vs APIs that are used by the writers of tracers and context managers (call them Implementor APIs). Hopefully this is a helpful framing of the context management problem.

APIs for Implementors

For an API to be an Implementor API, it must only be called during tracer initialization, or internally between context managers and tracers. The tracer should not expose these APIs in any way.

Things that happen only at init or within the internals of a Tracer have some nice properties:

Can be part of opentracing-java, but are less critical to get right as they can be changed later without world-destroying effects. These API changes would only affect calls at application init and and not the zillions of instrumentation call sites within applications and libraries. Any code for in opentracing-java for maintaining the relationship between tracers and context managers, context management fallback, etc, can be improved while only affecting tracer and context manager authors, not application/library developers.
Can instead be specific to the tracer implementation, since you still have a handle to the tracer implementation at that point. Multiple tracers can consolidate on APIs for context management – because why reinvent SpanPropagatingExecutorService N times – but have room to reinvent the wheel if they have some fancier or experimental way of doing things.

We still have to solve these problems for implementors, but given how wide the topic of context management is, and that it may continue to grow with the future of Java, there is a lot to be said around keeping these APIs inaccessible from the outside, past Tracer initialization.

APIs for Users

For normal instrumentation, you can wrap all this up and allow the user to call the existing tracer.buildSpan().start() off of a singleton-like tracer object. Indeed, that’s that goal!

The only context management that needs to be exposed via the Tracer API is whatever is needed to allow the user to do surgery around the edge cases, such @adriancole’s (example)[https://github.com/opentracing/specification/issues/23#issuecomment-264871406] of manually reactivating the span when RxJava loses track of it. So, potentially this new public Tracer API is smaller than the one the Tracer may need internally to interact with context managers.

Given these constraints, is the following additional API on the tracer sufficient for this kind of user land surgery? This appears to cover all the use cases I have seen, but I may be missing something important. I’m calling this ActiveSpanProvider only because I had to pick a name. Also, I’m going with Span, not SpanContext, as I believe that debate has been settled.

public interface ActiveSpanProvider {
  Span activeSpan();

  // caller must replace their span reference with the returned span
  Span activateSpan(Span span); 

  void deactivateSpan(Span span);
}

public interface Tracer extends ActiveSpanProvider {
// … existing API … //
}

Again, hopefully this framing was helpful. Also, I’d love feedback, especially examples of this “User API” being insufficient.

wu-sheng commented 7 years ago

@tedsuo @bensigelman , First, I love the User APIs and Implementor APIs point. This will be very helpful. 💯

I assume, do you want to remove the SpanContext from **User APIs? So, users can usetracer.buildSpan().start()to start a new span directly, and do not need care aboutSpanContext`.

And I have to say, I have a little concern about ActiveSpanProvider. Maybe later, you can explain more about activateSpan and deactivateSpan, and give me some demos.

Another thing is, after we separated these two pieces, what is opentracing-java responsibility? opentracing-java is for User APIs, and opentracing-tracer-java is for Implementors APIs?

mabn commented 7 years ago

@tedsuo

I think User API is insufficient as it does not allow to instrument Slf4j's MDC. The method names are fine, but the fact that they deal with "Span" is too limiting.

For MDC the instrumentation of a Runnable could look like this:

  static class InstrumentedRunnable implements Runnable {
    private Something context;
    private ActiveSpanProvider provider; 
    private Runnable wrappedRunnable;

    public InstrumentedRunnable(ActiveSpanProvider provider, Runnable wrappedRunnable) {
      context = provider.beforeActivation();
      this.wrappedRunnable = wrappedRunnable;
    }

    @Override
    public void run() {
      provider.activateSpan(context);
      try {
        wrappedRunnable.run();
      } finally {
        provider.deactivateSpan(context);
      }
    }
  }

The important part is that there are 3 places where libraries need to interact with the tracer and instrumentation. Let's consider using above InstrumentedRunnable to execute something in another thread. MDC instrumentation would be interested in:

capturing current state in original the thread/fiber/actor - beforeActivation in above example. MDC instrumentation would run MDC.getCopyOfContextMap() at this point
setting this state in the new thread/fiber/action - that's what activateSpan does. In case of MDC it is MDC.setContextMap(capturedState). In case of a span it could be provider.setActiveSpan(...)
cleanup - deactivateSpan

As you can the methods required to instrument MDC are pretty similar, but they cannot deal with Spans but rather some object which contains active Span and additional "contexts" from all hooks/plugins which are interested in participating in the activation flow.

This is the case not just for MDC but in general for any functionality which maintains state related to a subset of spans - e.g. a request context, or some thread-local state.

tedsuo commented 7 years ago

Thank you for the example @mabn, it’s very helpful. Using a global Tracer with the ActiveSpanProvider API I proposed, your example could be written as follows:

  static class InstrumentedRunnable implements Runnable {
    private Span span;
    private Runnable wrappedRunnable;

    public InstrumentedRunnable(Runnable wrappedRunnable) {
      this.span = GlobalTracer.get().activeSpan();
      this.wrappedRunnable = wrappedRunnable;
    }

    public InstrumentedRunnable(Span span, Runnable wrappedRunnable) {
      this.span = span;
      this.wrappedRunnable = wrappedRunnable;
    }

    @Override
    public void run() {
      tracer = GlobalTracer.get()
      span = tracer.activateSpan(span);
      try {
        wrappedRunnable.run();
      } finally {
     tracer.deactivateSpan(span);
      }
    }
  }

Internally, the tracer has associated the span with the contents of MDC.getCopyOfContextMap when the span was created. On Tracer.activateSpan the Tracer calls MDC.setContextMap with this context. So the machinery for managing the relation between the Tracer and MDC/SomeContextManagerAPI is not exposed to the framework developer. So, the tracer developer manages Context, the framework developer manages Activate/Deactivate, and the application developer manages Start/Finish.

Thoughts?

tedsuo commented 7 years ago

Hi @wu-sheng , glad you like the User/Implementor distinction. Just to be clear, my comments above are only related to context propagation, I’m not proposing a huge change to the existing API.

The current tracer can be accessed from the environment (AKA a globalTracer).
The tracer includes an ActiveSpanProvider interface.
Internally, the tracer interacts with context managers.
Activate/Deactivate is pulled out from Started/Finished. Start and Finish are states on a Span, Active/Inactive describes the relation between a span and the current context.
There was some debate about whether context managers and tracers should use Spans or SpanContexts when they interact with each other. SpanContexts looked attractive because they are immutable, but their current API is too restrictive, and we don’t want to extend it. So we should use Spans instead. I'm not recommending we get rid of SpanContext in other places.

My main point about User/Implementor is just some framing: we have Tracer developers, Framework developers, and Application developers, and their roles and APIs should ideally not bleed into each other. The Application developer interface is the most important to standardize, followed by the Framework developer, due to the number of call sites those APIs represents compared to internal Tracer APIs.

wu-sheng commented 7 years ago

I see your point. Distinction for Tracer developers, Framework developers, and Application developers, which are three important roles, when we build a trace eco-system.

The codes like these (from your demo), make easier to read/write (ignore span or context thing, right now). I hope OT can have more improvements on that.

@Override
    public void run() {
      tracer = GlobalTracer.get()
      span = tracer.activateSpan(span);
      try {
        wrappedRunnable.run();
      } finally {
     tracer.deactivateSpan(span);
      }
    }

Only for my personal interest, I think only Tracer developers should focus on context; Framework developers, and Application developers only concern about context propagation when cross thread/process ( Maybe inject/extract mechanism ). This will make all things easier to use/understand.

mabn commented 7 years ago

@tedsuo I thought about it - hiding the "context" behind the Span interface - and it would work, but it would also cause unnecessary overhead. Here's why:

(I'll use MDC example)

To extract a snapshot of MDC context some action must be performed in the current thread (MDC.getCopyOfContextMap()) - it's not possible to change internals of MDC and have access to all contexts any time (it is possible with some reflection magic, but let's not go there).
The only place to do that with the proposed API is the activeSpan() method.
But I suspect that activeSpan will be called pretty often for reasons other than to jump threads. E.g. to add tags or logs to the active span. There's no way to indicate the purpose so every call will need to extract MDC context and wrap it into a new Span instance (it has to be a separate instance because thread safety).

On the other hand e.g. logback maintains MDC context as copy-on-write-map so taking a "snapshot" is basically free.

wu-sheng commented 7 years ago

every call will need to extract MDC context and wrap it into a new Span instance

@mabn , why need wrap it into a new Span, every time? IMO, it just builds a new span, based on the given context( whatever who manage it ), this should be on the Tracer APIs head.

I think the point is not hide context, it is about whether need to know/use the context on the User APIs side.

opentracing / specification

Standard(s) for in-process propagation #23

APIs for Implementors

APIs for Users