Remove service tag and replace with a similar concept on Scope.

tylerbenson commented 5 years ago

Background

A while ago, we added service as a tag (#119). This is overly simplistic and has high probability for user confusion.

Problem

The problem is we scoped it as a tag specifically for a narrow use case, but as implemented, it is trivial for users to use it in ways that can fundamentally break common tracer logic.

Proposal

I propose we roll back/remove service as a tag, and move the ability to override the service name to the Scope.

Perhaps something like this (in java parlance):

Span span = tracer.buildSpan("stuff").start();
try (Scope scope = tracer.activate(span)) {
  try (Scope newScope = scope.withServiceName("foo")) {
    // newScope is now active scope and service name is overridden for any new children spans.
  }
  // new child spans without service name overridden here.
}

This also opens the possibility of additional Scope scoped tags or other things.

Consideration

I don't believe any releases have been published with the new service tag included, so at a minimum, we should at least revert #119 and https://github.com/opentracing/opentracing-java/pull/287 before they are released to prevent the need for a longer tail deprecation.

codefromthecrypt commented 5 years ago

Thanks @tylerbenson. I agree the tag convention should be reverted.

I like that this won't accidentally result in a tag "service" which would be hard to grok whether it was an accident of legacy instrumentation or a new thing. It also addresses the first-class nature of service in distributed tracing. Finally, it deals with scoping properly.

One thing I would suggest is keeping this feature on the tracer as opposed to overloading the scope with more methods.

Ex

Span span = tracer.buildSpan("stuff").start();
try (Scope scope = tracer.activateForService(span, "who-I-proxy-for")) {
   // newScope is now active scope and service name is overridden for any new children spans.
}

note the span itself here should be only assigned to one service, so this is not only the children but also the activated one itself. There might be some nuance to how this plays out.

Note: the above allows you to switch services multiple times: nested scopes. If that's a feature it should end up in the tests.

An alternative would be to not allow nesting and do something similar to baggage which is write-only. For example, adding a serviceName method to the span builder.

Ex

Span span = tracer.buildSpan("stuff").serviceName( "who-I-proxy-for").start();
try (Scope scope = tracer.activate(span)) {
   // newScope is now active scope and service name is overridden for any new children spans.
}

The pro of this is that it allocates the span from the beginning as that service. Also, it can't accidentally end up ambiguously as a "service" tag in old instrumentation. OTOH, this has the same implementation gotchas as baggage (Except the remote propagation part), even if it is only scoping a single property (service). That said, some tracers may already have a .serviceName method on their span which lets this drop in.

food for thought.. regardless, this should be tested in practice against something that proxies, and directly linked.

Test cases should include at least:

dispatch to multiple handlers based on incoming route (and those being different services)
multiple intermediate aka local spans
integration with at least one "real" implementation, to show service graph aggregation isn't hosed easily

yurishkuro commented 5 years ago

+1 to have this integrated with span creation rather than scope.

But the description could use better justification. The only one I heard is old instrumentation already using "service" tag for different purposes. I think it's a good enough reason (I wasn't a fan of the tag in the first place), but are there other considerations?

tylerbenson commented 5 years ago

@yurishkuro to be clear, you would prefer @adriancole's second suggestion, to see this associated with the span builder and have it propagated more like baggage?

yurishkuro commented 5 years ago

I was only referring to the builder part. Auto-propagating sounds like assuming too much, but I can see the rationale for that. Perhaps it also needs to be explicit.

tylerbenson commented 5 years ago

I don't think anyone is suggesting for it to propagate across processes, just referring to how baggage is shared between spans.

codefromthecrypt commented 5 years ago

In looking at prior art (like finagle) and also offline chat with @codingfabian also thinking about this..

I think the simplest design is requiring the tracer to inherit parent-child any time it is set. This can't be assumed in tags, not safely IMHO as no other data in tags act like that. I think the span builder is the best way to achieve this in the OT api, possibly named "spanBuilder.serviceName"

For folks who are already using the "service" tag convention (ex @codingfabian or @jeffbakerexpedia), we can treat it like any other data which is sortof "old school". For example, when tracers upgrade, if they see a tag named "service" they can try to re-forward it to the correct api. This would work at builder time for example, even if be a bit dodgy later.

Aside: to underpin this, a span.serviceName() or similar method would be needed in brave to properly inherit. We were to do this anyway, just waiting for rule of three anyway. For example, we knew camel use the forwarding stuff and currently hack by using different tracers. It needed to be a repeating pattern before escalating to an api. I can manage adding this there.. cc @oscerd

tedsuo commented 5 years ago

I like the idea of service being a more concrete concept in the spec.

One question on scope: the service tag was added as an optional way to implement a limited set of behavior - out of band reporting on multiple services. But the "service scope" comes up more broadly in tracing than just this usecase:

the service namespace is something users really care about. Every tracing system has a concept of "service name" and sometimes "service ID" which is configured in tracer startup.
service-level tags (tags that apply to every span emitted from a tracer instance) are useful, both internally as an index and externally as context available to other subsystems.

How do others feel about these broader usecases? If we're going to add serviceName as an explicit concept to a span, I feel like we should consider the implications beyond just out-of-band reporting.

LakerBaker commented 5 years ago

I am also +1 on doing this at span creation rather than scope. We have an internal use case at Expedia in which our build system is using Expedia's Haystack tracing system to visualize our CICD pipeline; we currently do this with the tag-based override but will change our code to support the new spec when it changes.

tylerbenson commented 5 years ago

@tedsuo What should be our path forward here?

mchandramouli commented 5 years ago

I agree with @adriancole to have serviceName as a first class citizen and not as a tag. Whether we call it as serviceName or applicationName to be generic, I prefer an API like what Adrian proposed above

Span span = tracer.buildSpan("stuff").serviceName( "who-I-proxy-for").start();

This forces serviceName to be a concrete member of the data model and not as a tag.

mchandramouli commented 5 years ago

On a side note, I would say the same for remoteServiceName instead of or in addition to peer.service tag. Though this need not be mandatory, having it helps with building better service graphs. There is a related discussion in Haystack https://github.com/ExpediaDotCom/haystack/issues/537

yurishkuro commented 5 years ago

Service name is insufficient. Jaeger uses Process (instance, resource, workload are the other possible names), which contains a service name and a set of tags that contain additional metadata about the workload (host name, IP, port, zone, environment, code version, etc.).

It is different from remote service (peer.service) because the current workload may not know much metadata about the peer service, and the tag itself is generally not that critical.

yurishkuro commented 5 years ago

I think the simplest design is requiring the tracer to inherit parent-child any time it is set. This can't be assumed in tags, not safely IMHO as no other data in tags act like that. I think the span builder is the best way to achieve this in the OT api, possibly named "spanBuilder.serviceName"

+1 to inheritance & builder method (but replace "name" with a more complex object).

tylerbenson commented 5 years ago

I submitted a couple PRs (see above) to try to revert this mess I made. I suggest we get these merged and proceed forward with a separate PR to add service name as a first class API.

yurishkuro commented 5 years ago

@tylerbenson I would prefer to make changes backwards compatible, i.e. deprecate rather than remove.

tylerbenson commented 5 years ago

@yurishkuro I don't think it needs to be deprecated in the java repo since it was added since the last release, so technically not a breaking change there. I would consider changing it to deprecated in the spec though with the assumption that means it probably shouldn't be added to language implementations if not already there.

yurishkuro commented 5 years ago

ah, if it never made it to a public release, then sgtm

opentracing / specification