opentracing / specification

A place to document (and discuss) the OpenTracing specification. 🛑 This project is DEPRECATED! https://github.com/opentracing/specification/issues/163
http://opentracing.io/spec
Apache License 2.0
1.18k stars 182 forks source link

What to do about Span.info() / Span.error()? #6

Closed opentracing-importer closed 8 years ago

opentracing-importer commented 8 years ago

Issue by yurishkuro Sunday Jan 17, 2016 at 21:03 GMT Originally opened as https://github.com/opentracing/opentracing.io/issues/30


To fork from a different discussion in #26, the questions here are:

  1. should we expect the message argument to info() and error() methods to be parameterized by referring to portions of the payload, just like the logging APIs do?
  2. if so, do we converge to a single cross-platform syntax or use common patterns from the respective platforms, i.e. span.Info("oops we found %+v", payload) in Go vs. span.info("oops we found {}", payload) in Java?
opentracing-importer commented 8 years ago

Comment by adriancole Monday Jan 18, 2016 at 05:18 GMT


I'll note that we only have this discussion because we decided to partially implement a logging api. In java, there isn't a single answer, either. Some logging is printf format, others are different.

opentracing-importer commented 8 years ago

Comment by bensigelman Monday Jan 18, 2016 at 22:01 GMT


@adriancole noted re the discussion itself... no argument that we're incurring a cost by discussing logging, though IMO it is a worthwhile one given the (tangible, demonstrated) benefits of span-granularity logging data at google and in my conversations about tracing with numerous other companies in the past year or so.

opentracing-importer commented 8 years ago

Comment by bensigelman Monday Jan 18, 2016 at 22:08 GMT


@yurishkuro upon some reflection, here's a concrete proposal... the goals are to present people with platform-idiomatic and straightforward signatures for vanilla message logging, the opportunity to easily mark errors vs info (but nothing more from a severity standpoint), to allow for the logging of potentially large payloads (which may be truncated or ignored by the Tracer), and – if possible – to also support "event"-style logging, where it's still a string at the end of the day, but more of a stable string that marks a specific semantic moment... e.g., cr.

Possibility A (combine event and payload):

Possibility B (split out event and payload):

Possibility C (don't support event):

Possibility D (don't support message logging):

Possibility E (don't support message logging and combine event and payload):

Possibility F (don't support message logging or payloads):

My favorite is A.

opentracing-importer commented 8 years ago

Comment by yurishkuro Monday Jan 18, 2016 at 23:30 GMT


@bensigelman My preference would be A as well (no surprise there). It still leaves us with a task of clarifying the <platform-idiomatic params and message formatting>, since there may not be a single answer on some platforms (like Java, as @adriancole mentioned). We may need to just make a choice, e.g. for Java use slf4j formatting syntax.

One other question with option A is handling exceptions in the error() method. I've ran into that already while doing instrumentation work. The simplest approach I used so far was to assume that the tracer can recognize the exception automatically as one of the payload entries.

opentracing-importer commented 8 years ago

Comment by adriancole Tuesday Jan 19, 2016 at 02:50 GMT


Possibility F (don't support message logging or payloads):

  • event(String)

I have a bias towards supporting existing ops like Trace.record, Span.annotate, etc. in v1.0 in the most straight-forward means possible.

add_event(String, timestamp?) or similar meets that requirement. if "log" didn't conflate with classifying as "info" (as it might not be!!) I wouldn't mind log(String, timestamp?)

opentracing-importer commented 8 years ago

Comment by bensigelman Tuesday Jan 19, 2016 at 05:55 GMT


@adriancole re your suggestion of log(String, timestamp?): does it break Zipkin (or the Zipkin UI) if people log long, human readable, often unique strings via an API like this? I'm just trying to understand why we would have reason to support (and explain) two different logging-like APIs (add_event and info/error) that both just record a String and a timestamp. Or is it just conceptual? That things like cs and cr are so semantically important that they shouldn't get lost in the fray with info spam?

opentracing-importer commented 8 years ago

Comment by adriancole Tuesday Jan 19, 2016 at 06:19 GMT


@adriancole https://github.com/adriancole re your suggestion of log(String, timestamp?): does it break Zipkin (or the Zipkin UI) if people log long, human readable, often unique strings via an API like this? I'm just trying to understand why we would have reason to support (and explain) two different logging-like APIs (add_event and info/error) that both just record a String and a timestamp. Or is it just conceptual? That things like cs and cr are so semantically important that they shouldn't get lost in the fray with info spam?

Zipkin queries aren't designed for all types of strings, but you can store any string subject to length constraints. It will probably lead to larger indexes, but that's also tunable.

The most common case is searching for annotations like as "finagle.retry", "namer.success" and "namer.closed". You can query for "mary had a little lamb", but probably not a multi-line string representation of a stack trace, even if it can be stored and displayed on the timeline.

Hope this helps

opentracing-importer commented 8 years ago

Comment by adriancole Tuesday Jan 19, 2016 at 06:25 GMT


Or is it just conceptual? That things like cs and cr are so semantically important that they shouldn't get lost in the fray with info spam?

Conceptually, replacing something that evaluates to Trace.record("thing.failed") with Span.info("thing.failed") makes no sense. There's no notion of whether something is "error" or "info" in zipkin, and if we actually desire people to use it, there should be a drop-in for things that don't classify as info or error. This isn't just zipkin! look at all open source tracers, and you'll find this classification of timeline events is not a widely implemented feature even if it is at Google.

opentracing-importer commented 8 years ago

Comment by bensigelman Tuesday Jan 19, 2016 at 06:34 GMT


Got it... Yeah, I'm not suggesting it makes sense; I just wanted to make sure I understood the implicit objection behind the comment you made earlier (about wanting to use .log() for events, yet not being able to because of something like .info()).

Ben (mobile)

On Monday, January 18, 2016, Adrian Cole notifications@github.com wrote:

Or is it just conceptual? That things like cs and cr are so semantically important that they shouldn't get lost in the fray with info spam?

Conceptually, replacing something that evaluates to Trace.record("thing.failed") with Span.info("thing.failed") makes no sense. There's no notion of whether something is "error" or "info" in zipkin, and if we actually desire people to use it, there should be a drop-in for things that don't classify as info or error. This isn't just zipkin! look at all open source tracers, and you'll find this classification of timeline events is not a widely implemented feature even if it is at Google.

— Reply to this email directly or view it on GitHub https://github.com/opentracing/opentracing.github.io/issues/30#issuecomment-172753771 .

Sent via mobile

opentracing-importer commented 8 years ago

Comment by bcronin Tuesday Jan 19, 2016 at 19:36 GMT


(Long thread here, so my apologies if I missed anything...and apologies for making it longer!!!)

I want to share my thoughts, which I think are most similar to bhs' "Possibility A" -- but a little different in that it would introduce the notion of a rawLog method as well. Anyway, the below is basically centered around the idea that info/error/event are all, in the data model, pretty much the same thing -- but there's a lot of variation in how best to make a convenient API around it. With that in mind, the spec could state something like the below:


The OpenTracing platform implementations are required to support a rawLog() method that takes a set of optional arguments that allow a log record to be completely specified. (In Go, this is likely an LogOptions object. In JS, this is likely a associative array of fields. In Python, this is likely keyword args.) E.g. in JavaScript:

spanHandle.rawLog({
    timestamp : <time_in_micros>,
    type      : <constant indicating INFO, ERROR, or EVENT>,
    message   : <a post-formatted message string || an event name>,
    payload   : <payload object>,
});

(or perhaps...)

spanHandle.rawLog({
    timestamp  : <time_in_micros>,
    event_name : <stable string name>,
    message    : <a post-formatted message string>,        
    error_flag : <boolean>,
    payload    : <payload object>,
});

Any implementation of OpenTracing must be able to gracefully handle any combination of the options, though it is allowed to ignore data it does not consider valid (e.g. lack of a timestamp or, say, both an event_name and message being specified).

This rawLog API is intended to be fully flexible and account for such use cases as needing to manually specify the timestamp. The full flexibility should help stave off a proliferation of separate methods for uncommon but necessary use cases. As a downside, this API will often be inconvenient to use directly for the common case.

To address the need to make the common case convenient, the following APIs should be supported in a manner that's most idiomatic to the platform (to be explicit: this does mean per-platform variations in the signatures of the below are allowed if it promotes convenience for the platform in question).

In terms of promoting ease-of-use, the following sort of variations are allowed. This is not an exhaustive list:

As the exact signature and behavior of info, event, event are allow to vary by platform, each platform implementation is expect to document how these calls would translate to a direct call to rawLog.


That's my two-cents. It was a bit on the thinking aloud side of things, so again apologies if this is missing anything fundamental!

opentracing-importer commented 8 years ago

Comment by yurishkuro Tuesday Jan 19, 2016 at 20:31 GMT


@bcronin I would be ok with this (maybe s/rawLog/log)

@bensigelman to clarify the difference between event(str) and info(str) - the intention is that the arguments to event() are not only low-cardinality, but are also standardized with clear semantic meaning, similar to standardizing on what the tag name should be for http.url or http return code. The cs/cr annotations are not actually the primary use case for event(), as the user never needs to set these when using OT, but more for the other "standard" events known to Zipkin like wire send/recv, fragements, etc. And the expectation from the tracing system is that it might do something special based on events that it wouldn't do purely based on info logs, e.g. index/aggregate them.

opentracing-importer commented 8 years ago

Comment by bensigelman Tuesday Jan 19, 2016 at 20:39 GMT


@yurishkuro makes sense re the mental model for event()... So these are basically important moments in the lifetime of a Span that don't really justify a sub-span (either because they're incidental or because they're, well, "momentary": they have a timestamp rather than a time range). @bcronin was using the example of a span that represents a page load in a browser, and the idea of events for the various (performance timing keys)[https://developer.mozilla.org/en-US/docs/Web/API/PerformanceTiming].

I also like @bcronin's proposal (very much, in fact)... I am doing a bunch of non-coding work right now but later today I will try to send out a slightly more concrete (but still RFC-status) PR for the golang API and reference it on this thread.

opentracing-importer commented 8 years ago

Comment by bensigelman Tuesday Jan 19, 2016 at 21:53 GMT


Per ^^^, here's a PR illustrating what this might look like in Go for starters:

https://github.com/opentracing/opentracing-go/pull/38

opentracing-importer commented 8 years ago

Comment by dkuebric Wednesday Jan 20, 2016 at 00:09 GMT


Thinking about the Error use-case:

opentracing-importer commented 8 years ago

Comment by bensigelman Wednesday Jan 20, 2016 at 05:51 GMT


@dankosaur thank you for the feedback. A few followups:

Re bullet point 1: I'm not sure I fully understand what you're suggesting... can you express it in code/pseudocode (can be non-Go if you want, whatever works)?

Re bullet points 2+3: we could replace IsError with Severity, an enum that encompasses N verbosity levels as well as WARNING and ERROR...

And a note about Payload: I was thinking that it would basically always be application data, perhaps massaged or munged a little bit... but I wasn't thinking that it would be used to add context/flags (what you call the "semantic API") about the log message itself (severity bits, etc, etc). I'll mention this in the comment and update the PR.

opentracing-importer commented 8 years ago

Comment by dkuebric Wednesday Jan 20, 2016 at 15:51 GMT


@bensigelman re: question/suggestion in bullet 1, I was thinking about the following use-case, say in Python:

except SomeExceptionType, e:
  tracer.Error(event=e.__class__, message=e.message)

It's nice to be able to group on the class of an error, not just the message, which may contain unique strings that make it impossible to group. For that reason, I suggested extending the Error interface to allow an Event usage, since Event is in our struct anyway. But re-reading your gist of Event, this might be pushing it towards a use-case that would be somewhat different from how it would be used in non-error events (higher cardinality, at least), so this is a question of consistency balance vs capability perhaps.

re: the Payload note -- that makes sense, it seems as though your intent was to have most/all "semantic API" come from the Event strings, and Severity for errors? This might be enough for point-in-time events; admittedly I don't have any top of mind use-cases aside from errors.

opentracing-importer commented 8 years ago

Comment by bensigelman Thursday Jan 21, 2016 at 04:46 GMT


@dankosaur

re: the Payload note -- that makes sense, it seems as though your intent was to have most/all "semantic API" come from the Event strings, and Severity for errors? This might be enough for point-in-time events; admittedly I don't have any top of mind use-cases aside from errors.

Yes, that's what I had in mind... In your python example, I guess I would have imagined something like tracer.Event("exception: " + str(e.__class__), payload=e) (or so... maybe message gets its own field in payload, etc).

opentracing-importer commented 8 years ago

Comment by bensigelman Thursday Jan 21, 2016 at 23:32 GMT


There haven't been any marked objections to the proposal outlined in https://github.com/opentracing/opentracing-go/pull/38 .

If we have converged, I can update the semantics doc accordingly and close this issue.

opentracing-importer commented 8 years ago

Comment by michaelsembwever Thursday Jan 21, 2016 at 23:56 GMT


Please don't introduce the info/error methods. The introduction of a logging api seems contrived to one specific use-case that has come up.

That said, I think this use-case demonstrates the need for the 'payload'. I am therefore in favour of option (E).

(I wrote this quickly in lei of a longer comment, because of your previous latest comment @bensigelman )

opentracing-importer commented 8 years ago

Comment by bensigelman Thursday Jan 21, 2016 at 23:58 GMT


Sorry, which "one specific use case"? This sort of logging is totally garden-variety behavior within Dapper (and was widely used because it is essential when trying to make sense of a trace).

opentracing-importer commented 8 years ago

Comment by michaelsembwever Friday Jan 22, 2016 at 00:04 GMT


Fair enough. "some specific use cases" then.

For the initial OpenTracing API i'd argue for the most simple plumbing api possible. This is the KISS principle, and i think it's especially relevant here when we're talking about api design. API design is about evolution of the api, not about trying to get that api correct. It is easy to add the porcelain API (info/error/etc) later on if-needed, as opposed to trying to fix it later. (We should also be looking at separation of API and SPI, and separation of plumbing and porcelain.)

Option (E) seems to be the simplest solution, and an appropriate plumbing API, that solves all the use-cases raised.

opentracing-importer commented 8 years ago

Comment by adriancole Friday Jan 22, 2016 at 00:07 GMT


The proposal has quite a few methods and clearly looks like a logger.

My comments about requiring a span to also be a half, or, now full logger haven't changed. It is a lot more responsibility to take on.

No one else who is participating seems to see adding a logging interface to the duties of all OpenTracing impls a concern. Resistance at this point is a fools errand for me.

All that said, I am glad that it is possible to have something not muck with existing use-case of timestamped events.

Buried in the new methods is a way to do simple string, event + timestamp. This matches use in a lot of tracers. This is progress and I'm grateful for that.

opentracing-importer commented 8 years ago

Comment by bensigelman Friday Jan 22, 2016 at 00:10 GMT


Resistance at this point is a fool's errand for me.

Hardly.

We could start with a thinner interface that just does event+payload logging with the understanding that we will most likely (but not definitely) add logging in a not-so-distant OT API. I can live with that if there is strong support for the more conservative approach.

opentracing-importer commented 8 years ago

Comment by michaelsembwever Friday Jan 22, 2016 at 00:20 GMT


We could start with a thinner interface that just does event+payload logging with the understanding that we will most likely (but not definitely) add logging in a not-so-distant OT API. I can live with that if there is strong support for the more conservative approach.

Well said. It's a pragmatic and incremental step that doesn't step on anyone's toes, while still keeping it possible for all platforms to do what they need to. I'd be more than, and i'd expect everyone to be, happy to re-approach the issue with a clean slate once we've had a little more experience with the plumbing api.

opentracing-importer commented 8 years ago

Comment by bensigelman Friday Jan 22, 2016 at 00:32 GMT


per the messages above, here is a separate RFC PR with a thinner event-and-payload-logging interface (for Go, just to make things concrete).

https://github.com/opentracing/opentracing-go/pull/39

opentracing-importer commented 8 years ago

Comment by adriancole Friday Jan 22, 2016 at 00:36 GMT


Agreed, I also be happy to revisit logging in the future. It would be helpful to not conflate evangelizing OpenTracing and the work to migrate it with evangelizing a logging api (and the work to support it) as a single step, even if they could make sense separately.

opentracing-importer commented 8 years ago

Comment by bensigelman Friday Jan 22, 2016 at 01:02 GMT


Okay, maybe that's where we should start. This topic seems to raise hackles and – more than other stuff like the basic propagation logic – I get the impression that logging can be safely added later.

opentracing-importer commented 8 years ago

Comment by michaelsembwever Friday Jan 22, 2016 at 02:49 GMT


Next question: is the payload

opentracing-importer commented 8 years ago

Comment by bensigelman Tuesday Jan 26, 2016 at 19:44 GMT


https://github.com/opentracing/opentracing-python/pull/11 has been merged, so both Go and Python are now restricted to events and payloads while we sort out more important matters relating to propagation.

opentracing-importer commented 8 years ago

Comment by bensigelman Wednesday Feb 24, 2016 at 21:43 GMT


RFC:

Thoughts?

bhs commented 8 years ago

IMPORTANT NOTE

The importer script doesn't seem to work well, and it missed a bunch of comments. I'm going to cut bait and just create manual redirects :-(