opentracing / opentracing.io

OpenTracing website
https://opentracing.io
Apache License 2.0

Simplify and generalize Span.log API #96

Closed yurishkuro closed 8 years ago

yurishkuro commented 8 years ago

Related to earlier discussions, including #30, #90.

I would like to propose to replace the current log_event and log methods with a single structured logging method:

span.log(tag1=value1, tag2=value2, ...)

Why

This API addresses several different requirements that were previously raised:

  1. the current log_event can be achieved by log(event="..", ...)
  2. a generic, human readable log can be achieved by log(msg="...", ...)
  3. an externally provided timestamp can be supplied as a timestamp tag
  4. a structured payload can be provided via payload tag
  5. levels can be supported with level=info | error
  6. exceptions can be logged with exception=ex tag
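A minimal Python sketch of how one implementation might satisfy points 1-5. The Span and LogRecord names here are hypothetical stand-ins, not any real tracer's types:

```python
import time

# Hypothetical sketch of the proposed API (Span/LogRecord are
# illustrative names, not any real tracer's types).
class LogRecord:
    def __init__(self, timestamp, fields):
        self.timestamp = timestamp
        self.fields = fields

class Span:
    def __init__(self):
        self.logs = []

    def log(self, **fields):
        # An externally provided timestamp (point 3 above) is just another
        # keyword; every other key is carried through uninterpreted.
        timestamp = fields.pop("timestamp", None)
        if timestamp is None:
            timestamp = time.time()
        self.logs.append(LogRecord(timestamp, fields))

span = Span()
span.log(event="cache_miss")                       # point 1
span.log(msg="retrying", level="info", attempt=2)  # points 2 and 5
```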

Implementations

The specific API will depend on each language's features.

Python

def log(**kwargs)

Go

Go can use either a variadic function with alternating keys and values, similar to go-kit's API:

func log(keysAndValues ...interface{})

or a zero-allocation approach used in https://github.com/uber-go/zap:

span.log(
    zap.String("msg", "ohh crap"), 
    zap.Int("attempt", tryNum), 
    zap.Duration("backoff", sleepFor)
)

Java

The best format I can think of is a variadic method with alternating keys/values, similar to Guava's ImmutableMap.of(k1, v1, k2, v2, ...). It can also be overloaded to accept a Map<String, Object>.
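The alternating-keys/values convention reduces to pairing up the varargs; a hypothetical sketch of the pairing logic (in Python for brevity, though the discussion here is about Go and Java):

```python
def pair_kv(*keys_and_values):
    # Alternating key/value arguments, as in go-kit's Log or Guava's
    # ImmutableMap.of(k1, v1, k2, v2, ...).
    if len(keys_and_values) % 2 != 0:
        raise ValueError("expected an even number of arguments")
    # Pair even-indexed keys with odd-indexed values.
    return dict(zip(keys_and_values[0::2], keys_and_values[1::2]))

fields = pair_kv("msg", "ohh crap", "attempt", 3)
```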

Javascript

Pass a dictionary

span.log({"msg": "something"});

cc @bensigelman @dkuebric @michaelsembwever @adriancole

codefromthecrypt commented 8 years ago

Meant to reply earlier.. here it goes!

Main thing is that this moves dispatch-style use cases more explicitly into runtime checks (as opposed to method signatures). That moves the extensibility away from the API and into the implementation, which works better for languages that have pattern matching.

Pardon the crap stuff below, but it would look something like below to dispatch based on keys in scala and clojure. Note in some cases, ex clojure, the input isn't required to be an association (ex it could be a struct):

Ex.

log match { case m if m.get("level") == Some("info") => doInfo }

(let [{:keys [msg level] :or {level "info"}} log] ...)

From a Java-interop perspective, there are a few follow-ups that we'd need to consider if we went with a pattern approach as opposed to method dispatch:

bhs commented 8 years ago

My biggest conceptual concern is that this "kicks the can down the road" (or "kicks the can up the stack"??) as far as the semantic distinctions between these various logging use cases. I.e., it weakens the "instrument once" Tracer interchangeability story.

FWIW / side-note: what you are suggesting here is essentially equivalent to what go-kit does: https://github.com/go-kit/kit/blob/master/log/log.go#L18

yurishkuro commented 8 years ago

@adriancole

What is the type signature? Would the input be Map<String, Object>?

Yes, but Map is an alternative signature, the primary one could be a variadic method, or a set of overloaded methods like of(...) in https://github.com/google/guava/blob/master/guava/src/com/google/common/collect/ImmutableMap.java. A builder pattern can also be used. Any way we turn it, Java simply doesn't have an efficient way of passing variadic arguments without memory allocation, except for overloading the log method with 2, 4, 6, ... arguments.

Maybe we should show how the common dispatch cases might be implemented?

First, the dispatch does not need to happen in Java. If we are talking about tracers, then all they need to do is pack the set of key/values from the log method into a span and let the backend deal with interpretation. The example of dispatch (with some pseudo-code):

interface LogTagHandler {
    void handle(Object value, LogData log);
}

Map<String, LogTagHandler> handlers = ImmutableMap.of(
    "level", new LevelLogTagHandler(),
    "payload", new PayloadLogTagHandler() // etc.
);

Span log(Object... tagsAndValues) {
    LogData logData = new LogData();
    for (int i = 0; i < tagsAndValues.length; i += 2) {
        Object tag = tagsAndValues[i];
        Object value = tagsAndValues[i + 1];
        LogTagHandler handler = handlers.get(tag.toString());
        if (handler != null) {
            handler.handle(value, logData);
        } else {
            logData.appendCustom(tag, value);
        }
    }
    return this;
}

But again, this only needs to happen way down the line in the collection pipeline. The actual tracer implementation, where performance matters the most, does not need any dispatching.

@bensigelman

My biggest conceptual concern is that this "kicks the can down the road" (or "kicks the can up the stack"??) as far as the semantic distinctions between these various logging use cases. I.e., it weakens the "instrument once" Tracer interchangeability story.

Right now we have a pretty odd-looking set of logging methods, covering only an odd subset of use cases, and we argued for months before we even got those in. My concern is that we'll be stuck for a long time if we try to come up with clean semantic distinctions, via method signatures, for the use cases outlined above. I think we'll have a much easier time converging on a common set of logging tags, just like we did for span tags, while keeping a simple and generic logging API.

What I also like about it is the symmetry with the span tags. The API is straightforward and can stay fixed for a long time, without worrying that a v1.0 tracer is incompatible with v1.1 user code because we happened to change some argument in the log method.

Lastly, strictly speaking, a structured logging API does not require any additional semantic distinctions: if you have a backend (like ELK) that properly indexes structured logs, the end users can always find what they are looking for without us defining the semantic meaning of the tags. That's not to say that we shouldn't define some standard tags like exception, event, level.

codefromthecrypt commented 8 years ago

What is the type signature? Would the input be Map<String, Object>?

Yes, but Map is an alternative signature, the primary one could be a variadic method, or a set of overloaded methods like of(...) in https://github.com/google/guava/blob/master/guava/src/com/google/common/collect/ImmutableMap.java. A builder pattern can also be used. Any way we turn it, Java simply doesn't have an efficient way of passing variadic arguments without memory allocation, except for overloading the log method with 2, 4, 6, ... arguments.

Ah ok.. so this could well end up similar to how Finagle works (depending on impl). For example, they have a Record type which encloses a few things needed in general, like traceId, timestamp, and annotation. The latter is what's extensible; for example, an Annotation could be ClientRecvError(error: String). In this case the case class (no pun intended) is what allows the extensibility, and the case class is also what allows dispatch based on potentially multiple parameters.

Maybe we should show how the common dispatch cases might be implemented?

First, the dispatch does not need to happen in Java. If we are talking about tracers, then all they need to do is pack the set of key/values from the log method into a span and let the backend deal with interpretation. The example of dispatch (with some pseudo-code):

interface LogTagHandler {
    void handle(Object value, LogData log);
}

Map<String, LogTagHandler> handlers = ImmutableMap.of(
    "level", new LevelLogTagHandler(),
    "payload", new PayloadLogTagHandler() // etc.
);

Span log(Object... tagsAndValues) {
    LogData logData = new LogData();
    for (int i = 0; i < tagsAndValues.length; i += 2) {
        Object tag = tagsAndValues[i];
        Object value = tagsAndValues[i + 1];
        LogTagHandler handler = handlers.get(tag.toString());
        if (handler != null) {
            handler.handle(value, logData);
        } else {
            logData.appendCustom(tag, value);
        }
    }
    return this;
}

I'd note that there's collaboration between keys in your earlier example (ex where log level is also present). This is what makes this sort of dispatch more tedious in pure java vs something like a scala case class.

But again, this only needs to happen way down the line in the collection pipeline. The actual tracer implementation, where performance matters the most, does not need any dispatching.

Very good point. What you are saying is that these events are placed into a queue or something and the reporting thread is the one that bears any overhead involved in parsing them etc (ex the calling thread only needs to dump it into a queue, which is practically overheadless)
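The queue hand-off can be sketched in Python (hypothetical names; a real tracer would bound the queue and ship spans in batches):

```python
import queue
import threading

# The calling thread only enqueues the raw key/values; a reporter
# thread bears any parsing/serialization overhead.
log_queue = queue.Queue()
reported = []

def log_kv(**fields):
    log_queue.put(fields)  # cheap: no processing on the caller's thread

def reporter():
    while True:
        fields = log_queue.get()
        if fields is None:  # shutdown sentinel
            break
        reported.append(fields)  # stand-in for serialize-and-ship

t = threading.Thread(target=reporter)
t.start()
log_kv(event="retry", attempt=2)
log_queue.put(None)
t.join()
```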

yurishkuro commented 8 years ago

What you are saying is that these events are placed into a queue or something and the reporting thread is the one that bears any overhead involved in parsing them etc (ex the calling thread only needs to dump it into a queue, which is practically overheadless)

Actually, what I meant is that even the reporting thread does not need to do any processing of the logging tags. Let's say we use Thrift for out-of-process span reporting, so I would define something like this:

struct Span {
  ...
  9: list<Log> logs
}

struct Log {
  1: int64 timestamp
  2: list<LogTag> tags
}

struct LogTag {
  1: string key
  2: string value
}
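In Python terms, the uninterpreted packing step might look like this (plain dicts standing in for the Thrift structs above):

```python
def pack_log(timestamp, fields):
    # Mirrors the Log/LogTag structs: values are stringified, never
    # interpreted, before the span leaves the process.
    return {
        "timestamp": timestamp,
        "tags": [{"key": k, "value": str(v)} for k, v in fields.items()],
    }

packed = pack_log(1478551200, {"level": "info", "attempt": 2})
```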

So the only processing that happens in-process and affects the end user performance is:

  • span.log(...) method needs to be called, usually requires 1 memory allocation for varargs
  • background thread packs the tags into the Log thrift struct without interpreting them

Only once the span data is shipped out of process to the Collectors do the tag keys need to start being interpreted.

I'd note that there's collaboration between keys in your earlier example (ex where log level is also present).

I would assume that what the Handlers do is simply populate a wider LogData struct where special tags are pulled out as top level fields, i.e. LevelTagHandler would set a level field on that wider struct. So in practice there's little interaction needed between the handlers.
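A collector-side sketch of that handler table (Python, hypothetical names): special keys are promoted to top-level fields of the wider record, and everything else stays custom:

```python
# Hypothetical collector-side dispatch table: each handler populates
# one top-level field of the wider record, so handlers rarely interact.
def set_level(record, value):
    record["level"] = value

def set_payload(record, value):
    record["payload"] = value

HANDLERS = {"level": set_level, "payload": set_payload}

def interpret(tags):
    record = {"custom": {}}
    for key, value in tags.items():
        handler = HANDLERS.get(key)
        if handler is not None:
            handler(record, value)
        else:
            record["custom"][key] = value
    return record
```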

bhs commented 8 years ago

I'm coming around to this proposal. I do think it "kicks the can up the API" per my previous note, but – even if that's true – there is value in making the lowest level of the logging API completely general.

I want to think through the implications for OpenTracing users who just want to do the simple thing (e.g., logging informational messages). Using Go as an example, I can imagine something like

import "github.com/opentracing/opentracing-go"
import "github.com/opentracing/opentracing-go/logging"

func foo(sp opentracing.Span, bar int64) {
    // All of the following are equivalent.
    //
    // Option 1: Platform-idiomatic formatted messages:
    logging.Infof(sp, "bar=%v", bar)

    // Option 2: Canonical logging keys:
    sp.LogKV(
        logging.MessageKey, fmt.Sprintf("bar=%v", bar),
        logging.LevelKey, 0,
    )

    // Option 3: Full control at the lowest level of the API:
    sp.LogKV(
        "message", fmt.Sprintf("bar=%v", bar),
        "level", 0,
    )
}

I can get enthusiastic about "option 1" above and the idea of logging.Infof (and probably similar things for events, exceptions, etc). IMO options 2 and 3 don't pass the "OpenTracing should be easy and lightweight" sniff test, though I see the value of those layers for the sake of generality.

Anyway, if we can agree that OpenTracing should try to offer things like option 1 above, I see the merits of making the Span API per se just pure key-value (and doing higher-level logging stuff in some other package / class / location).
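In Python the same layering might look like this (a hedged sketch; the key names and the infof helper are hypothetical):

```python
# Thin convenience wrapper over a single KV primitive; the Span here
# exposes only the low-level log_kv method.
MESSAGE_KEY = "message"
LEVEL_KEY = "level"

class Span:
    def __init__(self):
        self.records = []

    def log_kv(self, **fields):
        self.records.append(fields)

def infof(span, fmt, *args):
    # Option-1-style formatted message, delegating to the KV layer.
    span.log_kv(**{MESSAGE_KEY: fmt % args, LEVEL_KEY: 0})

sp = Span()
infof(sp, "bar=%d", 42)
```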

bhs commented 8 years ago

Side note... @yurishkuro said

So the only processing that happens in-process and affects the end user performance is:

  • span.log(...) method needs to be called, usually requires 1 memory allocation for varargs
  • background thread packs the tags into the Log thrift struct without interpreting them

I like the idea of log payloads that can be arbitrarily large with the caveat that the tracing impl will not necessarily serialize all of them... i.e., the amount of payload that makes it downstream to the tracing system is best effort, perhaps beyond some sort of guaranteed minimum. In that case there is still key:value computation that needs to happen before the data gets out of the process, but I readily admit this is not something that is necessary, it's just something that can be done in order to make some attractive feature (like large structured logging userdata payloads) feasible.

codefromthecrypt commented 8 years ago

Thanks for clearing this up: basically this moves most logic out of process, i.e. to the collectors. In fact, anything structural is only enforced after it leaves the process.

This is a variant of @mattheath's phosphor, except it takes less responsibility in the API or transport form. In phosphor's case (Annotation), you can timestamp key/values in an annotation; however, there are certainly fields in the struct. This helps avoid the cost of serializing well-known keys and making everything subject to interpretation.

https://github.com/mondough/phosphor/blob/master/proto/trace.proto#L5

There are likely timestamped-map shippers out there, ex some general observability tools do things like this. However, I wonder if it isn't too generic, as you could indeed write a spec like this in any project. It is not more or less helpful here, since it has basically no opinion except time, which is not that opinionated.

yurishkuro commented 8 years ago

@bensigelman I am not that much in favor of infof style of logging, primarily for these two reasons:

  1. not all platforms have a notion of "canonical" format, e.g. in Java there are all kinds of loggers that take {} or {n} or even %s style of formatting. I don't see how we can possibly agree on one of those.
  2. more importantly, I don't think infof style logging provides many benefits over structured logging. Practically any infof-style log message can be rewritten in the form (msg, k1=v1, k2=v2...), e.g. log("cache hit", obj_id=123, ttl=123). Structured logs can still be quite readable (cache hit: obj_id=123, ttl=123), but they are much better suited for machine searching and aggregations. By exposing only structured logging API we nudge the users to write better logs imho. Ironically, your 3 examples illustrate the issue perfectly - they would've been much better off written with bar being a tag instead of a formatted message.
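The contrast in point 2 can be made concrete with a small Python sketch (hypothetical helpers): only the structured record remains directly filterable by key:

```python
def log_formatted(records, fmt, *args):
    # Traditional infof-style logging: values vanish into one string.
    records.append({"msg": fmt % args})

def log_structured(records, event, **fields):
    # Structured logging: values stay addressable by key.
    records.append({"event": event, **fields})

records = []
log_formatted(records, "cache hit for obj %d with ttl %d", 123, 300)
log_structured(records, "cache hit", obj_id=123, ttl=300)
# Only the structured record can be matched on obj_id directly.
hits = [r for r in records if r.get("obj_id") == 123]
```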

bhs commented 8 years ago

@yurishkuro re infof-style logging: I don't have religious feelings about it, I was just using infof as an example of something that would make for awkward user code (without a wrapper function like logging.Infof(span, fmt, args...)).

On that note, for this particular (github) issue, I don't need to know whether we're going to support infof logging specifically... but I still want to advocate for convenience functions that help with the (standardized) structured logging key:values. E.g.,

import "github.com/opentracing/opentracing-go"
import "github.com/opentracing/opentracing-go/logging"

func foo(sp opentracing.Span, bar int64) {
    if err := something(...); err != nil {
        // Option A:
        logging.LogException(sp, err)

        // Option B:
        stackPCs := make([]uintptr, kMaxDepth)
        frameCount := runtime.Callers(1, stackPCs)
        stringFrames := make([]string, frameCount)
        for i := 0; i < frameCount; i++ {
            fn := runtime.FuncForPC(stackPCs[i])
            stringFrames[i] = ... // format fn into a string.
        }
        sp.LogKV(
            "error", err.String(),
            "stack_trace", stringFrames,
        )
    }
}

The "Option B" story gets even more unmanageable if we try to provide structure within each stack frame (e.g., separating the symbol name and func:line info).

Anyway, my point is not that we should necessarily support infof, but more that we should try to create platform-idiomatic wrappers for common logging use cases.
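A Python sketch of an Option-A-style helper (the log_exception name is hypothetical): the wrapper formats the stack trace once, so user code never writes the Option-B boilerplate by hand:

```python
import traceback

def log_exception(records, exc):
    # Convenience wrapper: standard keys, stack trace formatted here,
    # not at every call site.
    records.append({
        "error": str(exc),
        "stack_trace": traceback.format_exception(
            type(exc), exc, exc.__traceback__),
    })

records = []
try:
    raise ValueError("boom")
except ValueError as e:
    log_exception(records, e)
```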

yurishkuro commented 8 years ago

@bensigelman I have no objections to util functions that eventually delegate to a single LogKV-style method. The only feature that is hard to support with such delegation is if we wanted infof-style util methods with deferred evaluation. If you're ok with util methods using fmt.Sprintf to resolve to a single message string, then we're good.

bhs commented 8 years ago

It's time to move this forward... the basic idea is straightforward, and really most of the interesting part is just deciding on the right idiomatic interface for each language.

I have three additional requirements to propose, though... discussion/dissent about these is welcome:

  1. There needs to be a version of the log method that accepts a timestamp (and not as a key:value pair, but as an explicit parameter)
  2. Similarly, there needs to be a way to bulk-create log items — each with their own timestamp — for systems that want to translate existing data (from the past) into OpenTracing API calls
  3. I would propose that we support the following value types for the key:value pairs:
    1. Basic numeric/bool/string types for sure (a la Span.SetTag)
    2. A nested key:value map with the same (recursive) restrictions
    3. A function/closure that returns any of the above (either a value or a map, that is)... the guidance would be that such functions are invoked by OT impls as late as possible, and ideally off of the critical path of the application

Part of me wants to wait on the functions/closures, but at least thinking about them might help us pick better typedefs and so forth.
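Requirement 3 can be sketched as a recursive resolver (Python, hypothetical): basic scalars pass through, maps recurse with the same restriction, and callables are evaluated as late as possible:

```python
def resolve(value):
    # Deferred evaluation: closures are only invoked here, which a
    # tracer would ideally run off the application's critical path.
    if callable(value):
        value = value()
    if isinstance(value, dict):
        return {k: resolve(v) for k, v in value.items()}
    if isinstance(value, (bool, int, float, str)):
        return value
    raise TypeError("unsupported log value type: %r" % type(value))

resolved = resolve({"level": "info", "details": lambda: {"attempt": 3}})
```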

I should also point out that 1. and 2. above are already handled by OT.

yurishkuro commented 8 years ago

@bensigelman

There's also one migration concern. We probably want the most generic k/v log method to be called Log(...interface{}), but we need to make sure it does not clash with the possible existing usage of Log(LogData) (e.g. by dropping the type LogData)

bhs commented 8 years ago

@yurishkuro

why 1? that is, why not via k/v?

My argument: the timestamp is a required argument and thus can be split out... it's just a required argument with a default of now() :)
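That reading, timestamp as a required argument defaulting to now(), might look like this in Python (hypothetical signature):

```python
import time

def log_kv(records, timestamp=None, **fields):
    # Explicit timestamp parameter, not one more key/value pair;
    # defaults to now() when the caller doesn't supply one.
    if timestamp is None:
        timestamp = time.time()
    records.append({"timestamp": timestamp, "fields": fields})

records = []
log_kv(records, event="replayed", timestamp=1234.5)  # translating past data
log_kv(records, event="live")                        # defaults to now()
```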

is 2 meant for something like []LogData, to avoid locking the mutex on each LogData?

Yeah, just a fast-path for translations. I guess "requirement" is the wrong word... we could always add it later since it's an optimization. But something to think about as we decide on types and so forth.

Is 3(ii) meant to capture JSON-like structures? If so, map isn't enough, JSON may contain arrays.

I could see it either way. If we support arrays, are they allowed at the "top level"? If so, how to distinguish between interleaved KV and an array?

There's also one migration concern. We probably want the most generic k/v log method to be called Log(...interface{}), but we need to make sure it does not clash with the possible existing usage of Log(LogData) (e.g. by dropping the type LogData)

I agree that there's a question here, but I think we can/should address Go stuff in the Go PR... for this issue I was hoping to focus on pan-language semantics.

bhs commented 8 years ago

Some notes from an in-real-life conversation with @yurishkuro:

bhs commented 8 years ago

People who care about this issue should check out https://github.com/opentracing/opentracing-go/pull/108

cwe1ss commented 8 years ago

I think that having only key:value pairs over-generalizes this. By only kind-of-formalizing common key names, trace implementations will not easily be able to build upon a minimum set of required information. If OpenTracing is used in multiple 3rd party libraries (e.g. web frameworks, ORMs, ...) they might all end up using different values - or even different data types - especially for level.

My main question is how you intend this feature to be used:

If so, filtering will be a very important requirement. As soon as you have many logs, you want to be able to filter them before they are sent away (to reduce traffic). This is very similar to the general "sampling" but on a per log basis. Imagine having multiple libraries using OpenTracing to log messages. Some of them might be very verbose, some not. You might not always want to get all messages from all libraries.
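A formalized level field would make such a per-log filter simple to sketch (Python; the level names and threshold are hypothetical):

```python
# Per-log filtering before a record leaves the process, similar to
# sampling but decided for each log entry.
LEVELS = {"debug": 10, "info": 20, "error": 40}

def should_ship(fields, threshold="info"):
    level = fields.get("level", "info")  # assume info when unspecified
    return LEVELS[level] >= LEVELS[threshold]

logs = [{"level": "debug", "msg": "verbose"},
        {"level": "error", "msg": "boom"}]
shipped = [f for f in logs if should_ship(f)]
```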

Formalizing on these common fields would make such global filtering easier:

I also think the following should be separate fields/parameters in the API:

Also, from an API usability point of view, having only key:value pairs brings a weird difference between creating regular spans and logging - e.g:

cwe1ss commented 8 years ago

fyi: this is how the new logging standard from ASP.NET Core (and many other .NET Core based libraries) defined its generic log method:

void Log<TState>(
  LogLevel logLevel, // a fixed enumeration
  EventId eventId, // a numeric identifier for the event
  TState state, // a payload object for structured logging - can have many fields/values!
  Exception exception, // logger decides which fields to log etc.
  Func<TState, Exception, string> formatter // a method which creates a "readable" version of state - eg for console output
);

They then offer convenience extension methods, which are added to the existing ILogger type without having loggers to implement them - e.g.:

void LogInformation(EventId eventId, Exception exception, string message, params object[] args);
void LogInformation(EventId eventId, string message, params object[] args);
void LogInformation(string message, params object[] args);

bhs commented 8 years ago

@cwe1ss sorry to be brief, happy to elaborate...

The idea here is to settle on the lowest level of the API. It may be that we eventually support higher-level logging concepts built on top of this foundation, but I would like to keep the Span API out of the business of coming up with a "CISC"-style (sorry for the 90s processor analogy) logging portfolio that will please all parties... it would quickly eclipse the rest of the Span API and would be – IMO – a distraction.

We would definitely want to standardize on certain keys, much like we have with certain tags a la http://opentracing.io/data/

Does that set your mind at ease at all?

cwe1ss commented 8 years ago

yes, I definitely agree that logging is a completely different beast. That's why I think you have to be very careful with how you advertise this: it should be clear whether it is meant for "Order ABC processed for customer 123"-style log messages that would otherwise end up in ElasticSearch.

That's why I personally liked the "LogEvent"-naming. I understand this as a method that allows me to add tracing-relevant checkpoints to my span and nothing else.

Regarding the API, I think ease-of-use for users should be the first priority. Required fields (if there are such - like operationName for StartSpan) should be clearly visible as such.

cwe1ss commented 8 years ago

fyi - regarding key:value pairs: C# has a nice feature called "anonymous types" which would allow users to do something like this (which looks a bit nicer than creating a dictionary):

var tags = new { Level = "info", KeyXY = "SomeValue", Message = "SELECT * FROM Customers" };
span.Log(tags);

However, this has quite a few drawbacks:

So I don't think that it would be a very good fit for this scenario.

bhs commented 8 years ago

If there are no complaints by mid-day on Monday I will start dealing with the repercussions of these various API sketches/RFCs... Again, just to have everything in one place:

Go: https://github.com/opentracing/opentracing-go/pull/108
Python: https://github.com/opentracing/opentracing-python/pull/36
Java: https://github.com/opentracing/opentracing-java/pull/47
Javascript: https://github.com/opentracing/opentracing-javascript/pull/53

(I don't think anyone but me cares about Objective-C :)

dawallin commented 8 years ago

My largest concern with this change is that it is too generic and therefore, as @bensigelman said earlier,

...weakens the "instrument once" Tracer interchangeability story.

To me the simplest form of logging contains a message string and a timestamp. There is a discussion of why the timestamp should be a separate parameter, and I would prefer to have the message treated the same way. To this we could then add metadata for semantic logging, log level, etc., in one generic form or another. The purpose would be that even if there are different ways of adding extra payload, at least a message string and a timestamp could be interpreted by all tracer implementations.

With that said, I like the way the previous payload parameter is now formalized into a key-value array.

yurishkuro commented 8 years ago

To me the simplest form of logging contains a message string and a timestamp.

I agree that it's the simplest, but I disagree that it's the most useful. In large-scale distributed systems, logs are not written for humans to read; they are primarily meant for machine processing. If the only thing you have in the log is a scalar string message, the KV API allows you to log it just as easily, as log(event='your scalar string'). (Note the use of event instead of msg: a scalar string can only mean an event at a point in time.) In most cases people want to log not just events but some state, and in a traditional logging API the only way to log state is with formatted strings like log("Found something=%s", xyz), which is easy to read but hard to machine-process. The equivalent KV format log({"event": "found", "something": xyz}) is not much harder to read, but significantly easier to machine-process, aggregate, alert on, etc.

The KV API we're proposing does not preclude a "simple message with a timestamp", and does not even make it significantly harder to log, but by being more general it allows a lot more expressiveness for state logging.

yurishkuro commented 8 years ago

Cf: https://kartar.net/2015/12/structured-logging/

yurishkuro commented 8 years ago

Or this: http://dev.splunk.com/view/logging-best-practices/SP-CAAADP6

bhs commented 8 years ago

This is done for Java, Go, Python, Javascript, and ObjC... C# and C++ are not stable yet and thus shouldn't block this issue. Thanks all, and esp @yurishkuro for making the suggestion in the first place.