openzipkin / zipkin

Zipkin is a distributed tracing system
https://zipkin.io/
Apache License 2.0

Zipkin v2 span model #939

Closed codefromthecrypt closed 4 years ago

codefromthecrypt commented 8 years ago

We should not try to do all things as zipkin v2. The large number of interesting possibilities may lead us to never even start. I believe that if we pare down to basics, creating a model that only includes features that work in zipkin 1 backends, we can start and finish something.

For example, zipkin 1 only really supports String -> String binary annotations, so let's make them String -> String for v2 even if they become fancier later.

Zipkin v1 allows "contributory" spans, where instrumentation is allowed to place multiple hosts into the same span. Let's prevent that, and use a portability technique Google Cloud Trace does: it allows multiple spans with the same id, but they vary on span.kind. For example, the core server annotations really imply span.kind=server, where "sr" is span.startTs and "ss" is span.endTs.
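
As a hypothetical sketch of that kind-based derivation (class and method names here are mine, purely for illustration, not Zipkin code): given a span's core annotations, derive the kind and bookend timestamps.

```java
import java.util.Map;

// Illustrative only: "sr"/"ss" bound a SERVER span, "cs"/"cr" a CLIENT span.
public class KindMapping {
  enum Kind { CLIENT, SERVER }

  /** Derives {kind, startTimestampMicros, endTimestampMicros} from core annotations. */
  static Object[] derive(Map<String, Long> coreAnnotations) {
    if (coreAnnotations.containsKey("sr")) {
      return new Object[] {Kind.SERVER, coreAnnotations.get("sr"), coreAnnotations.get("ss")};
    }
    return new Object[] {Kind.CLIENT, coreAnnotations.get("cs"), coreAnnotations.get("cr")};
  }

  public static void main(String[] args) {
    Object[] result = derive(Map.of("sr", 1000L, "ss", 2000L));
    System.out.println(result[0] + " " + result[1] + " " + result[2]);
  }
}
```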

We could probably literally use the google cloud trace model as zipkin v2 as it is almost exactly what we need.

https://cloud.google.com/trace/api/reference/rest/v1/projects.traces#TraceSpan

If we do something like this, I'm confident we can get v2 out in relatively short order, and still not prevent more interesting (and severe) change from occurring in later versions of zipkin.

codefromthecrypt commented 8 years ago

here's an issue where binary annotations were a bit confusing (as they allow repeats on keys) https://github.com/openzipkin/brave/issues/136

yurishkuro commented 8 years ago

here's an issue where binary annotations were a bit confusing (as they allow repeats on keys)

Is it fair to say that ^^^ is only confusing because the UI does not associate binary annotations with the "span kind" that contains them? For example, I made a fix that adds the service name to SA/CA annotations.


Another common scenario I've seen with the same keys is http.url, which is sent by both client and server, and the value may or may not be the same.

If we adopt the suggestion that (spanID, spanKind, emittingService) combination is unique (which I think implies ditching core annotations and moving the Endpoint on the Span itself), then seemingly identical binary annotations may still be distinguishable by the service that emits them.

codefromthecrypt commented 8 years ago

The above concern on repeating keys is more comprehensively reviewed here: https://github.com/openzipkin/zipkin/issues/989#issuecomment-198825497

If the span included a copy of the endpoint, it would be possible to support this, though let's bear in mind this isn't a popular feature.

codefromthecrypt commented 8 years ago

When we get to actually implementing this, we should be very explicit about any safety issues that remain.

For example, even if a span is single-endpoint, do we accept updates to it? If we do, there's a chance of high-maintenance concerns, such as a client sending different names for the same span.

codefromthecrypt commented 7 years ago

Here's my take on the zipkin v2 model: simplified, focused on single-host spans, not including any new features, and removing troublesome or unsupported features. Zipkin v3 could introduce features not present in our storage or rest api, but I believe it best to pare down first.

Here's an example marked up in java.

class Span {
  class TraceId {
    long hi; // high 64 bits, used when trace ids are 128-bit
    long lo; // low 64 bits
  }

  enum Kind {
    CLIENT,
    SERVER,
    ONE_WAY
  }

  class Annotation {
    long timestampMicros; // epoch microseconds of the event
    String value;         // what happened, as a short description
  }

  @Nullable Kind kind; // absent when the span is local to one service

  TraceId traceId;
  @Nullable Long parentId; // absent for root spans
  long id;

  String name; // operation name, e.g. "query"

  long timestampMicros; // epoch microseconds of span start
  long durationMicros;  // start + duration = span end

  Endpoint localService;          // the service that recorded this span
  @Nullable Endpoint peerService; // the remote side, when known

  List<Annotation> annotations; // timestamped events
  Map<String, String> tags;     // like binary annotations, minus the endpoint

  boolean debug;
}

codefromthecrypt commented 7 years ago

@openzipkin/cassandra @openzipkin/elasticsearch @openzipkin/instrumentation-owners @openzipkin/core finally getting around to this. As mentioned in the previous comment, I don't think it is useful to try to make the model fancier; rather, we should pare down to the smallest model that supports our most used features. Later models can be fancier after we stabilize.

The model above can support our existing rest api, and can also be heuristically derived from existing spans. For example, similar heuristics are already in place to export our current zipkin spans into google stackdriver trace, which as mentioned earlier has a similar single-host model.

If reactions to the proposal I put up are mostly positive, then I can break out an issue to sort out details and any impact on storage, translation, propagation, etc. This would be similar to the 128-bit change, where it would likely take a couple months to sort everything out.

basvanbeek commented 7 years ago

@adriancole not seeing the operation name as part of the proposed v2 span. Do you want to drop this in favor of consumers utilizing tags for this?

codefromthecrypt commented 7 years ago

@basvanbeek uhh.. click refresh :)

nicmunroe commented 7 years ago

I like this model. Clear, concise, easy to understand what everything is and what it's for.

jcarres-mdsol commented 7 years ago

How do you know if you should sample this trace? Would it make sense for tags to have a timestamp?

mansu commented 7 years ago

Does it make sense to merge annotations and tags into a single structure? I feel tags should also be a list.

codefromthecrypt commented 7 years ago

how do you know if you should sample this trace?

The current model represents what is reported. Only sampled traces are reported.

Would it make sense for tags to have a timestamp?

tags currently don't have a timestamp, so adding one would be a feature addition. I'd be quite hesitant to do that, as it has storage impact and is also more complex than a dict.

codefromthecrypt commented 7 years ago

Does it make sense to merge annotations and tags into a single structure? I feel tags should also be a list.

Tags are used in search queries, and since there's only one host in this data structure, there's no need for a list (which would allow duplicates). No one should report multiple values for the same tag for the same span on the same host.

A dict is a very easy data structure to work with in any language.

Annotations remain timestamped. Because of that, they need to be a list, as you can have multiple events logged, even at the same time.

codefromthecrypt commented 7 years ago

Examples will help a lot. Here's a literal translation of an existing client span into the proposed simplified format. Note: one subtle change is suffixing "Micros" to our units. This helps with things like logstash, which expect timestamps to be millis or otherwise (meaning you can add a normal timestamp without clashing!).

current

{
  "traceId": "5af7183fb1d4cf5f",
  "name": "query",
  "id": "352bff9a74ca9ad2",
  "parentId": "6b221d5bc9e6496c",
  "timestamp": 1461750040359000,
  "duration": 5000,
  "annotations": [
    {
      "endpoint": {
        "serviceName": "zipkin-server",
        "ipv4": "172.19.0.3",
        "port": 9411
      },
      "timestamp": 1461750040359000,
      "value": "cs"
    },
    {
      "endpoint": {
        "serviceName": "zipkin-server",
        "ipv4": "172.19.0.3",
        "port": 9411
      },
      "timestamp": 1461750040364000,
      "value": "cr"
    }
  ],
  "binaryAnnotations": [
    {
      "key": "jdbc.query",
      "value": "select distinct `zipkin_spans`.`trace_id` from `zipkin_spans` join `zipkin_annotations` on (`zipkin_spans`.`trace_id` = `zipkin_annotations`.`trace_id` and `zipkin_spans`.`id` = `zipkin_annotations`.`span_id`) where (`zipkin_annotations`.`endpoint_service_name` = ? and `zipkin_spans`.`start_ts` between ? and ?) order by `zipkin_spans`.`start_ts` desc limit ?",
      "endpoint": {
        "serviceName": "zipkin-server",
        "ipv4": "172.19.0.3",
        "port": 9411
      }
    },
    {
      "key": "sa",
      "value": true,
      "endpoint": {
        "serviceName": "mysql",
        "ipv4": "172.19.0.2",
        "port": 3306
      }
    }
  ]
}

simplified

{
  "kind": "CLIENT",
  "traceId": "5af7183fb1d4cf5f",
  "parentId": "6b221d5bc9e6496c",
  "id": "352bff9a74ca9ad2",
  "name": "query",
  "timestampMicros": 1461750040359000,
  "durationMicros": 5000,
  "localEndpoint": {
    "serviceName": "zipkin-server",
    "ipv4": "172.19.0.3",
    "port": 9411
  },
  "remoteEndpoint": {
    "serviceName": "mysql",
    "ipv4": "172.19.0.2",
    "port": 3306
  },
  "tags": {
    "jdbc.query": "select distinct `zipkin_spans`.`trace_id` from `zipkin_spans` join `zipkin_annotations` on (`zipkin_spans`.`trace_id` = `zipkin_annotations`.`trace_id` and `zipkin_spans`.`id` = `zipkin_annotations`.`span_id`) where (`zipkin_annotations`.`endpoint_service_name` = ? and `zipkin_spans`.`start_ts` between ? and ?) order by `zipkin_spans`.`start_ts` desc limit ?"
  }
}

codefromthecrypt commented 7 years ago

Hopefully it is visible in the example that the bookend annotations "cs" and "cr" are not needed in the simplified model, yet the same signal is preserved (as you can tell from the start, duration, and kind).

codefromthecrypt commented 7 years ago

PS: these are details, but I'm flexible about how the address annotations are mapped to the top level.

e.g. localService vs localEndpoint vs endpoint

bhs commented 7 years ago

@adriancole I'm not sure what sort of appetite there is for breaking changes here, but ignoring compatibility, I would suggest...

codefromthecrypt commented 7 years ago

@adriancole I'm not sure what sort of appetite there is for breaking changes here, but ignoring compatibility, I would suggest...

Yeah I had mentioned that this would be a simplification of the current model, not changing or removing features like debug.

Basically, this is optimizing for single-host spans, something that can be done mechanically in the reporting tier without creating a new envelope type, etc.

I'd like to defer actual model changes to a v3, as there has been no activity for a year due to the breaking scope. In other words, there's room for benefit now.

codefromthecrypt commented 7 years ago

To put things in perspective, the biggest change this proposal requires of instrumentation is to not share span ids when propagating. I don't want to burden volunteers with unrelated things like adding sampling algorithms, etc. We can always do that later.

This change doesn't require changing B3, and wouldn't break the use of debug, which some use today, including the browser plugins. It doesn't require new data to be captured or presented differently. It doesn't require us to reshuffle logic or already-exposed classes that currently use our endpoint type.

Basically, the goal is to pare the model down to the features actually supported, which makes it possible to deliver in a couple months, reducing cognitive and code complexity without introducing much work for the volunteers impacted by the changes. It is affordable and positive.

devinsba commented 7 years ago

I think it would be useful to see what a full V1 trace would look like in this simplified format. The client one you posted above looks good to me. It simplifies a bunch of things I've had gripes with (endpoint on binary annotations, for example).

codefromthecrypt commented 7 years ago

TL;DR: I've been thinking about how we avoid breaking v3 in the process of doing v2. If we change the tags to permit storage of string, number, and bool values (as opposed to rejecting them), then the model can upgrade into a more featureful v3 system, particularly with advanced query support.


I think there's definitely a need to keep v3 in mind when doing v2. How we can eventually support things on our backlog, such as analysis or linked traces, is important. It is equally important that we can "upgrade into" future features (i.e. usually by adding fields, and at worst removing some). Being future-conscious without paralysis or over-engineering today is a very tough balance.

I know many full-featured systems keep mildly-typed tag values, for filter, aggregation, or analysis reasons. For example, AWS X-Ray and google's trace proto have a three-type value of string, bool, or number. These can be mapped into json without qualifiers (even though there are some limitations, like quoting big numbers). As long as there are only 3 types, we can keep the tags map simple. Keeping to basic types is crucial, as this allows tags to be a dict vs a list of tuples, which is more arduous to map to standard types.

For example, even though we can't currently query on them, we could still permit basic-typed tags like so, notably allowing a number:

  "tags": {
    "jdbc.result_count": 43,
    "jdbc.query": "select distinct `zipkin_spans`.`trace_id` from `zipkin_spans` join `zipkin_annotations` on (`zipkin_spans`.`trace_id` = `zipkin_annotations`.`trace_id` and `zipkin_spans`.`id` = `zipkin_annotations`.`span_id`) where (`zipkin_annotations`.`endpoint_service_name` = ? and `zipkin_spans`.`start_ts` between ? and ?) order by `zipkin_spans`.`start_ts` desc limit ?"
  }

Another analysis concern is not having to parse the annotations to figure out what they are. One way to accomplish that is to only send a map. However, it is difficult to reverse-engineer a user-level description out of a map, and the lack of one leads to the very poor zipkin experience I call "json pooping", e.g. you end up with a message that is just a json blob. One neat thing I noticed about google's trace proto is that it supports both: the annotation still has a primary description field, and you can later add classification tags for use in filtering or whatnot. In other words, the feature is something you can "upgrade into", because the annotation type just gets an "extra" field of tags (as opposed to having its primary field replaced by tags).

Ex. In both v1 and v2, annotations look like this (notably v2 excludes the implicit endpoint):

{"timestamp":1478624611519000,"value":"mr"}

in V3, we could add or parse classifying tags to it, without breaking existing code.

{"timestamp":1478624611519000,"value":"mr", "tags": { "type": "messaging", "direction": "received"}}

codefromthecrypt commented 7 years ago

Another update: I was working through brave4 and noticed all the conversion stuff can be done mechanically. This means reporters and collectors could easily be changed to support the simplified form now, even if storage, query, etc. aren't updated for a while. In other words, we can do "version 2" now, without breaking the api.

The only thing it requires is that instrumentation using the new format doesn't share span ids across a network boundary. I think starting with instrumentation (well, technically codec) is a nice way to ease in the change in a way that is in no way a big bang.

Thoughts?

codefromthecrypt commented 7 years ago

PS: I noted a while ago that stackdriver had a heuristic for splitting client+server spans into single-host spans. That's here. Uber also has a heuristic for the same task here.

When we get to the point of implementing v2 model, it will likely be the case that storage isn't upgraded, but the UI is. In such a case we can use a splitter similar to one of the above to represent traces in the new format even if the "raw span" is still the old model.
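
As a rough illustration of what such a splitter involves (this is not the stackdriver or uber implementation; the types and names here are invented), the core idea is to group a shared span's annotations by the endpoint service that reported them, with each group becoming one single-host view:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified placeholder types; a real splitter would also copy ids,
// timestamps, and binary annotations onto each resulting span.
public class SpanSplitter {
  record Annotation(long timestampMicros, String value, String serviceName) {}

  /** Groups annotations by reporting service; each group is one single-host view. */
  static Map<String, List<Annotation>> split(List<Annotation> annotations) {
    Map<String, List<Annotation>> bySide = new LinkedHashMap<>();
    for (Annotation a : annotations) {
      bySide.computeIfAbsent(a.serviceName(), k -> new ArrayList<>()).add(a);
    }
    return bySide;
  }

  public static void main(String[] args) {
    List<Annotation> shared = List.of(
        new Annotation(1000, "cs", "frontend"),
        new Annotation(1100, "sr", "backend"),
        new Annotation(1900, "ss", "backend"),
        new Annotation(2000, "cr", "frontend"));
    Map<String, List<Annotation>> sides = split(shared);
    System.out.println(sides.keySet());
    System.out.println(sides.get("frontend").size() + " " + sides.get("backend").size());
  }
}
```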

codefromthecrypt commented 7 years ago

The mythical single-host span

From a trace instrumentation point of view, spans have always been single-host. That's because the host of the span is the tracer, and the tracer only reports its own spans. With that point of view in mind, being required to add an endpoint to all spans feels laborious and inefficient. This has led people to ask to change the model. I've gone down this path, too, though in the process I have grown an appreciation for the existing model. Here's why:

Where to go from here?

It seems most sensible to formalize a reporting representation, even if the real model needs to address multi-host for a while. The reporting representation can be translated into the old one, and may eventually become its own model. In the meantime, there's a backlog of work needed to really support single-host spans. This includes making the UI detect RPCs that span different IDs, backfilling tests and logic for clock-skew correction and dependency linking, and probably other things. Once the whole system works with single-host spans, we could then consider formalizing it as the "one model", and switch storage etc. accordingly.

jcarres-mdsol commented 7 years ago

Is the V2 API dependent on solving single-host spans?

codefromthecrypt commented 7 years ago

@jcarres-mdsol we can make a V2 api with any type of representation (e.g. if you are thinking about polishing things about the api, such as query parameters). This issue is more about the model.

We can add a v2 POST/PUT endpoint which accepts the single-host json model/representation, and that could be used by instrumentation immediately (since they don't report timing information from multiple hosts anyway). The implementation would simply translate it to the existing model before sinking to storage.
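
For illustration, the collector-side translation for a client span could look like the following. This is a hypothetical sketch under the proposal's semantics (a real implementation would also cover server and one-way kinds, endpoints, and tags):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: expand a simplified single-host CLIENT span into
// v1 core annotations (value -> timestampMicros) before storage.
public class V2ToV1 {
  static Map<String, Long> toV1Annotations(long timestampMicros, long durationMicros) {
    Map<String, Long> annotations = new LinkedHashMap<>();
    annotations.put("cs", timestampMicros);                  // client send = span start
    annotations.put("cr", timestampMicros + durationMicros); // client receive = span end
    return annotations;
  }

  public static void main(String[] args) {
    // Values from the "simplified" example earlier in the thread.
    Map<String, Long> v1 = toV1Annotations(1461750040359000L, 5000L);
    System.out.println(v1.get("cs") + " " + v1.get("cr"));
  }
}
```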

jcarres-mdsol commented 7 years ago

I think I'm still confused on this one: are single-host spans a better representation than spans shared among services?

codefromthecrypt commented 7 years ago

@jcarres-mdsol

I think I'm still confused on this one, is single host spans a better representation than shared spans among services?

:) You asked if it is a better representation. It is a better representation when you primarily deal with one host, which is the case with tracers.

e.g. when we write client tracing code, we might now do the following:

setType(client)
start()
addTag("http.path", "/foo")
setRemoteService(remoteService)
finish()

to map that to the existing span model, it becomes

addAnnotation(startTimestamp, "cs", localService)
addAnnotation(endTimestamp, "cr", localService)
addBinaryAnnotation("http.path", "/foo", localService)
addBinaryAnnotation("sa", true, remoteService)

it is very awkward to explain ^^ in docs..

using the span model proposed it becomes

setType(client)
setLocalService(serviceOfTracer)
setRemoteService(remoteService)
setTimestamp(startTimestamp)
setDuration(endTimestamp - startTimestamp)
addTag("http.path", "/foo")

I think this is a better representation for tracers to use because...

All these things feed into easier understanding and docs when writing tracers. That means less support time, and people set up for success as opposed to confusion. We'll have fewer people with bugs (e.g. where they leave out binaryAnnotation.host, which happens), because such bugs become impossible.

Brownfield tracers do not need to change. People who have already learned the existing format can run with it. The winners here are new users and the people who support zipkin. We make it easier for new tracers to get started. We offer them a simplified upload format, devoid of nuanced traps or cognitive burden.

The cost of doing this is a parser on the server side. This primarily affects collectors (e.g. we can say only http and kafka accept the simplified format). In the OpenApi/swagger spec, for example, we'd define an alternate supported representation, over the same endpoint or a different one. It can coexist, and I think it would be worth the work even if the only beneficiaries were documentation and new users.

codefromthecrypt commented 7 years ago

note to all in the thread :)

This issue was created because zipkin v2 never happened, as it had too much scope. This issue initially discussed tackling the complexity in the model. It is still about that.

As this issue nears a year in duration with no activity beyond discussion, I've suggested we pare it down further:

Allow a simpler representation format for tracers. Accept this format and convert to the existing one on collection.

I think the V2 word is scarier than it needs to be. This is a feature, just like a propagation format would be for tracers. The parties with the most responsibility for this feature are those who maintain Zipkin's docs and collector code. Not doing this means we continue to punt a known complexity problem further down the road.

If folks feel strongly we shouldn't do this, those folks need to roll up their sleeves to help when questions and concerns come up during onboarding, e.g. contributing to docs, gitter, and issue resolution when people get confused about what a binary annotation is and send the wrong data. If we prefer this route, please say so and help me with the burden of today.

jcarres-mdsol commented 7 years ago

I am all for this to happen. I think I was confused by all this shared spans vs single-host, etc. We are talking about simplifying the json sent to the http API. Then the earlier the better.

codefromthecrypt commented 7 years ago

I am all for this to happen. I think I was confused by all this shared spans vs single-host, etc.

Good point. This became clear to me too: choosing one or the other is polarizing and wasn't going to result in a tactical gain. I think I failed to make that obvious before.

We are talking about simplifying the json sent to the http API. Then the earlier the better.

Great. I'll start carving something out.

One thing I noticed in practice is that it is more flexible and consistent to record a finish with a timestamp vs a duration (especially as there's no granularity loss: both are microseconds). I'll open a new issue with the proposed format (if for no other reason than to make it easier to read :))
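
To make that concrete, here's a trivial illustration of why recording a finish timestamp loses nothing (values taken from the client-span example earlier; variable names are mine):

```java
// Duration is derivable from start and finish, both in microseconds.
public class FinishVsDuration {
  public static void main(String[] args) {
    long startMicros = 1461750040359000L;
    long finishMicros = 1461750040364000L;
    long durationMicros = finishMicros - startMicros;
    System.out.println(durationMicros);
  }
}
```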

jcarres-mdsol commented 7 years ago

Yes, please; the current discussion would be confusing to anyone. I'm very much looking forward to a new format.

codefromthecrypt commented 7 years ago

https://github.com/openzipkin/zipkin/issues/1499

nicmunroe commented 7 years ago

FWIW I'm personally in favor of the simplified single-host spans. Anything that makes it more difficult for instrumentors to accidentally introduce bugs due to confusion/complexity/etc. is a good thing. And being simpler lowers the barrier to entry and eases the learning curve.

codefromthecrypt commented 7 years ago

A reminder that things are easier if we don't do mixed types for tags (e.g. a map whose values may be string or bool). Not only does this not work in elasticsearch mappings (first type wins), it is also complex in json schema: https://github.com/openzipkin/zipkin-api/issues/27

jorgheymans commented 4 years ago

Closing for the same reason as https://github.com/openzipkin/zipkin/issues/1499#issuecomment-631732359 . This is an issue thread about the design of the zipkin v2 model, which has long since been released.