I think a problem that we will run into with this is that the surface of a span is not fully defined by the interfaces that OT provides. Because of this, a specific tracer may hold key data that does not translate into the data defined in the OT span, making it nearly impossible to work with that tracer's data in a generic manner.
Just to bring some feedback: I heard about this more than once during CloudNativeCon. Even pcap has been used as an example of why this is needed, in the sense that there are tools specialized in reading and making sense of pcap files, no matter which tool generated them.
In any case, I like the JSON idea, but I'd rather have either a valid JSON array or a valid JSON stream format with line delimiters (which seems to be the common case in other stacks):
https://en.wikipedia.org/wiki/JSON_Streaming
A JSON array has the obvious disadvantage of requiring the final ], which might not always be convenient to add.
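For illustration, a minimal Go sketch of what line-delimited span output could look like (the Span shape here is a placeholder, not a settled schema):

package main

import (
	"encoding/json"
	"os"
)

// Span is a placeholder shape for illustration only; the actual
// field set is exactly what this thread is trying to settle.
type Span struct {
	Name      string `json:"name"`
	Timestamp int64  `json:"timestamp"`
	Duration  int64  `json:"duration"`
}

func main() {
	enc := json.NewEncoder(os.Stdout)
	// json.Encoder.Encode terminates each value with a newline,
	// yielding line-delimited JSON with no bounding array and
	// no final ] to worry about.
	enc.Encode(Span{Name: "span-1", Timestamp: 1490982516841000, Duration: 13})
	enc.Encode(Span{Name: "span-2", Timestamp: 1490982516854000, Duration: 7})
}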
@devinsba I think it's reasonable to reserve a keyspace for tracer-specific data. The key thing is for the information entered via the OT API to be standardized in the exchange format.
In other words, since the OT API is standardized, there should be a way for tools to inject or "replay" OT information as entered by an application. To that extent, I don't think there's anything missing or tracer-dependent.
@jpkrohling for JSON, the streaming standard is "comma separated objects", where the bounding array is implicit. So something as simple as JSON.parse("[" + data + "]") would work.
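The same trick works outside JS; a rough Go equivalent, assuming the stream arrives as a string of comma-separated span objects:

package main

import (
	"encoding/json"
	"fmt"
	"log"
)

func main() {
	// comma-separated objects, bounding array left implicit
	data := `{"name":"a"},{"name":"b"}`

	// wrap in brackets to make it a valid JSON array, then parse
	var spans []map[string]interface{}
	if err := json.Unmarshal([]byte("["+data+"]"), &spans); err != nil {
		log.Fatal(err)
	}
	fmt.Println(len(spans)) // 2
}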
I don't want to just fall into JSON as a solution just because it's easy. For example, JSON is not particularly efficient to parse. But it sets the bar to a certain level.
The most critical part of tracing data is the causal relationships between different spans. They are expressed via span references, which are abstract in the OT APIs, leaving tracers to implement things like span/trace identity as they wish. Without that piece, the only non-controversial parts of the spans are the operation name, tags, and logs (duration is technically also implementation-specific due to varied precisions).
Mmmmmm, okay I see that now.
Cutting scope even more, rather than a universal interchange format, adding a utility to all of the basic tracers that outputs in the same format would be sufficient for most of the use cases I've seen so far. From a clutter perspective, I would prefer the basic tracers add a logging facility over having a separate pile of "logtracers".
+100 on opening this issue. Some essential aspects of trace state are not defined clearly by the API. Naming conventions such as traceId and spanId differ among tracers.
@tedsuo creating a utility which correlates the trace state of a given tracer to a standardized naming scheme/format would greatly help.
@tedsuo if the issue is only about basic tracers, then this repo is not the best place for it, since it's a discussion of a particular implementation, not specification.
There is a proposal for a standard Trace-Context HTTP header, which indirectly implies a standard for trace & span identity: https://github.com/TraceContext/tracecontext-spec/pull/1/files
If the trace/span identity issue is addressed, then the common JSON OpenTracing format becomes very possible.
Ok thanks, I see that. A common JSON format for exchanging traces is what this issue was originally created to address. I would like to pursue that as long as there is hope that it can be attained. I would hold off on adding anything to basic tracers as long as the identity issue is actively being investigated.
Also, from the point of view of establishing the keys for a JSON Span object, are there any fields other than span_id, trace_id, and meta needed for identity?
{
"span_id": undefined,
"trace_id": undefined,
"meta": undefined,
"name": "",
"timestamp": 0,
"duration": 0,
"tags": {"": [{}]},
"logs": [{}],
"baggage": {"": [{}]},
}
Even if the values of span_id, trace_id, and meta were only consistent within a tracer implementation, something like this would still be a very useful level of standardization.
childOf and followsFrom would be needed to establish relationships. I am unfamiliar with a meta field; what is it used for? Possibly tags and baggage should simply be a dictionary of k,v pairs ({})?
Baggage may be used to propagate encrypted information which should not be written in a textual format.
Something like:
{
"span_id": undefined,
"trace_id": undefined,
"meta": undefined,
"childof": "",
"followsfrom": "",
"name": "",
"timestamp": 0,
"duration": 0,
"tags": {},
"logs": [{}],
"baggage": undefined,
}
@tedsuo, this is the format in my open-source tracer implementation: https://github.com/wu-sheng/sky-walking/wiki/3.0-Collector-TraceSegment-in-JSON-format .
We use more fields to make analysis easier, e.g. a topological graph of application clusters.
Also, we use a totally different data structure for propagation in my commercial product, with no JSON and no keys, for higher network performance.
After all, my implementation details are not very important. My point is that a standard interchange format is nearly impossible across different requirements, especially at the spec level.
IMO, the spec should focus on the tracer-API-user level. The backend is much more complex; trace data can give an APM much more information than just a span DAG (tree).
I tend to agree with @wu-sheng. Without expanding the standard, I would argue that, if you're going after testing, it would be sufficient to have a "MockTracer" available in each language where the tracer's state can be queried before and after the tests are run. That mock tracer could certainly output JSON if desired. For integration testing it would be extremely helpful (but not strictly required) if the backend servers offered a consistent method of querying what trace data has been received.
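For Go, the opentracing-go repo already ships a mocktracer package along these lines; a minimal sketch of querying its state after the fact:

package main

import (
	"encoding/json"
	"fmt"

	"github.com/opentracing/opentracing-go/mocktracer"
)

func main() {
	tracer := mocktracer.New()

	// the code under test would normally run here
	span := tracer.StartSpan("some-operation")
	span.SetTag("component", "example")
	span.Finish()

	// inspect the recorded spans, or dump them as JSON as suggested above
	for _, s := range tracer.FinishedSpans() {
		out, _ := json.Marshal(map[string]interface{}{
			"name": s.OperationName,
			"tags": s.Tags(),
		})
		fmt.Println(string(out))
	}
}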
I had the same idea about BasicTracers. They should use the same data and wire format. Tracing systems would be able to consume that format and declare themselves as basic tracer compatible.
Assuming Trace-Context gets added to the spec, the most data you could guarantee as things stand right now would be:
{
"trace_context":"",
"name":"",
"tags":[],
"logs":[]
}
Trace-Context is an attempt by the Zipkin team. It is a good way forward for every tracer implementation team, but at the specification level, I highly doubt it.
And I agree that we could define a data-structure format for MockTracer. That way, we can discuss a JSON-like (or JSON-stream) format, because it is only for mock/test or demonstration purposes.
Can we do that? If so, we should start a new issue, since this issue's title and context are not right for that.
I originally had meta as a place for tracers to stick any additional tracer-specific information, but I'm dropping it.
I believe the following keyspace should work:
{
"trace_id": undefined, // values for trace_id and span_id are not currently standardized
"span_id": undefined,
"references": [
{"type":"child_of","span_id":undefined}, // currently types are "child_of" and "follows_from"
],
"name": "operation type",
"timestamp": 1234567890123456789, // UTC Epoch Unix Timestamp in Microseconds
"duration": 300, // integer representing microseconds
"tags": {
"tag_key": [{"a":"b"},{"c":"d"}], // can tags have multiple values in some systems?
},
"logs": [
{"timestamp":1234567890123456789, "a":"b"}, // logs optionally have a timestamp
],
"baggage": {"a":"b"},
}
Again, to clarify the purpose of this exercise: it should be possible to have a data format that captures what a user inputs via the OT API, which in turn allows another system to replay that input without having been present during instrumentation. This is why values such as trace_id and span_id are left undefined: the user does not input this information. Other values, such as timestamp and duration, count as implicit input.
I agree that tracers may collect information above and beyond what is present in the OT API, but to the degree to which the OT API is standard across languages, it should be possible to represent it in a loggable format.
Also, baggage is meant to propagate with the span context across services, but there's no requirement for baggage to be captured in the span, so I don't think it needs to be in the JSON.
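For concreteness, a hypothetical Go mirror of the keyspace above (dropping baggage per the previous point, and keeping IDs as opaque raw JSON since their format is tracer-specific and not user input):

import "encoding/json" // for json.RawMessage

type Reference struct {
	Type   string          `json:"type"` // "child_of" or "follows_from"
	SpanID json.RawMessage `json:"span_id"`
}

type Span struct {
	TraceID    json.RawMessage          `json:"trace_id"`
	SpanID     json.RawMessage          `json:"span_id"`
	References []Reference              `json:"references,omitempty"`
	Name       string                   `json:"name"`
	Timestamp  int64                    `json:"timestamp"` // epoch microseconds
	Duration   int64                    `json:"duration"`  // microseconds
	Tags       map[string]interface{}   `json:"tags,omitempty"`
	Logs       []map[string]interface{} `json:"logs,omitempty"`
}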
(I'm new to OT.) I target the same goal (a simple local tracer) with the impl (spec) of the logger tracer in https://github.com/opentracing-contrib/java-span-reporter. So if a "base" format emerges, I'll use it. Today the current impl reports JSON in the log:
19:48:36.841 [main] TRACE tracer - {"ts":1490982516841,"elapsed":13,"spanId":"c4bf987f-2d0c-49f1-87bd-a23930570b96","action":"finish","operation":"span-2","tags":{"description":"the second inner span in the original process"},"baggage":{},"references":{"child_of":"da7d843a-4564-4480-a6cb-19550c24a344"}}
@yurishkuro,
@davidB trace_id is not part of OT, because it is tied to tracer implementation code. Basically, it is usually an encoded string, generated by some rules or at random.
If tags allow duplicate keys (or multi-value strings), it is not clear and could generate a lot of confusion.
The less a man makes declarative statements the less apt he is to look foolish in retrospect. Quentin Tarantino, from the film 'Four Rooms'
In other words, don't say that it's a map / dictionary, especially a multi-map, and there is no confusion. In our Thrift data model we define tags as a list:
10: optional list<Tag> tags
None of my users have ever been confused by that, nor was it an issue for the backend.
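In Go terms, the lossless list-of-tags shape would be something like (a sketch mirroring the Thrift definition, not an agreed format):

type Tag struct {
	Key   string      `json:"key"`
	Value interface{} `json:"value"`
}

// Duplicate keys survive intact, and a reader that wants map
// semantics can still apply a last-value-wins policy:
var tags = []Tag{{Key: "tagA", Value: "v0"}, {Key: "tagA", Value: "v1"}}
// json.Marshal(tags) => [{"key":"tagA","value":"v0"},{"key":"tagA","value":"v1"}]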
trace_id is not part of OT, or am I missing it?
Correct, but as I mentioned in the earlier comments, (a) you cannot represent a SpanReference without it, and (b) as an opaque string value it is not such a far-fetched concept, and the ongoing Trace-Context standardization effort even gives it a more concrete shape.
The API is setTag (or withTag), not addTag:
Span.setTag(tag, value)
I expect that after
span.setTag("tagA", "v0")
span.setTag("tagA", "v1")
the value of tagA will be "v1", not "v0" or ["v0", "v1"].
With
[
{"tagA", "v0"},
{"tagA", "v1"}
]
I don't know how to interpret/find tagA.
This is not a matter of the API, but of the data model. I have run an internal poll among our JS developers and they were strongly in favor of using arrays of values instead of dictionaries when developing a JSON API for traces. And with logs the API (e.g. in Go) does allow multiple fields with the same key.
In terms of quality of data, if you simply need to search for spans by a tag, then it doesn't matter if the span has one or more values with the same key. If you want to infer certain information about the span based on a tag, say "http.url", then you already have the same problem because even though the API is called SetTag (as in "set once"), instrumentation can still call it multiple times, so how do you know whether the last value is better than the first? I'd argue it's a bug in the implementation and it should only set meaningful tags once - but that's not a question of the backend data model anymore.
In short, I prefer the data model to be lossless, rather than forcing some decisions that can be deferred.
@yurishkuro It is interesting that the Go API implementation has a function SetTag(key string, value interface{}), essentially allowing any type of tag. On the other hand, the Java API is more restrictive, only allowing String, Boolean, and Number as datatypes. I agree that providing the datatype in the tag would help parsing at the other end.
On the other hand, shouldn't language implementations follow the spec? The spec defines tags as a mapping of k,v pairs, which could be taken to mean that tags are a hashmap, while tracing libraries may instead keep a list of tags. Maybe the wording in the spec could be more specific?
Nevertheless, if we standardize an interchange format in this issue, it wouldn't be difficult for each language-specific library to provide a utility to output this format.
You did a poll, so it's a symptom of ambiguity. Ambiguity (or a deferred decision) leads to variation among implementations and to incompatibility. I'm fine with any solution (last value, first value, multi-value), but not with ambiguity.
setTag("debug", true)
setTag("debug", false)
Today, the main issue I have with OT is that there are only setters and no getters.
So if tags are an append-only list, that should be in OT, and reflected in the interchange format.
(meta: I wish I had more time and have been meaning to say something on this thread)
In general, my biggest question is about the goal(s) for the work described in this issue. I have always been dissatisfied by the imprecision / colloquialism of the "formal" OpenTracing spec.md, and part of the implicit motivation here seems to be a more explicit definition of the data model.
That said, the specification is really describing a bunch of state transitions... if we wanted to make something more formal, it would need different "structs" for each method call. But it's problematic to define a single data model for the Span itself since it's completely valid (in fact, recently encouraged) to build Tracer implementations that intentionally drop some of the data: e.g., the discussions about taking the Tracer and sending a summary of tagged latencies to a metrics subsystem like prometheus or statsd.
One other motivation might be to have a simplified mental model of the OT data model that might be slightly imprecise but easier to grok than a multi-page specification document. That's a fine thing to want, and arguably the BasicTracer RawSpan structs provide just that sort of approximate mental model.
If we're really talking about what this issue wants, by name, I am concerned because I don't understand the actual use cases for such an interchange format yet, and I would want the format to be driven by those use cases. The most compelling thing I am aware of would be an integration point for "simple" / straightforward pieces of infrastructure like linkerd, Envoy, haproxy, nginx, etc: they are awkward to integrate with from an instrumentation standpoint, but very desirable to integrate from an operational standpoint since they have a lot of visibility and a common surface area across large distributed systems. True interop between, say, Zipkin and another tracing system seems way further-fetched but would have its own complex requirements.
Anyway, TL;DR: what are we designing for here? A more formal spec? An informal mental model? Linkerd and friends? Something else?
fyi: The mentioned Trace-Context project has recently been moved and is now a W3C project: https://github.com/w3c/distributed-tracing
@cwe1ss Just to be clear: these days, Trace-Context is about the propagation format, rather than a Span or Trace format.
@tedsuo I realise it has been a while, but where did your thinking land? I would like to report spans as JSON for easy consumption by a JS browser app (a different goal to the ones you listed). Your proposal, with @yurishkuro's feedback, represents what I want. Losing the fidelity of a span is acceptable for my use case. Thanks for raising this issue.
As a follow-up, here's a sample of my implementation inspired by this thread:
{
"baggage": [],
"duration": 11726,
"logs": [
{
"fields": {},
"message": "hello-world",
"time": 0
}
],
"operationName": "some-span",
"references": [],
"spanId": 4866369132225493238,
"start": 1551079498409000,
"tags": {
"sampler.type": "const",
"sampler.param": "true"
},
"traceId": 4866369132225493238
}
Any movement on this?
There is not going to be a resolution to this thread, given that most development efforts have moved to OpenTelemetry, which does define a canonical data format: https://github.com/open-telemetry/opentelemetry-proto/.
@yurishkuro thanks!
There is some debate over whether there should be a standard or "official" OpenTracing wire format. Sidestepping that debate, a smaller and hopefully less contentious starting point would be an out-of-band format for sharing trace data.
This has come up repeatedly in the following areas:
Ideal qualities for this interchange format:
As a straw-man to kick it off, I propose a comma-separated list of JSON-formatted spans.
What say you?