open-telemetry / opentelemetry-specification

Specifications for OpenTelemetry
https://opentelemetry.io
Apache License 2.0
3.71k stars 886 forks source link

Add BeforeEnd to have a callback where the span is still writeable #1089

Closed gugahoa closed 1 month ago

gugahoa commented 3 years ago

What are you trying to achieve? Set an attribute to all spans in a trace

What did you expect to see? Writeable spans on SpanProcessor's OnEnd(Span)

Additional context. What I tried to implement is a feature called trace field in Honeycomb (reference: AddFieldToTrace)

For that I first tried to use SpanProcessor, but soon found out that OnEnd(Span) is not writeable.

The solution I arrived was to write a custom exporter which was able to read from a thread-local storage (at the time of the implementation, Baggage had yet to be spec'ed out) to list all attributes related to a Trace and insert into the span before exporting.

From what I understood from the SpanProcessor specification, that's the place where I should have placed this logic, as it's a transformation that should happen before the span is sent to any exporter

tsloughter commented 3 years ago

/cc @arminru

garthk commented 3 years ago

Required for dynamic resource-like attributes desired in every span but computed at run-time because they're not constant, eg. instance uptime and recent CPU utilisation. Injecting them in the exporter gives results that become less accurate during the spikes we're most interested in.

arminru commented 3 years ago

IIRC it was discussed to make spans writeable in OnEnd but this was left out because users will not always have full control over the order in which span processors are registered and in turn invoked, which could result in unreliable behavior. From your use case it sounds like you would not need any attributes that are added/overwritten in OnEnd visible to any other span processor, right? If you only need to have those changes reflected once passed to exporters, this shouldn't be an issue then.

@carlosalberto you seemed to remember some of that discussion as well in the meeting yesterday. Do you know the rationale for that decision?

iNikem commented 3 years ago

Why these attributes cannot be set in onStart, which already has writable span?

arminru commented 3 years ago

Reading the spec again, I noticed that it explicitly says that spans are already marked as ended at the time they are passed to SpanProcessor.OnEnd:

https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/trace/sdk.md#onendspan

OnEnd is called after a span is ended (i.e., the end timestamp is already set).

https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/trace/api.md#end

End: Implementations SHOULD ignore all subsequent calls to End and any other Span methods, i.e. the Span becomes non-recording by being ended (there might be exceptions when Tracer is streaming events and has no mutable state associated with the Span).

We would also have to change either of those requirements if we wanted to make spans writeable in OnEnd and we would need to understand the consequences well.

Oberon00 commented 3 years ago

Exactly. The difficult to get right (but specified!) order (exporting may happen before the other OnEnd processor) is another problem, but the main problem is that spans are already ended, as @arminru said. This was already discussed in https://github.com/open-telemetry/opentelemetry-specification/pull/669#issuecomment-654841632.

There are two ways we could go forward here:

We definitely can't pass a writable span to OnEnd.

tsloughter commented 3 years ago

Why not define it as your Span Processor must assume it gets the Span in the form it is ended with and not expect to have any changes from previous processors visible?

tsloughter commented 3 years ago

@iNikem my understanding is that it is because OnStart can't be used to implement the Honeycomb feature https://godoc.org/github.com/honeycombio/beeline-go#AddFieldToTrace -- it would only be able to add the attribute to spans started after this call, but it is supposed to be able to add them to all local spans in the trace, even if already started.

I'd be fine with a BeforeEnd span processor, but don't really like adding more hooks when as far as I can tell the only issue is to define that OnEnd processors can't depend on each other and that if you have multiple processors writing the same attribute you can't know which will actually be added last -- so a warning to use unique attribute keys.

Oberon00 commented 3 years ago

No, the main issue is that in OnEnd the span is already ended, and for an ended span, SetAttribute should be a no-op. Changing that seems liable to opening a can of worms.

tsloughter commented 3 years ago

Ah, ok. Yea, I wouldn't want to change the fact that ended means read-only but instead move the "ending" to after the processors. But this isn't possible since one of the processors is what takes the "ended" span and passes it to the exporter... I didn't like exporting being part of processors because of this.

And probably too late to argue that sending to the exporter should be separate and unrelated to processors, leaving OnEnd processors to act like "BeforeEnd" :(

Oberon00 commented 3 years ago

Maybe we would even need a special state where the span is semi-ended, i.e. it has IsEnded==true and has an end timestamp, but is still recording.

iNikem commented 3 years ago

@iNikem my understanding is that it is because OnStart can't be used to implement the Honeycomb feature https://godoc.org/github.com/honeycombio/beeline-go#AddFieldToTrace -- it would only be able to add the attribute to spans started after this call, but it is supposed to be able to add them to all local spans in the trace, even if already started.

I don't understand. If you have a SpanProcessor registered, it will receive all spans created in the given process. And so it will be able to add attributes to all spans. You don't "do a call" manually. All spans are handed to you.

tsloughter commented 3 years ago

@iNikem it only receives a write-able span for OnStart, meaning if spans are already started in the trace you will not be handed those spans at a point that you have the attributes and can write to the span.

iNikem commented 3 years ago

It seems I still don't understand your use-case: what are these attributes that are not available right away but become available only when some spans were already started?

And even more: how does onEnd differ from onStart in this regard? I fail to see how you hoped to achieve your goal with onEnd that you cannot do in onStart?

tsloughter commented 3 years ago

It seems I still don't understand your use-case: what are these attributes that are not available right away but become available only when some spans were already started?

An HTTP user id parsed out of headers is the best example I can think of, but actual honeycomb users (@garthk @gugahoa ?) probably have more. The initial request span is started right away, before user code parses out headers they care about, a child span is started when the user's logic is called at which point they parse out the user-id and want to put it on all spans in that trace but the span created when the HTTP handler got the request has already run OnStart so the only time it could be added is OnEnd -- without manually going through the stack and adding the attribute.

seh commented 3 years ago

That's a good characterization, @tsloughter. This has been a "normal" way of using Honeycomb's Beeline library for a long time. It buffers the trace span tree in memory, and can distribute attributes (called "fields" there) up and down the span tree before sending the spans to the server. It stores a set of fields on the trace itself, in a field on the trace.Trace type called "traceLevelFields." You can see the transcription from that field down to each span just before sending each here.

bogdandrutu commented 3 years ago

I am curios why do you need to still add events/attributes to the Span when you can do that directly on the SpanData that we will export? What is the advantage to change the Span vs modifying the data that will be exported directly?

Oberon00 commented 3 years ago

Agree with @bogdandrutu. I'm not sure which implemenation language you are using but in Java there is https://github.com/open-telemetry/opentelemetry-java/blob/v0.9.1/sdk_extensions/tracing_incubator/src/main/java/io/opentelemetry/sdk/extensions/incubator/trace/data/SpanDataBuilder.java

Unfortunately, #158 is not resolved so this is all a bit vague in the spec.

tsloughter commented 3 years ago

"that we will export"

Where is there a pluggable place to modify SpanData before export?

If you are saying OnEnd Processor can modify SpanData then that would be fine.

iNikem commented 3 years ago

If you are saying OnEnd Processor can modify SpanData then that would be fine.

Processor can certainly modify SpanData

Oberon00 commented 3 years ago

@tsloughter

Where is there a pluggable place to modify SpanData before export?

The exporter interface. You can make your own exporter that wraps an existing one and does the modification.

Oberon00 commented 3 years ago

@iNikem

Processor can certainly modify SpanData

True, but in order for the "modification" (SpanData is immutable at least in Java, so it actually creates an updated copy) to be picked up by an exporter, that exporter must be invoked in the same OnEnd.

gugahoa commented 3 years ago

@bogdandrutu

I am curios why do you need to still add events/attributes to the Span when you can do that directly on the SpanData that we will export? What is the advantage to change the Span vs modifying the data that will be exported directly?

That would mean writing my own processor which will export to the configured exporters, but I wouldn't want to write my own processor for that and instead it would be preferable to rely on tried-and-true processors such as the BatchProcessor

@Oberon00

The exporter interface. You can make your own exporter that wraps an existing one and does the modification.

I'm not sure this translates well to Erlang, but I need to try it out first to see how it goes

tsloughter commented 3 years ago

The exporter interface. You can make your own exporter that wraps an existing one and does the modification.

That is what we hoped not to have to do and I believe is already the way people have been doing it.

But I guess if its a wrapper that can take any exporter it and be a new exporter that acts on the attributes and then calls the wrapped exporter with the modified batch it wouldn't be too bad.

It would certainly be more efficient to update "on end", but it sounds like a wrapper exporter is the way to go -- assuming there aren't issues with the lifetime of the shared attributes on a trace, will have to talk to those using it.

tsloughter commented 3 years ago

I'm not sure this translates well to Erlang, but I need to try it out first to see how it goes

It does. We have an exporter behaviour, otel_exporter. You can create a new exporter that simply goes through all the spans and instead of exporting them it updates their attributes in the table and then calls the export/2 function of the wrapped exporter.

And has to call the init/1 of the wrapped exporter when its init/1 is called at startup.

iNikem commented 3 years ago

It would certainly be more efficient to update "on end"

You seems to have an implicit assumption that for each span all its parent spans from the same trace will end after it. That assumption is wrong. Parent span can end even before child span starts.

seh commented 3 years ago

Parent span can end even before child span starts.

What does that represent? Perhaps an asynchronous task spawned by the parent that took a while to start?

tsloughter commented 3 years ago

It would certainly be more efficient to update "on end"

You seems to have an implicit assumption that for each span all its parent spans from the same trace will end after it. That assumption is wrong. Parent span can end even before child span starts.

I know, this feature can't be perfect, there will be missed spans in some specific cases, but for many cases it appears to be useful and work just fine.

iNikem commented 3 years ago

Parent span can end even before child span starts.

What does that represent? Perhaps an asynchronous task spawned by the parent that took a while to start?

Async servlet/Spring webflux controller. Controller method will return and its span will finish before async result starts actual computing.

garthk commented 3 years ago

If we're not up for changing the behaviour of OnEnd to BeforeEnd, but some users need something with BeforeEnd semantics, why not add a BeforeEnd?

The alternative seems to be to either lose our time correlation or build amusing workarounds like using OnEnd to create a child span with the data we couldn't add to this one, and wrapping the exporter we want in another one which first buffers spans for long enough to reliably find the disposable spans, copy attributes from them to their parent, and then discard them, and then calls the original exporter. It's possible, but it's a lot of work to force on our users.

Oberon00 commented 3 years ago

@garthk You make it sound very complicated. What is that about buffering something long enough? I think I don't understand your use case after all. I thought you simply wanted to do something like

  @Override void OnEnd(ReadWriteSpan span) { span.setAttribute("foo", "bar"); }

which translates to

  @Override CompleteableResult export(Collection<SpanData> spans) {
    Collection<SpanData> modified = new ArrayList<SpanData>(spans.size());
    for (SpanData span: spans) {
      Attributes newAttributes = Attributes.newBuilder(span.getAttributes()).setAttribute("foo", "bar").build();
      modified.add(SpanDataBuilder.newBuilder(span).setAttributes(newAttributes).build());
    }
    return wrappedExporter.export(modified);
  }

(@anuraaga is this actually the best we can do in Java currently? It's really boilerplate-y)

Sure, this is more allocation-heavy and a bit awkward, but it is functionally equivalent at least.

(Writing it out really makes me sympathetic to supporting BeforeEnd, but I still think I might not understand what you actually want here)

garthk commented 3 years ago

The reporter works on the span later than the end of the span, and runs in a different process. It can't collect data at the relevant time or from the relevant process without help. OnEnd can, but it's got nowhere to put the data without extra machinery.

You're correct: the extra machinery would be complicated. I'd prefer not to build it, nor insist our users build it to meet that need. I'd prefer the spec granted that implementations MAY provide a means for the configured tracer to call code to modify the span prior to it reaching a state we consider final.

Oberon00 commented 3 years ago

It can't collect data at the relevant time or from the relevant process

Right, but if you want to collect process data, you can use OnStart. If you need to do something based on span attributes, use the exporter wrapper. If you need to do both at the same time, you can transform the relevant process data into attributes in OnStart and remove them in the exporter wrapper.

garthk commented 3 years ago

I'm sorry, I should have given concrete examples. Here's one with time and process locality: a span attribute for the number of reductions performed by the process during the span.

Reductions are an excellent proxy for CPU effort, and cheap to measure. I'd much appreciate having a reduction count on all my spans, including those generated by my dependencies and their integration dependencies.

Third party code rules out using a function or macro to wrap the code and gather the data.

I could use OnEnd to take the last measurement, but due to the read-only span it couldn't put that measurement or its delta into the span's attributes, forcing the use of some other mechanism to get it to the exporter.

If the process weren't dead by the time the exporter got the span, it'd likely have done some more work, making the effort useless or impossible without using OnEnd and whatever mechanism smuggled the data to the exporter.

All this might be peculiar to our ecosystem, so I'd suggest an RFC 2119 MAY instead of a MUST for providing a BeforeEnd or other suitable way to configure our tracer to meet this requirement.

carlosalberto commented 3 years ago

I'd be fine adding this additional BeforeEnd, although not sure about the timeline (in theory we could still add it, specially if it has a MAY for now, given it's on the SDK level). @bogdandrutu opinion on this?

ahayworth commented 3 years ago

👋 We'd appreciate an optional BeforeEnd hook as well - internally at GitHub we have some complex filtering requirements for any data that may leave our datacenters (such as trace data, which currently goes to a SaaS). We're currently not able to run the opentelemetry collectors for filtering, and that leaves us attempting to scrub spans of sensitive data before they're emitted. [1]

Wrapping SpanProcessors or writing custom SpanExporters is certainly feasible, and we're implementing that solution now. However, a BeforeEnd hook would allow us to implement our (arguably esoteric) filtering needs in a much more ergonomic and sustainable way.

If nothing else, consider this an alternate use-case for a BeforeEnd hook other than annotation of spans with additional data - we need to possibly remove unintentional attributes and events.


[1] Of course, I wish we didn't need to do last-minute, best-effort scrubbing. I wish sensitive data didn't leak out of exceptions and events in the first place. Not emitting sensitive data is the best practice! But we can't always ensure that, and the consequences of mistakes can be severe, so we always attempt a last-minute scrub for known types of sensitive data, just in case.

pyohannes commented 3 years ago

We'd appreciate a BeforeEnd hook as well. Our use case: we want to change the span status for 404 HTTP errors. We can't use the collector.

Writing custom SpanExporters is suboptimal: it involves additional allocations (a span needs to be copied), and it involves additional implementation effort (we'd need to wrap every exporter we use, instead of just having one processor that can be used with any exporter).

anuraaga commented 3 years ago

it involves additional allocations (a span needs to be copied), and it involves additional implementation effort (we'd need to wrap every exporter we use, instead of just having one processor that can be used with any exporter).

I generally agree with the usefulness of BeforeEnd, just want to confirm these points. At least in Java, we have these mechanisms

Just want to check on how useful these current mechanisms are for your use case. Thanks.

tsloughter commented 3 years ago

Feels wrong to have to use an exporter for something that isn't exporting spans but instead is processing them.

pyohannes commented 3 years ago

Just want to check on how useful these current mechanisms are for your use case. Thanks.

Those mechanisms could be useful, if they were spec'd out and available for all languages. However, it seems to me BeforeEnd is a simpler and cleaner solution.

Cavvar commented 2 years ago

Hello everyone! I just wanted to add that a solution with something like a beforeEnd hook would still be appreciated!

DenisAngeletta commented 2 years ago

Hello everyone! I just wanted to add that a solution with something like a beforeEnd hook would still be appreciated!

Any chance this gets any traction?

tsloughter commented 2 years ago

Curious if any implementation has a solution to, or anyone has an argument why it isn't needed, for adding baggage to attributes of the currently active span?

I could see an argument that it would be rare to be needed and that in most cases you could instead set the baggage and then do a delayed start of the span by setting the timestamp on start, but that feels like unnecessary and confusing hoops for the end user to deal with simply because we don't allow the baggage attribute to be added at the end.

Oberon00 commented 2 years ago

you could instead set the baggage and then do a delayed start of the span by setting the timestamp on start

Not sure why you would need to add a timestamp in that case? I guess it could be slightly more correct, but setting the baggage before starting the span is no different from providing attributes at startup, or even "calculating" a span name. You could say, not using an explicit timestamp is almost always slightly incorrect (and the API is sorely lacking a method to get an appropriate timestamp).

On the other hand, adding baggage OnStart vs. On(Before)End may happen in potentially completely different contexts. Lately, I've been thinking that decoupling spans/tracers from Context in the way we did might have been too flexible.

tsloughter commented 2 years ago

If there isn't a need to set the timestamp for the start time then there wasn't really a conflict between adding baggage onstart and onend in the first place.

But if your start and end time would be the same because you have to "start" the span at the end then you need to pass in the actual start time of the operation.

And this is even assuming the user starts the span and not something they've included like a middleware or automatic instrumentation.

DenisAngeletta commented 2 years ago

The reason why a beforeEnd hook is so valuable is because a way to set attributes in a span when it ends can be beneficial for use cases like @garthk described here.

A current workaround could be to write a custom SpanProcessor that takes a SpanExporter and essentially mimicks the behaviour of a SimpleSpanProcessor. The only problem in this case is, that we send spans to n+1 exporters, with n being the amount of exporters you actually define through OTEL_TRACES_EXPORTER. Sending multiple times could be avoided by utilizing the AutoConfigurationCustomizerProvider.

So a hook like beforeEnd or onEnd providing a ReadWriteSpan would avoid alot of this necessary workaround. If there is another way to do this, please let me/us know

Oberon00 commented 2 years ago

@tsloughter Ah, I think I just misunderstood you. I thought it would be OK to have the baggage around the start time, instead of around the end time. If the use case semantically needs the end time, then that won't help of course. Main problem is that you can't create child spans, even if you delay the start.

I do think BeforeEnd is a useful callback, in general.

tsloughter commented 2 years ago

An interesting solution I learned about yesterday is to have sort of sub-processors. The spec already mentions decorators on built-in span processors and so having a processor wrap a processor could be considered to fit that description.

Essentially it means the addition of baggage is done by creating a new Span with the attributes in a "processor" that wraps another (like the simple or batch processor) and passes the new Span to it.

I'm not sure if that makes sense how I've written it but I think it is likely a better solution than BeforeEnd. I can try to explain better in the spec SIG meeting today.

Oberon00 commented 2 years ago

This would then require knowing which processor you want to modify the results for and only apply to that on, right? So I think it's a slightly different use case, and more similar to doing it in the exporter (https://github.com/open-telemetry/opentelemetry-specification/issues/1089#issuecomment-708991092).

tsloughter commented 2 years ago

You'd only need to know the processor in the sense you have to wrap something. Tho the SDK could do it for the user I would think, if the user sends an option to say wrap processors in X or something.