Proposal: clarify behavior when retrieving non-existent currently active span

dmathieu commented 1 year ago

For languages that provide an implicitly propagated Context, the API should provide a way to retrieve the currently active span. See https://github.com/open-telemetry/opentelemetry-specification/blob/2cfad37daf7e0d20851fd8a639a55375c3fc93dd/specification/trace/api.md#context-interaction

However, I am seeing a divergence in behaviour between SDKs, and I believe it would be nice to make them consistent. If there is no currently active span, most SDKs return an invalid/noop span, while others return undefined.

Go returns a noop span: https://github.com/open-telemetry/opentelemetry-go/blob/d616df61f5d163589228c5ff3be4aa5415f5a884/trace/context.go#L48
Ruby returns an invalid span: https://github.com/open-telemetry/opentelemetry-ruby/blob/main/api/lib/opentelemetry/trace.rb#L52
DotNet returns an invalid span: https://github.com/open-telemetry/opentelemetry-dotnet/blob/6b7f2dd77cf9d37260a853fcc95f7b77e296065d/src/OpenTelemetry.Api/Trace/Tracer.cs#L48
PHP returns an invalid span: https://github.com/open-telemetry/opentelemetry-php/blob/2c00772cad85ffec17e14349f678025d173ad772/src/API/Trace/AbstractSpan.php#L21
JS returns undefined: https://github.com/open-telemetry/opentelemetry-js-api/blob/main/src/trace/context-utils.ts#L35

I believe this is a pretty big difference between those SDKs. Depending on the language being used, folks may get errors if they get a context which unexpectedly doesn't have any span, or they may lose data if they get an invalid span and don't check for it.

My proposal is therefore the following:

SDKs that provide a way to retrieve the current span MUST return an invalid or noop span if none was set in the context.
SDKs MAY log a debug message if an invalid/noop span was returned.

Oberon00 commented 1 year ago

What is the difference between noop and invalid span?

Oberon00 commented 1 year ago

There is this related (but not entirely applicable) point in the error handling guidance: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/error-handling.md#guidance

Whenever API call returns values that is expected to be non-null value - in case of error in processing logic - SDK MUST return a "no-op" or any other "default" object that was (ideally) pre-allocated and readily available. This way API call sites will not crash on attempts to access methods and properties of a null objects.

EDIT: I think this issue may not require an OTEP. If others agree, maybe move it to a normal spec issue.

dmathieu commented 1 year ago

What is the difference between noop and invalid span?

IMHO, their difference is an implementation detail. It's always a span which will not be sending any data once closed.

Thank you for the error handling link. That would definitely point towards returning a noop span rather than nil. I think being able to know when those cases occur (with a warning for example) would be nice, as even though they shouldn't trigger exceptions, they should be catchable as well.

Flarna commented 1 year ago

Regarding always returning a span: how can a user differentiate between a non-sampled trace (which is represented by Noop/NonRecording/... spans) and no active trace at all?

This is relevant, for example, in propagator.inject(), which should not inject a span if there is no active trace. But if getCurrentContext().getSpan() always provides a span, some API on the span is needed to detect whether inject is needed or not. I guess comparing spanId/traceId against all zeros every time is a bit of an overhead.

dmathieu commented 1 year ago

In the Go SDK, SpanContext has an IsValid method, which returns false for noop spans. Propagators then return early if the span context is invalid.

Oberon00 commented 1 year ago

IsValid is part of the spec: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/api.md#isvalid and it is unrelated to noop vs. not-noop. Instead, it checks whether the SpanContext has a valid trace ID and span ID. I think that should satisfy your use case @Flarna, because a not-sampled span will have a valid span ID and trace ID.
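
The distinction described above can be sketched in Python as follows. This is an illustration only, assuming a simplified `SpanContext` dataclass and a hypothetical `inject` function that writes a W3C-style `traceparent` header into a `dict` carrier; real SDK types and propagators carry more state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpanContext:
    # Simplified, hypothetical span context: zero IDs mean "no trace".
    trace_id: int = 0
    span_id: int = 0
    sampled: bool = False

    def is_valid(self) -> bool:
        # Valid means both IDs are non-zero; the sampling flag is not considered.
        return self.trace_id != 0 and self.span_id != 0


def inject(span_context: SpanContext, carrier: dict) -> None:
    """Write a traceparent entry into `carrier` unless there is no trace."""
    if not span_context.is_valid():
        return  # nothing to propagate
    flags = 1 if span_context.sampled else 0
    carrier["traceparent"] = (
        f"00-{span_context.trace_id:032x}-{span_context.span_id:016x}-{flags:02x}"
    )
```

Under these assumptions, an unsampled span with real IDs is still propagated (with the sampled flag cleared), while the all-zero context attached to a placeholder span results in no header at all.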

Flarna commented 1 year ago

Well, for propagators it's fine, but I still don't see the advantage of creating a Noop instance just to return something, or of adding more APIs on span (like IsSampled(), IsDummy(), IsApiOnly(), IsOnTrace(), ...).

What's wrong with null/undefined/... in case there is nothing? Assuming here the language in question has something like this.

It reminds me a bit of C++ std::string, which makes no distinction between no string and an empty string, so one needs some extra flag or similar to represent this case.

dmathieu commented 1 year ago

The difference between nil and a noop span is that a noop span will accept calling all normal methods, while nil will throw exceptions on undefined methods.

From the specifications mentioned above:

This way API call sites will not crash on attempts to access methods and properties of a null objects.
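
A small Python sketch of the call-site difference being discussed, with a hypothetical `NoopSpan` class and `record_user` helper (neither comes from an SDK):

```python
class NoopSpan:
    """Hypothetical placeholder span: accepts calls, records nothing."""

    def set_attribute(self, key, value):
        pass


def record_user(span, user_id):
    # Typical instrumentation code that assumes it was handed a span.
    span.set_attribute("user.id", user_id)


record_user(NoopSpan(), "42")  # silently does nothing

try:
    record_user(None, "42")    # the "no span" value is None instead
    failure = None
except AttributeError as exc:
    failure = str(exc)         # None has no set_attribute method
```

In this sketch the caller never has to guard the noop case, whereas the `None` case fails unless every call site checks for it first.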

Flarna commented 1 year ago

If you call a non-existent method on a Noop it will also throw. So, well, wrong usage results in undefined behavior, as one would expect.

Maybe a bit off topic, but related: should we also return a dummy baggage if none is on the context? And what should DummyBaggage.getEntry() return? At least in JS, this currently returns BaggageEntry | undefined.

Similarly, what should context.getValue("nonExitistingKey") return as a dummy?

dmathieu commented 1 year ago

I meant undefined methods for nil, not undefined for a proper span object.

Flarna commented 1 year ago

SDKs MAY log a warning if an invalid/noop span was returned.

I think we should not issue a warning, as it is perfectly fine that no trace is active. Warning logs indicate that something is wrong, so at most debug/info should be used, in my opinion.

dmathieu commented 1 year ago

Sure, debug makes sense. I've updated the issue description.

dmathieu commented 1 year ago

What I'm seeing all other SDKs do is return empty/noop baggage and metrics/tracer providers, which matches the specification statement.

Regarding values, it seems to differ between SDKs. For example, Go's context returns nil for missing values (but Go's context comes from the standard language library), and Baggage returns an invalid member when retrieving a key which doesn't exist.
