open-telemetry / opentelemetry-specification

Specifications for OpenTelemetry
https://opentelemetry.io
Apache License 2.0
3.72k stars 888 forks source link

Global MeterProvider Get/Set API is under-defined #3068

Open MrAlias opened 1 year ago

MrAlias commented 1 year ago

Currently the specification defines this API as follows:

Normally, the MeterProvider is expected to be accessed from a central place. Thus, the API SHOULD provide a way to set/register and access a global default MeterProvider.

This does not specify the API behavior when an "access" is made prior to a global being set. Nor does it specify the behavior of multiple set operations. This seems like a source of user confusion if implementations deviate in behavior.

If one implementation errors, another returns a no-op, and another returns a no-op implementation that will be updated to the first global MeterProvider set by a user, users that work across these implementations will not know what to expect.

Ideally implementations will behave the same. Unfortunately, that doesn't seem possible given the stable releases of certain implementations that have different behaviors.

Language Get before set Multi-sets
Go Delegating No-op returned after first set, latest set instance returned
Java Configures from env or No-op (no delegation) Log error
.NET ? (not sure) ?
Python Delegating No-op returned Log warning
JS No-op (no delegation) Log error
Ruby No-op (no delegation) after first set, latest set instance returned

Based on this survey, I'm not entirely sure how to proceed.

Should we recommend all return from the environment if configured otherwise a delegating No-op implementation when a get is made before a set?

Should all multi-sets log some warning/error?

cc @open-telemetry/go-approvers @open-telemetry/java-approvers @open-telemetry/dotnet-approvers @open-telemetry/python-approvers @open-telemetry/javascript-approvers @open-telemetry/erlang-approvers @open-telemetry/ruby-approvers

Oberon00 commented 1 year ago

Should we recommend all return from the environment if configured

This is not generally possible, as you also need to consider the API-only case.

E.g. a common use case (although discouraged in some languages) might be to call the global getter somewhere deep in a library that has built-in instrumentation. That library would depend on the API package but not any SDK. Thus, there is no code that could interpret environment variables.

.NET, BTW, can probably be left out of this discussion, it is not really "OpenTelemetry" in this regard.

dyladan commented 1 year ago

FWIW I think adding delegation is something that can be done in a non-breaking way. The JS TracerProvider does delegate and it is a feature that I am hoping to add to metrics as well. I would argue the "correct" behavior is a delegating no-op. As far as multi-set, I think that gets much trickier. I prefer the logged warning (no strong opinion on warn/error level).

MrAlias commented 1 year ago

Should we recommend all return from the environment if configured

This is not generally possible, as you also need to consider the API-only case.

E.g. a common use case (although discouraged in some languages) might be to call the global getter somewhere deep in a library that has built-in instrumentation. That library would depend on the API package but not any SDK. Thus, there is no code that could interpret environment variables.

.NET, BTW, can probably be left out of this discussion, it is not really "OpenTelemetry" in this regard.

This makes sense. :+1:

I think to your's and @dyladan point, we should recommend all return a delegating No-op implementation when a get is made before a set, but add a provisional option for an environment based provider for those able to do so. I think the option here is necessary to not make the Java implementation doesn't become incompatible, right?

Oberon00 commented 1 year ago

Please definitely discuss that with the Java maintainers @open-telemetry/java-maintainers, I think it is a bit more intricate there, autoconfiguration is a third package that is separate from the SDK and not automatically used unless you use the auto-instrumenting agent (which I believe is probably the most common way of "using" the Java language client)

jack-berg commented 1 year ago

Hi - let me shed some light on the java behavior:

I think we've arrived at a good place for java: The invariant that all calls to GlobalOpenTelemetry.get receive the same result simplifies initialization issues which might otherwise be head scratchers since either everything is going to work or nothing is going to work. If desired, you can opt in to having GlobalOpentelemetry.get trigger initialization of an autoconfigured instance, which helps in corner cases where its not possible to call GlobalOpenTelemetry.set before instrumentation code initializes.

jmacd commented 1 year ago

Here is a historical reference on this topic, which indicates that Go was influenced by Ruby: https://github.com/open-telemetry/oteps/pull/11

My recollection of the discussion and debate from then is that it matters whether the language in question has a "proper" and well-adopted mechanism for dependency injection and (possibly equivalently) dynamic loading of code. Golang stands out as a language without either of these features, and this is why (IMO) the "delegating no-op" should be supported in Golang.

A language with rules and accepted DI mechanism makes the delegating no-op mechanism unnecessary (IMO) because the DI mechanism is a manner of delegation.

I don't have a strong opinion about the "multi-set" behavior that is best. IIRC it was Ruby that led the way here, so I'd like to ask @mwear what he thinks?

tsloughter commented 1 year ago

So no changes to the spec make us no longer compliant I can describe the Erlang Provider get/set setup:

Providers are registered in a central place by utilizing Erlang's process registry. Each Provider is a process that implements a particular (TracerProvider or MeterProvider) API. On startup of the SDK it will configure the global provider which is registered in the process registry.

A noop tracer/meter is returned if the global provider is called before it has initialized.

Attempts to start another provider registered to the same name will fail, returning already_started and the process ID of the existing Provider.

We don't actually have functions for set/register because it relies on the Erlang process registry and is best left to the process itself.

Long story short, we'd want/need the spec to not require that multiple attempts to set/register the global Provider override the existing Provider.

pellared commented 1 year ago
  1. If I understand correctly then the same "issue" is with TraceProvider and LoggerProvider.

  2. https://github.com/open-telemetry/opentelemetry-specification/blob/025f8fc1a12dda35e42a25c96a0ca4e2c52e34c8/specification/metrics/api.md?plain=1#L62-L64 I like that the Metrics API spec defines the MeterProvider as a component instead of a class as it makes it more open to interpretation (and more abstract) so that the languages can better adopt it to their ecosystem. Therefore in my eyes Erlang and .NET are complaint. Personally, I propose to use the same wording in Trace API and Logs API spec.

  3. How about just leaving a suggestion on how the the Get/Set could be implemented? E.g. Get returns delegating noop provider? If no Set was called and set logs warning if there current provider was NOT a noop provider. Take notice that Get and Set is a abstract thing e.g. it can be done by Erlang process registry or .NET Diagnostics package.

MrAlias commented 1 year ago

Get returns noop if no Set was called

Why not a delegating provider?

pellared commented 1 year ago

Why not a delegating provider?

That's what I meant 😄 Sorry for lack of precision.

(edited my comment)