open-telemetry / opentelemetry-configuration

JSON Schema definitions for OpenTelemetry file configuration
Apache License 2.0

Configure multiple providers #5

Open pellared opened 1 year ago

pellared commented 1 year ago

It SHOULD be possible to configure multiple trace/meter/logger providers.

Reference https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/api.md#tracerprovider

Notwithstanding any global TracerProvider, some applications may want to or have to use multiple TracerProvider instances, e.g. to have different configuration (like SpanProcessors) for each (and consequently for the Tracers obtained from them), or because it's easier with dependency injection frameworks. Thus, implementations of TracerProvider SHOULD allow creating an arbitrary number of TracerProvider instances.

EDIT: Therefore, the schema MUST accept multiple providers.

jack-berg commented 1 year ago

Check out opentelemetry-specification#3437, which includes language around this, although it uses "MUST", which should probably be reduced to "SHOULD" for consistency.

pellared commented 1 year ago

Just to make it clear. I think that the schema MUST accept multiple providers.

tsloughter commented 1 year ago

@pellared why when you can create multiple providers by running multiple configurers?

pellared commented 1 year ago

@pellared why when you can create multiple providers by running multiple configurers?

I do not know the exact use cases, but I have already heard that some people have, e.g., 2 trace providers. I quickly found one thread here.

I guess the reason is

e.g. to have different configuration (like SpanProcessors) for each provider

I think that it would be more flexible if we had something more or less like

# Configure tracer providers.
tracer_providers: 
  - {}

instead of

# Configure tracer provider.
tracer_provider: {}

Also I think we would need to have something that would mark that given provider should be set as a "global provider".

jack-berg commented 1 year ago

Disagree strongly. Multiple tracer providers in one file requires that they be named / identified and that the caller has some way to obtain the instance they want. This workflow is minimally different than simply having separate config files - one per provider - and sacrifices the user experience of everyone for an esoteric use case.

pellared commented 1 year ago

This workflow is minimally different than simply having separate config files

Alternatively OTEL_CONFIG_FILE env var might need to support multiple file paths.

How will we define that a provider has to be set as a "global provider"?

an esoteric use case.

I would not judge that it is esoteric if the specification calls it out and I have seen people asking for such things 🤷

pellared commented 1 year ago

Multiple tracer providers in one file requires that they be named / identified and that the caller has some way to obtain the instance they want.

  1. The names/IDs may be optional. They will be needed for some cross-cutting concerns between other elements of the config file (e.g. correlating a trace provider with a tracing instrumentation library).
  2. The "caller" needs to know which provider should be set as the global provider.

jack-berg commented 1 year ago

The spec says it should be possible to create multiple providers, but doesn't give any mechanism for identifying these providers or automatically configuring them. That's new to this proposal.

If multiple providers are possible and can be automatically configured via OTEL_CONFIG_FILE, then all instrumentation would have to be aware of this and decide which provider they want:

Map<String, TracerProvider> tracerProviders = Configuration.configure(System.getenv("OTEL_CONFIG_FILE")) // Init multiple tracer providers from OTEL_CONFIG_FILE
  .getTracerProviders();
HttpServerInstrumentation.create(tracerProviders.get("tracer-provider1")); // Initialize http instrumentation with "tracer-provider1"
DbInstrumentation.create(tracerProviders.get("tracer-provider2")); // Initialize db instrumentation with "tracer-provider2"

With one provider per file, it's still possible to have multiple providers. The caller is just responsible for referencing each configuration file and passing the resulting providers to the appropriate place in the application:

TracerProvider provider1 = Configuration.configure("/config1.yaml").getTracerProvider();
TracerProvider provider2 = Configuration.configure("/config2.yaml").getTracerProvider();

HttpServerInstrumentation.create(provider1);
DbInstrumentation.create(provider2);

The names/IDs may not be needed unless we have some cross-cutting concerns between other elements of the config file (e.g. correlating a trace provider with a tracing instrumentation library).

If one tracer provider is the global, and that's indicated in the config file, and instrumentations don't select which provider they want because presumably they choose the global, then how are the non-global providers used?

pellared commented 1 year ago

The spec says it should be possible to create multiple providers, but doesn't give any mechanism for identifying these providers or automatically configuring them. That's new to this proposal.

Correct. I think this is something that would be part of the "Configurer/Configuration API/structure". It does not have to be part of the Traces/Metrics/Logs API. This is only something that needs to be parsed/processed during "telemetry pipeline" setup.

pellared commented 1 year ago

Personally, I do not want to propose or decide "how" to do it. My proposals were just "drafts" to "visualize" the issue.

First of all, we should decide if this is something that is planned to be addressed.

In my opinion, the Configuration Model MUST allow instantiating multiple providers of the same type to allow complex configurations. One of the reasons we want to use the Configuration Model is to allow setting up complex things which would be almost impossible using env vars.

jack-berg commented 1 year ago

In my opinion, the Configuration Model MUST allow instantiating multiple providers of the same type to allow complex configurations.

That requirement is satisfied by the ability to have multiple configuration files / models. It's a great simplifying assumption to say that the configuration model defines one tracer provider / meter provider / logger provider configuration.

A user that insists on putting multiple configurations in a single file can always use the YAML syntax to define multiple documents in a single file:

---
resource: ...
tracer_provider: ...
---
resource: ...
tracer_provider: ...

And parse like:

List<TracerProvider> tracerProviders = ParseDocuments(new File("/multi-config.yaml"))
    .stream()
    .map(document -> Configuration.configure(document).getTracerProvider())
    .collect(toList());

TracerProvider provider1 = tracerProviders.get(0);
TracerProvider provider2 = tracerProviders.get(1);

Having multiple providers is an exceptional case. The link you posted reiterates that. We don't need to burden application owners with this detail, the vast majority of whom will only be confused by why they need to define an array of providers. And we don't need to burden SDK / instrumentation authors with trying to figure out what to do when multiple providers are present.
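
The multi-document approach above relies on a `ParseDocuments` helper that the thread does not define. As a rough illustration only, the following sketch splits a YAML stream into per-document strings on `---` marker lines. The class name `YamlDocumentSplitter` is hypothetical, and the splitting is deliberately naive: it ignores `...` end markers, content that follows `---` on the same line, and `---` lines inside block scalars. A real implementation would presumably delegate to a YAML library's multi-document loader instead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of any OpenTelemetry or YAML library.
final class YamlDocumentSplitter {

    // Splits the input on lines that contain only the "---" document marker.
    // Empty or whitespace-only documents are dropped.
    static List<String> split(String yaml) {
        List<String> documents = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String line : yaml.split("\\R")) {
            if (line.matches("---\\s*")) {
                flush(current, documents);
            } else {
                current.append(line).append('\n');
            }
        }
        flush(current, documents);
        return documents;
    }

    // Stores the accumulated document (if it has content) and resets the buffer.
    private static void flush(StringBuilder current, List<String> documents) {
        String document = current.toString();
        if (!document.trim().isEmpty()) {
            documents.add(document);
        }
        current.setLength(0);
    }
}
```

Each returned string could then be handed to whatever configurer builds a provider from a single configuration model, which keeps the one-provider-per-model assumption intact.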

MikeGoldsmith commented 1 year ago

I agree with @jack-berg - I think multiple providers per Configuration Model unnecessarily complicates the schema and doesn't help the use case of accessing multiple providers because they cannot easily be accessed.

The example above, which returns multiple Configuration Models from the same file using multiple YAML documents, will work and gives the same index-based access that returning multiple providers from the same model would.

pellared commented 1 year ago

The example above, which returns multiple Configuration Models from the same file using multiple YAML documents, will work and gives the same index-based access that returning multiple providers from the same model would.

People using automatic instrumentation would not be able to (easily?) profit from such an approach.

doesn't help the use case of accessing multiple providers because they cannot easily be accessed.

I do not get it. The schema would simply need to offer linking providers with other components (e.g. instrumentation libraries).

tsloughter commented 1 year ago

Linking with instrumentation libraries?

I should say that having named providers would make life easier in Erlang because I guess we do something similar to automatic instrumentation (even though it doesn't actually instrument anything automatically, it just sets up the providers at boot time so tracers are available before dependencies boot). So the user is usually not going to run anything like start_tracer_provider but let it be done on boot based on configuration.

I like the multi-file approach though, and had resigned myself to the following: the config file provided at boot starts the global providers, and any additional providers are created by the user by manually calling the configurer and providing names for them at that time (providers are named processes in Erlang).

MikeGoldsmith commented 1 year ago

doesn't help the use case of accessing multiple providers because they cannot easily be accessed.

I do not get it. The schema would simply need to offer linking providers with other components (e.g. instrumentation libraries).

@pellared maybe I misunderstood. I have a few follow-up questions:

pellared commented 1 year ago

Do you mean allow multiple providers to be defined with a distinct name / ID, then when configuring users of providers (e.g. instrumentation libraries) you can provide the name / ID?

Correct.

What would happen if you didn't set a provider name / ID, or it was invalid and still had multiple providers?

I think the name/ID is optional. If none is provided then we say it is the "global" provider.

If the user provides multiple providers of the same types with the same name/ID then we should return a validation error.

Does that mean we would need to define the default provider per signal type, so consumers of the config that don't know / care about multiple providers, can ask for a default?

If the name/ID is not defined then we assume that it is the default. If some component (e.g. instrumentation library) does not reference a provider explicitly then it should use the default one.
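
The rules described in these answers can be sketched as a small registry. This is an illustration under assumptions, not a proposed API: `ProviderEntry` and `ProviderRegistry` are hypothetical names, a `null` name stands for "no name/ID given", and treating two unnamed entries as a duplicate is an extrapolation of the duplicate-name rule above.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for one configured provider; not an OpenTelemetry type.
final class ProviderEntry {
    final String name;        // null means no name/ID was given
    final String description; // placeholder for the real provider configuration

    ProviderEntry(String name, String description) {
        this.name = name;
        this.description = description;
    }
}

// Hypothetical registry for the providers of a single signal type.
final class ProviderRegistry {
    private ProviderEntry defaultProvider;
    private final Map<String, ProviderEntry> named = new HashMap<>();

    // Validates the entries: duplicate names are rejected, and so is a
    // second unnamed entry (assumption: at most one default provider).
    static ProviderRegistry of(List<ProviderEntry> entries) {
        ProviderRegistry registry = new ProviderRegistry();
        for (ProviderEntry entry : entries) {
            if (entry.name == null) {
                if (registry.defaultProvider != null) {
                    throw new IllegalArgumentException("more than one unnamed (default) provider");
                }
                registry.defaultProvider = entry;
            } else if (registry.named.putIfAbsent(entry.name, entry) != null) {
                throw new IllegalArgumentException("duplicate provider name: " + entry.name);
            }
        }
        return registry;
    }

    // A component that does not reference a provider explicitly passes null
    // and receives the default provider.
    ProviderEntry resolve(String requestedName) {
        ProviderEntry entry = requestedName == null ? defaultProvider : named.get(requestedName);
        if (entry == null) {
            throw new IllegalArgumentException("no provider configured for: " + requestedName);
        }
        return entry;
    }
}
```

In this sketch, a configuration with one unnamed provider and one named `llm` (an example name) resolves `null` to the unnamed provider and `"llm"` to the named one, while a configuration that repeats a name fails validation before any provider is used.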

Additional "general" notes

I am not saying we have to support multiple providers up front. My main concern is that I would prefer to have a design/structure/model which would allow such an addition in the future. Initially we can say that we support only one instance per provider type. Also, I totally agree that probably 95% (or more) of users would not need this feature and by default users should not need to provide the provider ID/name.

sandipb commented 5 months ago

I don't know if this helps, but one use case that I think we have is having two kinds of trace destinations: the default destination is GCP, and for certain LLM parts of our application we would like to send the traces only to an LLM trace observability platform like Arize Phoenix. We don't want to send every trace to Phoenix, and we have no use for LLM traces in GCP. Currently, it seems like I can only create a single provider that has span processors for both GCP and Phoenix. I would like to create a tracer specific to Phoenix when I want and use a default one otherwise.