open-telemetry / opentelemetry-specification

Specifications for OpenTelemetry
https://opentelemetry.io
Apache License 2.0

Should file configuration merge environment variable configuration? #3752

Closed jack-berg closed 5 months ago

jack-berg commented 11 months ago

The conversation about whether file configuration should completely ignore the sdk environment variable scheme came up in #3744, but that PR doesn't actually contain any language related to this.

The original file configuration OTEP stated:

Interpret the configuration model and return SDK TracerProvider, MeterProvider, LoggerProvider which strictly reflect the configuration object's details and ignores the opentelemetry environment variable configuration scheme.

As mentioned here, file configuration doesn't actually contain language describing this behavior. It was originally included in #3437 but was lost in the PR review shuffle - accidentally, not in response to feedback.

@tedsuo argues in favor of file configuration respecting env vars with:

The common expectation among developers is that env vars will automatically overwrite config parameters. If we do it the other way, I am concerned that until the end of time we will have a steady stream of users lodging issues about this and then becoming very frustrated when we explain that they need to modify the config file to use an env var.

The reason that these users will be upset is that they are in a situation where having to define the env vars in the config file is a non-starter. Their use case is that for some reason they either can't get at or are not allowed to modify the config file template, but they really need to override a parameter.

For reference, this is an issue that has come up on many OSS projects I have been involved with where it is common to have both operators and application developers wanting to configure the same thing. Often it's some kind of emergency situation where the person with the rights to change the config file is unavailable.

I should note that of course the opposite situation could be true, where for some reason you want to disable an env var but don't have access to it. But it's probably easier to give users the ability to disable an env var via the config file than it is to give them the ability to disable a config parameter via an env var.

@MrAlias argues in favor of ignoring env vars with:

To be fair, from experience, you're going to get people complaining either way. If you choose environment variable priority over a configuration users will complain that their deployment was altered and failed when an environment variable was set that took precedence.

However, if you make environment variables take precedence you will also need to make some pretty severe and subjective choices on how they are mapped to a config. Do the BSP environment variables apply to all batch span processors or just one? Is the sampler environment variable used for all tracer providers in the config, even ones that specify alternate samplers? Should propagators be merged or overridden? If an exporter is defined by environment, does that stop the console logging exporter used for debugging as well as all the other exporters defined in configuration?

Ultimately, I think the current changes are going to be the most appropriate. They allow users to make their own choice in precedence without making subjective choices for them on how to map things. If a user wants environment variables to take precedence, all they need to do is use the OTel environment keys in the related parts of their configuration. In doing so they will answer each of the above questions their own way.

@trask supports the feeling of users expecting env vars to override file configuration, but also says merging configuration from multiple sources is hard:

This is my feeling as well, especially for things like:

  • OTEL_SDK_DISABLED
  • OTEL_RESOURCE_ATTRIBUTES
  • OTEL_SERVICE_NAME
  • OTEL_LOG_LEVEL
  • OTEL_PROPAGATORS
  • OTEL_TRACES_SAMPLER_ARG

I totally understand the nightmare that is merging configuration from multiple sources though.

I wonder if we would have created many of the other env vars (e.g. OTEL_BSP_*, OTEL_BLRP_*) if we had configuration file support from the beginning? And if so, maybe we can deprecate those other env vars in favor of configuration file?

This topic came up several times during the lengthy review of the file configuration OTEP. Below are links to a number of relevant points:

https://github.com/open-telemetry/oteps/pull/225#discussion_r1116269308

Layering of config as described below would make it more difficult to reason about what my program is actually being run with.

Additionally, perhaps give them a helper config file that uses env var substitution in the right places so that they can migrate easily (and still get a warning until they get rid of env variables and move everything to the config file).

https://github.com/open-telemetry/oteps/pull/225#discussion_r1119068865

What about 'Solution 3: fail when both environment configuration and file configuration are present'?

I think we could log a warning when we detect this, but failing is too strong. Consider the implications if a user is operating in an environment they don't fully control (i.e. where an ops team configures environment variables by default which they extend / layer on top of).

https://github.com/open-telemetry/oteps/pull/225#discussion_r1142380977

I have a (rather strong) opinion that a setting set via env vars has higher priority (takes precedence) over a setting set via configuration file and it should be marked as a goal.

Any scheme where environment variables have priority over a config file will require some sort of standard mapping between the environment variable schema and file config scheme. IMO, it's impossible to define such a mapping which is intuitive in all cases, so better not to try.

Nothing is forcing users to use file based config - it's opt-in. If they do opt in, they're opting into the documented behavior in which the config file represents the source of truth for configuration. If they wish to customize the experience with additional layers / overrides, they have a couple of tools:

  1. They can use the fact the Configure(config) API accepts a config model as an argument, and provide their own customizations to the model after the initial parsing of the file via Parse(file). An example of such a customization would be to interpret environment variables and apply them to the model in a way they decide makes sense.
  2. They can use environment variable substitution to reference environment variables directly in a configuration file.
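The first option (customizing the model between parsing and configuring) can be sketched roughly as below. This is only an illustration: `parse`, `customize` and `configure` are hypothetical stand-ins for Parse(file) and Configure(config), the model is a plain dict, and the `resource.attributes` layout is an assumption rather than the actual schema. The point is that the user, not the SDK, decides which environment variables are honored and how.

```python
import copy


def parse(file_model):
    """Hypothetical stand-in for Parse(file); the 'file' here is already a dict."""
    return copy.deepcopy(file_model)


def customize(model, env):
    """User-chosen overlay: only OTEL_SERVICE_NAME is honored, and it wins over the file."""
    service_name = env.get("OTEL_SERVICE_NAME")
    if service_name:
        attrs = model.setdefault("resource", {}).setdefault("attributes", {})
        attrs["service.name"] = service_name
    return model


def configure(model):
    """Hypothetical stand-in for Configure(config); returns a summary, not SDK objects."""
    attrs = model.get("resource", {}).get("attributes", {})
    return {"service.name": attrs.get("service.name")}


file_model = {"resource": {"attributes": {"service.name": "from-file"}}}
env = {"OTEL_SERVICE_NAME": "from-env"}

# Parse, apply the user's own overlay, then configure from the adjusted model.
sdk = configure(customize(parse(file_model), env))
```

Every other environment variable is ignored in this sketch because the user's `customize` step never reads it, which keeps the precedence decision explicit and local.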

Update 3/15/2024

The current state of this issue is:


Update 3/28/2024

Please see this comment updating the status of the issue:

Per @tedsuo’s request, we discussed this issue in the 3/27/24 TC meeting and have made a decision: Generally, we will follow @trask's comment, proceeding with this PR with a few changes:

  • Rename OTEL_CONFIG_FILE to OTEL_EXPERIMENTAL_CONFIG_FILE, reflecting the fact that the semantics of the env var's value are subject to breaking changes as the file configuration spec and schema continue to evolve.
  • Ensure that env vars which don’t interop with file config are deprecated when file config is ready for stabilization, reflecting that we do not want to recommend multiple competing configuration stories. This could be ensured via an explicit note in the markdown, or a blocking issue - both achieve the same effect. https://github.com/open-telemetry/opentelemetry-specification/issues/3967
  • Ensure that file config has an interop where platforms (i.e. Azure functions, otel operator, etc) contribute to config. We should proceed with #3948 without being prescriptive about how that mechanism works. In the TC meeting, 4 distinct solutions were discussed which had different tradeoffs and limitations. It is clear that we still need to learn more about the requirements and constraints of this use case and let the findings inform the solution. The config working group should prioritize this discussion, but an answer shouldn’t block this PR. We should open a new issue to track the requirements and discuss solutions, and ensure that we treat that issue as blocking for any sort of stabilization effort (although it should ideally be solved much sooner). https://github.com/open-telemetry/opentelemetry-specification/issues/3966

jack-berg commented 11 months ago

My point of view is that while I understand that there is precedent in other systems for environment variables overriding other configuration sources, we shouldn't do it. The environment variable configuration scheme does not map cleanly to file configuration, leading to unintuitive behavior when trying to merge. A partial mapping of some of the environment variables is also unintuitive. If we support environment variable substitution and default values (#3744), that will allow us to provide a file configuration "template" to users as a starting point, complete with comments and env var substitution references to the environment variable scheme with default fallbacks. Note this was the conclusion of OTEP conversation https://github.com/open-telemetry/oteps/pull/225#discussion_r1119068865.

What's not to like? Users start out with the template that essentially is performing the merging of the environment variable scheme with the file configuration scheme (where it makes sense). The template has comments which reinforce that if they delete the env var substitution references, those environment variables will not be considered during parse / create. This should make migrating from environment variable schema to file configuration smooth, while also allowing the implementation to have simple intuitive rules that are easy to explain and which are logically consistent. While some users will inevitably ignore the documentation and expect environment variables to trump file configuration anyway, they'll surely be able to understand the motivation when pointed to the docs.
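To make the template idea concrete, here is a minimal sketch of substitution with default fallbacks. It assumes a `${NAME:-default}` syntax and treats an empty variable as unset; both are assumptions for illustration, and `substitute` and `TEMPLATE` are hypothetical names, not part of any SDK.

```python
import re

# Matches ${NAME} and ${NAME:-default}; the default may not contain '}'.
_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def substitute(text, env):
    """Replace each reference with the env value, else its default, else ''."""
    def replace(match):
        name, default = match.group(1), match.group(2)
        value = env.get(name)
        if value:
            return value
        return default if default is not None else ""
    return _PATTERN.sub(replace, text)


# A fragment of a starter template that defers to the existing env var scheme.
TEMPLATE = """\
tracer_provider:
  processors:
    - batch:
        exporter:
          otlp:
            endpoint: ${OTEL_EXPORTER_OTLP_ENDPOINT:-http://localhost:4317}
"""

rendered = substitute(TEMPLATE, {"OTEL_EXPORTER_OTLP_ENDPOINT": "http://collector:4317"})
fallback = substitute(TEMPLATE, {})
```

If a user deletes the `${...}` reference and writes a literal endpoint instead, the environment variable simply stops being consulted, which is the behavior the template comments would call out.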

yurishkuro commented 11 months ago

I agree with @jack-berg (https://github.com/open-telemetry/opentelemetry-specification/issues/3752#issuecomment-1787580645). It's better to have a clean break in both user interface and SDK components interface: components should no longer support env vars directly, they should always take parameters from a config object, and the config object can support env vars as placeholders, but without any semantic meaning attached to their names (it's up to the user to pick those names). It is a minimalistic design with clean encapsulation and separation of responsibilities. In that sense env vars do override the config file, but only because the user explicitly writes them this way, not because of some magic mapping between env var name and the exact position in the config.

Having said that, before we go this way we need to have a very clear migration strategy to avoid introducing breaking changes. The default config template that pulls in all the env vars already defined in the spec is a good approach. What's not clear to me there is whether it requires support for conditionals, because, for instance, in order to have a place in the config to refer to some var related to OTLP exporter, we need to know that OTLP exporter is what the user actually wants to use. It may be possible to accommodate using the declare/use separation used in the Collector config, i.e. the template config will always have a section for OTLP exporter (and all its env vars), but the tracer config portion may not reference the exporter as "I want to use it".

So basically I think we need a working prototype of the config to validate that this approach is workable.

Another migration concern is whether it can be incremental - there are many components in the SDK, and requiring them all to be upgraded to config object style of initialization (vs env vars) before anything can be released is going to be a problem. The above approach with a default config template seems like it could be incremental.

One more open question - how will such a default config template play with the ability to minimize runtime dependencies of the SDK?

brunobat commented 8 months ago

Please beware that popular frameworks have had their config systems in place for years. Creating yet another config file might be nice from the OTel point of view, but not for the frameworks that already instantiate an OTel SDK of their own.

Placing all possible configs in an OTel config will require frameworks to scan for services there, for example, if we want them to work in native mode. Frameworks already do many other things, like providing defaults, validations and value transformations. All this is already done and would be bypassed by the new OTel config file.

From my point of view, the properties supplier from the SDK must have the highest priority, higher than the OTel config file. Moreover, there shouldn't be any configuration of the SDK that cannot be performed by using the properties supplier.

In relation to the env. vars., in all frameworks I can remember, they take precedence over any other kind of configuration. OTel shouldn't be special.

jack-berg commented 8 months ago

From my point of view, the properties supplier from the SDK must have the highest priority, higher than the OTel config file. Moreover, there shouldn't be any configuration of the SDK that cannot be performed by using the properties supplier.

These are java specific concepts @brunobat. Can you generalize this feedback to the spec level?

jack-berg commented 8 months ago

Please beware that popular frameworks have had their config systems in place for years. Creating yet another config file might be nice from the OTel point of view, but not for the frameworks that already instantiate an OTel SDK of their own. Frameworks already do many other things, like providing defaults, validations and value transformations. All this is already done and would be bypassed by the new OTel config file.

Popular frameworks have their own config systems. That's great - if those frameworks want to make opentelemetry a first class citizen, they can evolve to be able to configure the things opentelemetry users expect (the flat scheme provided by environment variables very quickly runs into problems expressing very real config scenarios). But not all users buy into these frameworks - do we leave these users behind, only providing a suboptimal environment variable schema?

File configuration isn't and doesn't need to be the only configuration option. It doesn't erase the environment variable scheme, or programmatic configuration which enables all sorts of alternative configuration mechanisms like those provided by frameworks. But we do need a language agnostic way to fully express the desired configuration of an SDK.

Is the request to do something different with file configuration or to not have a file configuration option at all?

Placing all possible configs in an OTel config will require frameworks to scan for services there, for example, if we want them to work in native mode.

This appears to be no different than the problem we face today in opentelemetry-java with the SPIs to load custom exporters. With opentelemetry-sdk-extension-autoconfigure, you can set OTEL_TRACES_EXPORTER=foo, and if there is a ConfigurableSpanExporterProvider implementation on the classpath corresponding to foo, autoconfigure will use it to create the corresponding SpanExporter.

In relation to the env. vars., in all frameworks I can remember, they take precedence over any other kind of configuration. OTel shouldn't be special.

Below I've listed several examples where trying to merge environment variables with file configuration yields unintuitive / unexpected results. For me to take this request seriously, I need to see proposals on how these situations would be resolved:

Simple Priority

We have the same options in env vars and config file, but different values. Which wins? Easy enough - environment variables always win. Although, we'll no doubt have users complaining that what they see in their config file isn't reflected in reality.

Env vars

OTEL_TRACES_EXPORTER=otlp
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317

File config

tracer_provider:
  processors:
    - batch:
        exporter:
          otlp:
            endpoint: http://some-other-endpoint:4317

Conflict example 1

We have env variable information which can be merged with file configuration such that both are true. In this case, env variables specify to use the OTLP exporter, while file configuration says zipkin. One person might expect to see the OTLP exporter overwrite zipkin. Another might expect to see spans exported to both OTLP and zipkin. Yet another might expect to see only zipkin.

Env vars

OTEL_TRACES_EXPORTER=otlp

File config

tracer_provider:
  processors:
    - batch:
        exporter:
          zipkin:
            endpoint: http://localhost:9411/api/v2/spans

Conflict example 2

In this case we have configuration which can be merged, but doing so will almost certainly yield the wrong result. The environment variables specify the OTLP trace endpoint, assuming the http/protobuf default protocol, which includes the path. The config file specifies grpc protocol. If the environment variable takes priority over file configuration, http/protobuf variant of the endpoint specified via environment variable will be used to configure the OTLP grpc exporter specified in file config, causing an error.

Env vars

OTEL_TRACES_EXPORTER=otlp
OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://some-other-endpoint:4318/foo/bar/v1/traces

File config

tracer_provider:
  processors:
    - batch:
        exporter:
          otlp:
            endpoint: http://some-endpoint:4317
            protocol: grpc

Conflict example 3

In this case we see how the flat nature of the environment variable scheme falls short when trying to merge with the structure of a config file. The config file specifies that the sdk should export spans to two different endpoints, but an environment variable is specified which sets the OTLP endpoint to something else. If you override both from the config file you're almost certainly not doing what the user would want. And if you change the effective config to result in a single OTLP exporter with the endpoint in the environment variable, you're also almost surely not doing what the user wants. Presumably, they want to override one of the endpoints in the config file, but it's impossible to know that they really want this, or which one in the config file to override.

Env vars

OTEL_TRACES_EXPORTER=otlp
OTEL_EXPORTER_OTLP_ENDPOINT=http://yet-another-endpoint:4318

File config

tracer_provider:
  processors:
    - batch:
        exporter:
          otlp:
            endpoint: http://some-endpoint:4318
    - batch:
        exporter:
          otlp:
            endpoint: http://some-other-endpoint:4318

I can keep listing examples, but the point is that I don't know of any strategy for overlaying environment variables on top of a configuration file that will make sense to most users (let alone all) in most cases. Instead, we should aim to give simple primitives which can be combined to accommodate simple and complex use cases:

Most users will be fine just saying OTEL_CONFIG_FILE=/path/to/sdk-config.yaml, and be content with the simple ability to do environment variable substitution in a config file.

Users with complex requirements can combine these primitives in all sorts of interesting ways:

brunobat commented 8 months ago

These are java specific concepts @brunobat. Can you generalize this feedback to the spec level?

Yes, will send a PR to the sdk-configuration file.

The spec actually doesn't define any priority among config methods, it just states "The SDK MUST provide a programmatic interface for all configuration". I agree with the statement. This can be interpreted as meaning that file config is just another SDK configuration method and shouldn't take precedence over the base generic programmatic interface, which in the Java case is represented by the properties supplier.

The OTel file config must use the base generic programmatic interface and shouldn't prevent the use of other existing or future configuration systems of the SDK... Which seems totally reasonable.

brunobat commented 8 months ago

Simple Priority

We have the same options in env vars and config file, but different values. Which wins? Easy enough - environment variables always win. Although, we'll no doubt have users complaining that what they see in their config file isn't reflected in reality.

Sure, someone will complain, but overriding configs defined in files with env. vars. is widely accepted and commonplace in the container world. It can be decided either way if properly documented; however, not giving higher priority to env. vars. nowadays seems counterintuitive and against industry practice.

Conflict example 1

In this case the exporter should be OTLP and the endpoint the one specified in the file. Note that the endpoint shouldn't be specific to zipkin, but owned by the exporter.

These overrides happen all the time in a microservice. Files define the base config and env. vars. define the exceptions to particular attributes. I acknowledge that there can be many overrides.

Conflict example 2

Yes, the resulting merged config doesn't make sense, but the same error can also happen with purely file based config. Validation will always be needed and should be implemented to save people's time.
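As a rough sketch of what such validation might look like for this example, the check below flags an OTLP exporter whose protocol is grpc but whose endpoint carries a URL path (as an http/protobuf style endpoint would). The rule itself is a heuristic assumed for illustration, `validate_otlp` is a hypothetical name, and the dict layout mirrors the YAML in the examples rather than any finalized schema.

```python
from urllib.parse import urlsplit


def validate_otlp(model):
    """Return a list of human-readable problems found in batch/otlp exporters."""
    problems = []
    processors = model.get("tracer_provider", {}).get("processors", [])
    for index, processor in enumerate(processors):
        otlp = processor.get("batch", {}).get("exporter", {}).get("otlp")
        if not otlp:
            continue
        path = urlsplit(otlp.get("endpoint", "")).path
        # Heuristic (assumption): a grpc endpoint is expected to have no path.
        if otlp.get("protocol") == "grpc" and path not in ("", "/"):
            problems.append(f"processors[{index}]: grpc endpoint has a path: {path}")
    return problems


# The merged result of conflict example 2: env var endpoint + file protocol.
merged = {
    "tracer_provider": {
        "processors": [
            {"batch": {"exporter": {"otlp": {
                "endpoint": "http://some-other-endpoint:4318/foo/bar/v1/traces",
                "protocol": "grpc",
            }}}}
        ]
    }
}

problems = validate_otlp(merged)
```

The same check applies regardless of whether the values came from the file, the environment, or a merge of the two.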

Conflict example 3

This is an excellent example of things that cannot be properly configured by env. vars. with the current abstraction, which leads me to the main point....

I think we are mixing the definition of a "programmatic interface for all configuration" with the definition of the file based configuration. This makes the programmatic interface effectively useless for other configuration systems because the file config takes precedence and will include things that cannot be done in any other way. The programmatic interface should be independent if we want to properly configure things with files, env. vars. and other config systems, as discussed above.

We should have a generic reusable programmatic interface for the config, a builder pattern for it, and only then the file format configuration for it. Merging and validating the config attributes should also be a task for the builder.

jack-berg commented 8 months ago

I think we are mixing the definition of a "programmatic interface for all configuration" with the definition of the file based configuration. This makes the programmatic interface effectively useless for other configuration systems because the file config takes precedence and will include things that cannot be done in any other way.

Why does file configuration make the programmatic interface effectively useless?

File configuration is an abstraction that is built on top of the programmatic configuration interface. It will be natural to package file configuration in a separate artifact from the core SDK components it configures. By definition, it cannot be more expressive than the programmatic interface since ultimately all options need to be translated to programmatic equivalents. The proposal in #3805, in which file configuration takes priority over the environment variable scheme, only applies if the user opts into it by: 1. Including the necessary file configuration artifact. 2. Specifying OTEL_CONFIG_FILE=...

A framework that has its own configuration system can avoid file configuration entirely by not including the artifact, and / or by providing equivalent functionality. I.e. the framework provides its own format and schema for specifying configuration, and provides logic which parses, validates that configuration and uses the programmatic configuration interface to produce an SDK according to the configuration.

We should have a generic reusable programmatic interface for the config, a builder pattern for it, and only then the file format configuration for it.

💯 That's exactly what's happening.

brunobat commented 8 months ago

💯 That's exactly what's happening.

That's excellent.

Why does file configuration make the programmatic interface effectively useless?

From what's written in the spec, nothing. However, when discussing the particular java implementation of the file configuration it was mentioned that some configs would only be available through the file config.

If all configurations will be available through the programmatic interface, which in the Java implementation case is represented by the properties supplier, I'm ok.

jackshirazi commented 8 months ago

If I have 2 different envs, say test and prod, does the file spec allow for me to specify an env var for that and effectively create 2 different configs from the one file, or are there restrictions which would mean for some things I would have to have 2 different files?

Obviously for the existing description I can provide 2 different files and select one in OTEL_CONFIG_FILE, but my use case here is where I hold a single central template file (thinking opamp in the future) and I want to set up config from that one template based on which env the agent runs in.

jackshirazi commented 8 months ago

As for the overall question, I've seen this page reading like this for a long time now.

So it would be surprising to me that suddenly OTEL_SERVICE_NAME (to choose just the top one there) no longer works when I - or anyone else in the chain of operators who could be involved in setting up my application+agent - set a file config. Yes, I agree it's opt-in so I should know that using the file means the env var is then ignored. It's still surprising. And we generally prefer the principle of least surprise. I would tend towards merging.

For the conflict objections, I would do the simplest merge, and with an option to output all values, it's straightforward to debug. So yes there would be conflicts, but they would mostly be easily caught in dev/test and resolved.

What we see from our customers is that they like to use different configuration capabilities for different things. File config as a base that they can distribute easily. Environment to customize that. Central config for ease of changing config across many different applications (especially for dynamically adjustable options). The current proposal is to implement the file config so it accepts environment, but not the ones already supported, which means that for the existing deployments they already have configured, the file config would need to explicitly include each of those. That is, somewhere in my file config I'll need to have the service name defined and to use ${OTEL_SERVICE_NAME} - otherwise I either can't use file config or I have a painful adjustment in my systems to define a bunch of new things.

As I write this, I'm thinking maybe a reasonable compromise is to provide a file config with all those variables already defined in the file as a template. Of course that doesn't cover all situations and effectively adds boilerplate, which is another anti-pattern, but it would be something worth doing if we stay with no merging.

trask commented 8 months ago

I think it's going to be continually surprising to users that all of the standard OTEL_* environment variables are ignored as soon as you introduce a yaml configuration file (e.g. to configure metric views).

At least in the Java world, it's super standard (and people expect) for env vars to override configuration files (and not in an all-or-nothing way).

I appreciate that this problem may not have a perfect solution, but I'd like to explore a compromise that could be less surprising to users.

For each env var we could define the minimal effect it has on the config file (as opposed to an all-or-nothing approach).

OTEL_TRACES_EXPORTER is probably the most complex, and so let's see what we could do for it first.

We could say that OTEL_TRACES_EXPORTER=abc means drop other exporters besides abc, while adding abc exporter with default configuration if there is not already at least one abc exporter present. (Another option is to drop all exporters including abc, and add abc exporter with the default configuration, but I think that would be more surprising because it drops all existing configuration of the abc exporter).

OTEL_EXPORTER_OTLP_TRACES_ENDPOINT is simpler, and we could say it overrides every instance in the config file of tracer_provider.processors.batch.exporter.otlp.endpoint.
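These two rules can be sketched as a small overlay function. This is one possible reading, not a specified algorithm: `apply_env` is a hypothetical name, the model is a plain dict shaped like the YAML in this thread, only `batch` processors are considered, and each processor is assumed to hold a single exporter.

```python
import copy


def apply_env(model, env):
    """Overlay the two env vars on a copy of the file model and return the copy."""
    model = copy.deepcopy(model)
    processors = model.setdefault("tracer_provider", {}).setdefault("processors", [])

    wanted = env.get("OTEL_TRACES_EXPORTER")
    if wanted:
        # Rule 1: drop processors whose exporter is not `wanted` ...
        kept = [p for p in processors
                if wanted in p.get("batch", {}).get("exporter", {})]
        # ... and add one with default settings if none is configured.
        if not kept:
            kept = [{"batch": {"exporter": {wanted: {}}}}]
        processors[:] = kept

    endpoint = env.get("OTEL_EXPORTER_OTLP_TRACES_ENDPOINT")
    if endpoint:
        # Rule 2: override the endpoint of every configured OTLP exporter.
        for p in processors:
            otlp = p.get("batch", {}).get("exporter", {}).get("otlp")
            if otlp is not None:
                otlp["endpoint"] = endpoint
    return model


# A file model with a single zipkin exporter, overlaid with OTEL_TRACES_EXPORTER=otlp.
example = {"tracer_provider": {"processors": [
    {"batch": {"exporter": {"zipkin": {"endpoint": "http://localhost:9411/api/v2/spans"}}}}
]}}
merged = apply_env(example, {"OTEL_TRACES_EXPORTER": "otlp"})
```

Working on a deep copy keeps the parsed file model intact, so the original and the resolved model can be compared when debugging.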

Applying these rules to @jack-berg's examples:

Conflict example 1

OTEL_TRACES_EXPORTER=otlp

and

tracer_provider:
  processors:
    - batch:
        exporter:
          zipkin:
            endpoint: http://localhost:9411/api/v2/spans

would result in

tracer_provider:
  processors:
    - batch:
        exporter:
          otlp:

Conflict example 2

OTEL_TRACES_EXPORTER=otlp
OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://some-other-endpoint:4318/foo/bar/v1/traces

and

tracer_provider:
  processors:
    - batch:
        exporter:
          otlp:
            endpoint: http://some-endpoint:4317
            protocol: grpc

would result in

tracer_provider:
  processors:
    - batch:
        exporter:
          otlp:
            endpoint: http://some-other-endpoint:4318/foo/bar/v1/traces
            protocol: grpc

Note: this is probably an incorrect configuration (using an http/protobuf endpoint and grpc protocol), but I think it's probably the least surprising merge given the user only used env var to override the endpoint and not the protocol.

Conflict example 3

OTEL_TRACES_EXPORTER=otlp
OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://yet-another-endpoint:4318

and

tracer_provider:
  processors:
    - batch:
        exporter:
          otlp:
            endpoint: http://some-endpoint:4318
    - batch:
        exporter:
          otlp:
            endpoint: http://some-other-endpoint:4318

would result in

tracer_provider:
  processors:
    - batch:
        exporter:
          otlp:
            endpoint: http://yet-another-endpoint:4318
    - batch:
        exporter:
          otlp:
            endpoint: http://yet-another-endpoint:4318

Note: this is probably an incorrect configuration (having two otlp exporters pointing to the same endpoint), but again I think it's probably the least surprising merge, and therefore probably the best.

As @jackshirazi says above:

For the conflict objections, I would do the simplest merge, and with an option to output all values, it's straightforward to debug. So yes there would be conflicts, but they would mostly be easily caught in dev/test and resolved.

jack-berg commented 8 months ago

It's not perfect but I could get on board with that merge logic. If we were to do this, we should give the user a way to understand what the resolved configuration model looks like. The natural thing to do is to log out the resolved model after applying environment variable overlays, but considering it may have secrets in it, we wouldn't always be able to do that. Instead, we could:

ocelotl commented 8 months ago

There was something else we mentioned in the SIG meeting. Let's think about what happens right now for any implementation that does not have the configuration prototype:

Let's say that these environment variables are defined (I'll use here the same example @jack-berg used during the SIG meeting):

OTEL_TRACES_EXPORTER=otlp
OTEL_EXPORTER_OTLP_ENDPOINT=http://yet-another-endpoint:4318

And the user has this code in the application:

SdkTracerProvider.builder()
        .addSpanProcessor(BatchSpanProcessor.builder(
                OtlpGrpcSpanExporter.builder().setEndpoint("http://some-endpoint:4318").build()).build())
        .addSpanProcessor(BatchSpanProcessor.builder(
                OtlpGrpcSpanExporter.builder().setEndpoint("http://some-other-endpoint:4318").build()).build())
        .build();

When that combination of environment variables and application code is executed, something happens; let's give this something a name: X.

Now, the code above is equivalent to some configuration file, because when a certain configuration file is used, we also get X.

If we think about this project for a moment, we can also see it not as "configuration of OpenTelemetry" but as "configuration of a script that creates certain OpenTelemetry objects before the application is run". Every configuration file makes this script execute in a certain way whose results could also be achieved by executing certain code directly in the application.

So, we can see this problem in this way:

What happens when OTel is executed when there are environment variables set and also a configuration file?

And the answer to that question could be:

The same that would happen if the equivalent objects that are created with the configuration file would have been instantiated in the application with the same environment variables set.

I think users could understand this configuration project better if we present it not as configuration of OpenTelemetry in the way that environment variables configure OpenTelemetry, but as a way for them to define which and how certain components are instantiated before the rest of the application runs.

ocelotl commented 8 months ago

I gave this issue more thought and now I think that this could be not an issue at all. Instead of trying to find an algorithm to either merge or override or something else, let's just tell the users, this is what the configuration file component would end up instantiating if you run OpenTelemetry with the environment variables that are set right now. To do this, I propose we add to the configuration file component an option (maybe named dry_run or something similar) that instead of running anything would just print the code that would instantiate the same objects that would be instantiated by the configuration file component if it was executed normally. In this way, the user can adjust their environment variables and find out the result that changing them would have on the instantiated objects and we don't have to figure out any algorithm to solve conflicts between the environment variables and configuration files.

I implemented an example that partially prints the code that would correspond to an instantiation of a tracer provider, running this produces this:

TracerProvider(
    Sampler(
        always_off=None,
        always_on=None,
        jaeger_remote=None,
        parent_based=ParentBasedSampler(
            local_parent_not_sampled=LocalParentNotSampledSampler(
                always_off=None,
                always_on=None,
                jaeger_remote=None,
                parent_based=ParentBasedSampler(
                    local_parent_not_sampled=None,
                    local_parent_sampled=None,
                    remote_parent_not_sampled=RemoteParentNotSampledSampler(
                        always_off=None,
                        always_on=None,
                        jaeger_remote=None,
                        parent_based=None,
                        trace_id_ratio_based=ParentBasedTraceIdRatio(
                            TraceIdRatioBased(
                                0.0001
                            ),
                            StaticSampler(
                                Decision(
                                    2
                                )
                            ),
                            StaticSampler(
                                Decision(
                                    0
                                )
                            ),
                            StaticSampler(
                                Decision(
                                    2
                                )
                            ),
                            StaticSampler(
                                Decision(
                                    0
                                )
                            ),
                        ),
                    ),
                    remote_parent_sampled=None,
                    root=None,
                ),
                trace_id_ratio_based=None,
            ),
            local_parent_sampled=LocalParentSampledSampler(
                always_off=None,
                always_on=StaticSampler(
                    Decision(
                        2
                    )
                ),
                jaeger_remote=None,
                parent_based=None,
                trace_id_ratio_based=None,
            ),
            remote_parent_not_sampled=RemoteParentNotSampledSampler(
                always_off=StaticSampler(
                    Decision(
                        0
                    )
                ),
                always_on=None,
                jaeger_remote=None,
                parent_based=None,
                trace_id_ratio_based=None,
            ),
            remote_parent_sampled=RemoteParentSampledSampler(
                always_off=None,
                always_on=StaticSampler(
                    Decision(
                        2
                    )
                ),
                jaeger_remote=None,
                parent_based=None,
                trace_id_ratio_based=None,
            ),
            root=RootSampler(
                always_off=None,
                always_on=None,
                jaeger_remote=None,
                parent_based=None,
                trace_id_ratio_based=ParentBasedTraceIdRatio(
                    TraceIdRatioBased(
                        0.0001
                    ),
                    StaticSampler(
                        Decision(
                            2
                        )
                    ),
                    StaticSampler(
                        Decision(
                            0
                        )
                    ),
                    StaticSampler(
                        Decision(
                            2
                        )
                    ),
                    StaticSampler(
                        Decision(
                            0
                        )
                    ),
                ),
            ),
        ),
        trace_id_ratio_based=None,
    ),
    Resource(
        reprOrderedDict(
            [
                (
                    "telemetry.sdk.language",
                    "python",
                ),
                (
                    "telemetry.sdk.name",
                    "opentelemetry",
                ),
                (
                    "telemetry.sdk.version",
                    "1.23.0.dev0",
                ),
                (
                    "service.name",
                    "unknown_service",
                ),
            ]
        ),
        schema_url="https://opentelemetry.io/schemas/1.16.0",
    ),
)
Emily-Jiang commented 8 months ago

My big concern with having the yaml file overrule all of the configuration:

  1. Yaml is not as easy as defining an environment variable.
  2. If you use the environment variables as well as the yaml file, you have to duplicate the environment variables in the yaml file. Otherwise, the variables will get ignored.

With this, I support having the file configuration automatically include the environment variables. If the same configuration is defined both in the file and as an environment variable, the value from the file rules. By the way, in MicroProfile Config, environment variables and system properties are also opted in as configuration sources by default.

yurishkuro commented 8 months ago

I think it's going to be continually surprising to users that all of the standard OTEL_* environment variables are ignored as soon as you introduce a yaml configuration file (e.g. to configure metric views).

@trask I don't know how surprising it would be (after all they are making a decision to use a config), but overall I would strive for less complexity rather than more. The existing situation with env vars is already pretty complex, and devising overlaying rules with config adds even more complexity (the content of your comment is a perfect illustration of that complexity). I don't feel that this complexity is warranted, given that standard variable substitution in the config provides the same customization capabilities, it's a well-understood solution without additional mental overhead.
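To illustrate what "standard variable substitution" can look like, here is a minimal sketch that expands placeholders in the raw config text before it is parsed. The `${NAME}` syntax, the function name, and the choice to replace unset variables with an empty string are assumptions made for this example:

```python
import re

# Matches ${NAME} where NAME is a conventional environment variable name.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute(text: str, env: dict) -> str:
    """Replace every ${NAME} in text with env[NAME]; unset names become ''."""

    def replace(match):
        return env.get(match.group(1), "")

    return _PLACEHOLDER.sub(replace, text)


config_text = "endpoint: ${OTEL_EXPORTER_OTLP_ENDPOINT}"
resolved = substitute(config_text, {"OTEL_EXPORTER_OTLP_ENDPOINT": "http://yet-another-endpoint:4318"})
```

With this approach the environment only influences properties the config author explicitly wired to a variable, so no overlay rules are needed.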

jack-berg commented 8 months ago

I propose we add to the configuration file component an option (maybe named dry_run or something similar) that instead of running anything would just print the code that would instantiate the same objects that would be instantiated by the configuration file component if it was executed normally. In this way, the user can adjust their environment variables and find out the result that changing them would have on the instantiated objects and we don't have to figure out any algorithm to solve conflicts between the environment variables and configuration files.

@ocelotl The idea of having code that prints code seems brittle and hard to maintain.

I'm also having a little trouble understanding what you are suggesting, so let me restate my understanding:

I don't think I like this. It essentially leaves the merge semantics to be a language level decision, which reduces portability of configuration.

If I'm misinterpreting the proposal please let me know.

yurishkuro commented 8 months ago

It essentially leaves the merge semantics to be a language level decision, which reduces portability of configuration.

I think this is a great argument against overlaying! We already have a mess of compatibility matrices with varied support for different env vars. Saying No to overlaying just removes this problem altogether - languages only need to implement one generic variable substitution, not a mish-mash of implementations in each and every component.

ocelotl commented 8 months ago

@ocelotl The idea of having code that prints code seems brittle and hard to maintain.

Maybe it is different for other languages but in Python it's quite simple, we only need to add a well-known method to every class.
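As a rough illustration of the "well-known method" idea, Python dataclasses generate a `__repr__` whose output is itself valid constructor code. The classes below are hypothetical stand-ins, not the real SDK classes:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TraceIdRatioBased:
    ratio: float


@dataclass
class ParentBasedSampler:
    root: Optional[TraceIdRatioBased] = None


@dataclass
class TracerProvider:
    sampler: ParentBasedSampler


# Stand-in for what the configuration file component would instantiate.
provider = TracerProvider(ParentBasedSampler(root=TraceIdRatioBased(0.0001)))

# The generated __repr__ is the "code that would instantiate the same objects".
printed = repr(provider)
print(printed)
```

Because the printed text round-trips through `eval` for these simple classes, a dry run could show it to the user without running anything else. Real components with non-trivial constructor arguments would need a hand-written method.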

  • You're coming from a standpoint where a language, like python, implements the interpretation of the environment variable scheme directly in components rather than in a separate artifact. I.e. the Otlp exporters directly interpret OTEL_EXPORTER_OTLP_* environment variables.

Correct. That's what we currently do in Python. This is an important point, more about this later.

  • File configuration strongly suggests decoupling interpreting the configuration model from the SDK components, but ultimately, the Create operation still has to instantiate components like the OTLP exporter, which in some languages may be performing additional interpretation of environment variables.

Ok, I think I agree with this...

  • You're saying that we don't need to solve the merge problem because languages that interpret environment variables inside of components already have merge semantics. So instead, just give tools for printing out how these things interact.

Mostly correct, more about this later too.

I don't think I like this. It essentially leaves the merge semantics to be a language level decision, which reduces portability of configuration.

The merge semantics are already a language decision. If they are currently a problem, we have to fix that problem where it currently is, not by adding the configuration file component. The file configuration component cannot be the solution for this problem because:

  1. Even after the file configuration component is added, users will still be able to run OTel without using it.
  2. The file configuration component trying to merge or override the environment variables would be an additional problem because there is no right way of doing this kind of operation (or maybe there is? Dynaconf may have an algorithm, more below :thinking:).

@jack-berg I think that one of your goals for the file configuration component is to provide developers with a clean, decoupled mechanism to obtain configuration values (something like configuration = Configuration(); configuration.otel_exporter_timeout == 10). That would be great (in fact I even tried to do the same thing a long time ago). While working on that I realized there already existed a Python project that did the same thing, Dynaconf.

Now, 20 seconds ago, while writing this I looked into this project again and found that they may have an algorithm that we could use (hope this helps!).

If I understand @yurishkuro's comments right, I think @yurishkuro and I agree (@yurishkuro please correct me if I am wrong). We should not try to add an arbitrary or not-so-well defined algorithm to merge the environment variables and configuration files.

So, to summarize a few things:

  1. Regardless of what we decide on this merging/overriding/etc issue, I see value in allowing the user to know what objects are to be instantiated by the configuration file component. This can help users clearly understand what the file configuration component is doing which is essential for debugging. For Python I have implemented this feature by printing the equivalent code, maybe for other languages there are better approaches.
  2. I am not opposed to merging environment variables with configuration values per se. I am opposed to merging environment variables with configuration values using an arbitrary or not-so-well defined approach. If we can't find a clean way of doing this, we are better not doing it at all.
  3. It would be great to have a "configuration" object that we can use in our code that would abstract developers from having to use the "raw" values of environment variables (or maybe configuration files too). From my past experience implementing that I remember it was nice to have something that would automatically transform an environment value string "true" into a boolean value True that we could use in the SDK. Nevertheless, if we want to have this feature work with environment variables and configuration files too, we first need to find a clean, proper algorithm to do the merging/overriding, if not, we are better without this feature as well.
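A minimal sketch of the kind of "configuration" object described in point 3, limited to environment variables. The class and method names are hypothetical, and the strict "true"/"false" parsing is an assumption of this example:

```python
class Configuration:
    """Typed, read-only view over raw environment variable strings."""

    def __init__(self, env):
        self._env = dict(env)

    def get_bool(self, name, default=False):
        raw = self._env.get(name)
        if raw is None:
            return default
        value = raw.strip().lower()
        if value == "true":
            return True
        if value == "false":
            return False
        raise ValueError(f"{name} must be 'true' or 'false', got {raw!r}")

    def get_int(self, name, default):
        raw = self._env.get(name)
        return default if raw is None else int(raw)


# Pass os.environ in real use; a literal dict keeps the example deterministic.
configuration = Configuration({"OTEL_SDK_DISABLED": "true", "OTEL_EXPORTER_OTLP_TIMEOUT": "10"})
sdk_disabled = configuration.get_bool("OTEL_SDK_DISABLED")
timeout = configuration.get_int("OTEL_EXPORTER_OTLP_TIMEOUT", default=10000)
```

SDK code then works with `True` and `10` rather than the strings "true" and "10". Extending this to configuration files is where the merging question comes back.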
brunobat commented 8 months ago

Now, 20 seconds ago, while writing this I looked into this project again and found that they may have an algorithm that we could use (hope this helps!).

In Java there is Microprofile Config. It's a spec with many implementations to manage configurations: https://github.com/eclipse/microprofile-config

It would be great to have a "configuration" object that we can use in our code that would abstract developers from having to use the "raw" values of environment variables (or maybe configuration files too).

I agree.

I think it makes sense to discuss configuration sources. We now have file and env./sys. vars. (there might be more in the future?) and I think it makes sense to decide if we are going to have a hierarchy of sources or if they will be flat and it's up to the implementations to figure out the merge. I think leaving the merge behaviour to the language specific implementations would lead to inconsistencies and mess up the config of large microservice deployments, with services implemented in many languages.
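One way to picture a hierarchy of sources is a lookup chain in which earlier sources shadow later ones. The sketch below uses the standard library's `ChainMap`; the flat keys and values are made up for the example, and a real configuration model is nested rather than flat:

```python
from collections import ChainMap

# Two hypothetical flat sources for the same settings.
file_source = {"endpoint": "http://from-file:4318", "timeout": 10}
env_source = {"endpoint": "http://from-env:4318"}

# Earlier maps take priority, so the argument order is the hierarchy.
env_wins = ChainMap(env_source, file_source)
file_wins = ChainMap(file_source, env_source)

print(env_wins["endpoint"])   # value from the environment
print(env_wins["timeout"])    # falls through to the file
print(file_wins["endpoint"])  # value from the file
```

Whichever order is chosen, specifying it once keeps every language implementation resolving the same value.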

trask commented 8 months ago

Do we know of any (other) libraries/frameworks that have standard env vars and a standard configuration file format where the standard env vars don't take precedence (and not in an all-or-nothing way)? I can't think of any in the Java space, but maybe this is common in other ecosystems?

brunobat commented 8 months ago

CC @kittylyst

ocelotl commented 8 months ago

I think leaving the merge behaviour to the language specific implementations would lead to inconsistencies and mess up the config of large microservice deployments, with services implemented in many languages.

Yes, I agree. Just to be clear, my point is that even without this file configuration component, this problem can happen (and probably happens) right now, something like this:

# An environment variable is set beforehand
OTEL_EXPORTER_ENDPOINT_URL="http://some.url"

...

# Here an OTelExporter is instantiated, endpoint_url is optional
# and its default value is an empty string.
exporter = OTelExporter(endpoint_url="http://some.other.url")

Which value will the exporter have for endpoint_url?

Again, I agree, it's a bad thing to leave the final value of endpoint_url up to each language.
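To make the ambiguity concrete, here are two hypothetical exporters that answer the question differently. Both are invented for illustration, the environment variable name is the one from the example above, and the environment is passed in as a dict to keep the sketch deterministic:

```python
ENV_VAR = "OTEL_EXPORTER_ENDPOINT_URL"  # name taken from the example above


class ArgumentWinsExporter:
    """The constructor argument takes precedence; the env var is only a fallback."""

    def __init__(self, endpoint_url="", env=None):
        env = env or {}
        self.endpoint_url = endpoint_url or env.get(ENV_VAR, "")


class EnvironmentWinsExporter:
    """The env var takes precedence; the argument is only a fallback."""

    def __init__(self, endpoint_url="", env=None):
        env = env or {}
        self.endpoint_url = env.get(ENV_VAR) or endpoint_url


env = {ENV_VAR: "http://some.url"}
first = ArgumentWinsExporter(endpoint_url="http://some.other.url", env=env)
second = EnvironmentWinsExporter(endpoint_url="http://some.other.url", env=env)
```

The same inputs yield `http://some.other.url` for the first exporter and `http://some.url` for the second, which is the kind of cross-language inconsistency being discussed.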

jack-berg commented 8 months ago

The merge semantics are already a language decision. If they are currently a problem, we have to fix that problem where it currently is, not by adding the configuration file component.

I don't think we actually have to solve that.

Consider if we agree to state that file configuration should ignore the existing environment variable scheme: In this case, implementors of the Create method would have to invoke the programmatic APIs of components in a way that ensures that environment variables are ignored. If not, the implementation would not be compliant.

Now consider we take the opposite stance, and state that we want to merge the environment variable scheme with file configuration: In this case, we definitely want consistency across languages in terms of how the merge semantics work. It's unlikely that the existing implementations are consistent, so we'd probably have to:

Regardless of what we decide on this merging/overriding/etc issue, I see value in allowing the user to know what objects are to be instantiated by the configuration file component. This can help users clearly understand what the file configuration component is doing which is essential for debugging. For Python I have implemented this feature by printing the equivalent code, maybe for other languages there are better approaches.

Yes this is important. In Java, all of our SDK components implement public String toString() and print out their configuration using a format which is idiomatic in Java. We already use this today to allow the user to understand their effective resolved SDK after various customization layers have muddied things up. For file configuration, we should have the added ability to take a resolved configuration model and print it back out to YAML. Ideally, this directly describes an SDK. But in practice, there may be small discrepancies between a model and an SDK (e.g. consider a model with an exporter property the exporter doesn't know about, or that the file configuration Create method doesn't yet know how to interpret).

I am opposed to merging environment variables with configuration values using an arbitrary or not-so-well defined approach. If we can't find a clean way of doing this, we are better not doing it at all.

The dynaconf example you give is interesting, but it essentially just describes what the equivalent environment variable is to target a nested property in a configuration model. That helps us if we wanted to introduce an entirely new environment variable scheme with names derived from the configuration model schema. (This is worth considering). But I can't see what it tells us about merging the existing environment variable scheme with file configuration.

It would be great to have a "configuration" object that we can use in our code that would abstract developers from having to use the "raw" values of environment variables (or maybe configuration files too). From my past experience implementing that I remember it was nice to have something that would automatically transform an environment value string "true" into a boolean value True that we could use in the SDK. Nevertheless, if we want to have this feature work with environment variables and configuration files too, we first need to find a clean, proper algorithm to do the merging/overriding, if not, we are better without this feature as well.

That's what we're trying to achieve with the configuration model. In #3840 I propose that we introduce a new dedicated operation for updating a configuration model with environment overloads. The effect would be that a user could call Create(configurationModel) to configure from a model ignoring environment variables, or call Create(MergeEnvironment(configurationModel)) to overlay the environment variable scheme on top of the model before calling create. The design philosophy mirrors that of unix, with small focussed programs (in this case functions) which can be combined to accommodate a wide variety of requirements.
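The composition described above can be sketched with two small functions. This is only an illustration: the dict-based model, the Python function names, and the single OTEL_SERVICE_NAME overlay are assumptions, and `create` returns a summary instead of real SDK objects:

```python
import copy


def merge_environment(model: dict, env: dict) -> dict:
    """Return a copy of the model with environment overrides applied."""
    merged = copy.deepcopy(model)
    if "OTEL_SERVICE_NAME" in env:
        attributes = merged.setdefault("resource", {}).setdefault("attributes", {})
        attributes["service.name"] = env["OTEL_SERVICE_NAME"]
    return merged


def create(model: dict) -> dict:
    """Stand-in for Create: summarize what would be built from the model."""
    attributes = model.get("resource", {}).get("attributes", {})
    return {"service_name": attributes.get("service.name", "unknown_service")}


model = {"resource": {"attributes": {"service.name": "my-service"}}}
env = {"OTEL_SERVICE_NAME": "overridden-service"}

strict = create(model)                           # ignores the environment
merged = create(merge_environment(model, env))   # opts in to the overlay
```

Because the overlay is a separate step that returns a new model, callers choose whether environment variables participate, and `create` itself never reads the environment.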


I want to draw attention to this comment from @trask:

Do we know of any (other) libraries/frameworks that have standard env vars and a standard configuration file format where the standard env vars don't take precedence (and not in an all-or-nothing way)?

If there aren't examples of this, then that's a strong signal to us, since we shouldn't design something out of step with industry expectations / norms.

Supposing that we can't think of enough examples to make a strong case, our options for supporting environment variable overrides include:

  1. Define semantics for merging the existing environment variable scheme into a configuration model. @trask proposed one option based on principle of least surprise (which I've drafted into PR #3840), but there are other semantics possible. I think we can all agree that if we take this path there will be cases where users are surprised by the merge semantics, but this may be ok.
  2. Invent a new environment variable scheme specifically for overriding file configuration properties. Use a well-defined set of rules for deriving the environment variable name which selects any particular property. State that the old environment schema is ignored when file configuration is used. In the long term, the user experience will be less surprising, since there are clean, consistent rules for environment variable overrides. But in the short term, there will be pain because of the hard cutover required to migrate to file configuration.
codeboten commented 8 months ago
  2. Invent a new environment variable scheme specifically for overriding file configuration properties. Use a well-defined set of rules for deriving the environment variable name which selects any particular property. State that the old environment schema is ignored when file configuration is used. In the long term, the user experience will be less surprising, since there are clean, consistent rules for environment variable overrides. But in the short term, there will be pain because of the hard cutover required to migrate to file configuration.

I would strongly suggest following option 2, as it makes it clear what the expectations of specific variables are when interoperating with a configuration file. I would suggest we state that using existing env variables with config creates an unspecified state, since currently implementations are somewhat doing different things to implement support for these.

I think that merging the existing env var schema on top of the configuration makes it difficult for end users to remember (if I use this variable such a thing happens, otherwise this other thing) and makes it tricky for implementations as well.

Inventing a new set of variables isn't ideal, especially since so many of the variables that exist today are marked stable and we'd likely have to support them until 2.0, but at least it's easier to understand what the repercussions of using these new variables would be in this context.

ocelotl commented 8 months ago

I would strongly suggest following option 2, as it makes it clear what the expectations of specific variables are when interoperating with a configuration file. I would suggest we state that using existing env variables with config creates an unspecified state, since currently implementations are somewhat doing different things to implement support for these.

Nice! Let's dynamically generate the environment variables from the configuration file :sunglasses:

jack-berg commented 8 months ago

Food for thought @codeboten, @ocelotl and anyone else interested in the proposal to introduce a new environment variable scheme where keys are generated from the schema:

https://docs.google.com/document/d/1yPfdf6fsxWY7onWU_PLmIIPs14H6pTYbyI7OboiODCw/edit

The solution is not without problems...

jackshirazi commented 8 months ago

I think you're one step away from making a Turing-complete environment variable scheme

brunobat commented 8 months ago

The google docs document is interesting and proposes a set of important rules.

I don't mind having a file centric naming schema but do we really need to change the name of most env vars defined in the current environment variables?

I would prefer to have started with the definition of a "configuration" object based on the current environment variables to ensure backwards compatibility, where possible.

I also believe a "configuration" object works better as the foundation for environment variables, file configs and other possible configuration sources, like frameworks that already have their own configuration schema and need to integrate with the OTel SDK.

The "configuration" object should accept attributes from different sources and perform the merge there. We are probably missing a component here and trying to assign too much scope to the file config.

Yes, we should merge environment variables, but not implemented in the file config.

Maybe the OTel spec shouldn't define a wide set of attribute names; instead, the naming rules, merging strategy, sources and priority between sources should be its focus.

radcortez commented 8 months ago

I think it would be fine for OpenTelemetry to define its rules for configuration as long as integration frameworks can override them completely.

For instance, in Quarkus, we integrate with a considerable number of libraries, each with its own flavor of how to configure them. Most of these libraries do have programmatic APIs, that allow us to bypass any other means of configuration to provide our own. One of the goals of Quarkus as a platform is to provide consistent behavior so our users know how configuration works, and they expect to use it in the exact same way in each integration.

In Quarkus, we do integrate and are committed to pushing OpenTelemetry. Currently, we are able to set our desired configuration via a customizer. Ideally, we should have the option to participate directly in OpenTelemetry configuration, by having an option to analyze the configuration read by OpenTelemetry before being applied, so we could reconfigure it as needed. We also provide our own specific configuration that affects how OpenTelemetry is handled in Quarkus.

jack-berg commented 8 months ago

I would prefer to have started with the definition of a "configuration" object based on the current environment variables to ensure backwards compatibility, where possible.

It sounds like you're saying: derive the configuration model from the current environment variable scheme. If so, we discussed this and rejected it because the existing scheme is too limiting. It was designed to express only simple configs, whereas a purpose-built configuration model should reflect the full configurable surface area described by the SDK specs.

I also believe a "configuration" object works better as the foundation for environment variables, file configs and other possible configuration sources, like frameworks that already have their own configuration schema and need to integrate with the OTel SDK. The "configuration" object should accept attributes from different sources and perform the merge there. We are probably missing a component here and trying to assign too much scope to the file config.

I don't think we're missing a component. Please have a look at the entire file configuration document. It describes a configuration model (i.e. configuration object as you state it) as a requirement for implementations, and defines the create operation as accepting a configuration model. The idea is that the configuration model can be parsed from a file, or built up / edited programmatically.

Also note the language from #3805:

Users that require merging multiple sources of configuration are encouraged to customize the configuration model returned by Parse before Create is called.

This is the idea you're discussing about accepting different config sources and merging. #3805 says "let the frameworks do that if they need to".

Implementations MAY provide a mechanism to customize the configuration model parsed from OTEL_CONFIG_FILE.

The idea here is to allow users to access the parsed configuration model and customize it before passing it to Create to be interpreted.
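A rough sketch of that flow, with hypothetical Python stand-ins for Parse and Create and an optional customizer hook in between. JSON is used for the input only because it parses with the standard library; the hook signature and the `configure` helper are assumptions of this example:

```python
import json
from typing import Callable, Optional


def parse(text: str) -> dict:
    """Stand-in for Parse: turn config text into a configuration model."""
    return json.loads(text)


def create(model: dict) -> list:
    """Stand-in for Create: list the endpoints the exporters would be built with."""
    endpoints = []
    for processor in model.get("tracer_provider", {}).get("processors", []):
        for settings in processor.values():
            otlp = settings.get("exporter", {}).get("otlp", {})
            if "endpoint" in otlp:
                endpoints.append(otlp["endpoint"])
    return endpoints


def configure(text: str, customizer: Optional[Callable[[dict], dict]] = None) -> list:
    model = parse(text)
    if customizer is not None:
        model = customizer(model)  # the framework edits the model before Create
    return create(model)


def use_local_collector(model: dict) -> dict:
    """Example customizer a framework might supply."""
    for processor in model["tracer_provider"]["processors"]:
        processor["batch"]["exporter"]["otlp"]["endpoint"] = "http://localhost:4318"
    return model


CONFIG_TEXT = '{"tracer_provider": {"processors": [{"batch": {"exporter": {"otlp": {"endpoint": "http://endpoint1:4317"}}}}]}}'
default_endpoints = configure(CONFIG_TEXT)
custom_endpoints = configure(CONFIG_TEXT, use_local_collector)
```

Any merging of extra sources, including environment variables, would live in the customizer rather than in Parse or Create.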

jack-berg commented 8 months ago

I have a new argument in favor of having a new environment variable scheme with keys / rules derived from the model as discussed in detail here: This type of capability is consistent with the collector. See how to provide configuration:

./otelcorecol --config=file:examples/local/otel-config.yaml --config="yaml:exporters::debug::verbosity: normal"

Note that the collector has a command line scheme for overriding instead of using environment variables, but the effect is essentially the same.

I (now) think providing a new environment variable scheme is the most correct route to take, as it:

I propose we:

brunobat commented 8 months ago

Sounds like a good compromise, thanks @jack-berg

ocelotl commented 8 months ago

I (now) think providing a new environment variable scheme is the most correct route to take, as it:

  • Provides the env var override capability users have come to expect in configuration
  • Avoids the messy merge semantics associated with merging the existing environment variable scheme
  • The sharp edges associated with such a scheme can be softened by refactoring the configuration schema to be more friendly to the rules (i.e. by avoiding object arrays)

The first two previous points are great and I also want an environment variable model that can override the configuration file and save us from the merging dilemma.

Nevertheless, after thinking about it a bit, I now feel (and I can be wrong) that we are better without environment variables at all: yes, I mean deprecating all environment variables and not having any OTEL_ environment variables after the file configuration is added.

These are my concerns:

  1. We have environment variables now whose behavior is unspecified and hard to specify. For example OTEL_EXPORTER_OTLP_ENDPOINT. Which exporter does this environment variable apply to?
  2. For the overriding of config values with environment variables to work we need a syntax that can represent the "access by name" (in objects) and "access by index" (in arrays) in environment variable names which can only contain alphanumeric characters and underscores.

This is what I think about these concerns:

  1. We have to do something about these environment variables that are unspecified and I think anything we do will be considered a breaking change by some criteria. I think we need to deprecate these environment variables and yes, that would be considered a breaking change. Deprecating all environment variables will be a breaking change too, but we have to make a breaking change nevertheless because of these unspecified environment variables.

  2. So, an environment variable name like OTEL_SOME_OBJECT.SOME_NAME[4].OTHER_OBJECT.OTHER_NAME would be ideal, but we can't use any character in an environment variable name that is not alphanumeric or an underscore. This means we would need to find a way to represent the same using pretty much underscores and maybe numbers, something like this: OTEL__SOME_OBJECT__SOME_NAME__4_OTHER_OBJECT__OTHER_NAME. But keep in mind that configuration supports underscores already so, we need to find a way to handle the original underscores (or any other character that is not supported in an environment variable name) so that this syntax works in all cases. I think @jack-berg suggests we don't use some features that YAML files have to make them more "even" with environment variables. I don't really agree, we would be making our best configuration mechanism worse. Let's keep in mind that everything that we can do with environment variables we can do with a configuration file.
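The underscore problem can be shown with a small sketch. The encoding rule here (an `OTEL` prefix, a double underscore between path segments, upper-cased names) is an assumption chosen for illustration, not a proposed scheme:

```python
def env_name(path) -> str:
    """Encode a path into the model as an environment variable name.

    Assumed rule: 'OTEL' prefix, '__' between segments, names upper-cased,
    array indexes written as plain numbers.
    """
    return "OTEL__" + "__".join(str(part).upper() for part in path)


name = env_name(["some_object", "some_name", 4, "other_object", "other_name"])
print(name)  # OTEL__SOME_OBJECT__SOME_NAME__4__OTHER_OBJECT__OTHER_NAME

# Two different paths that encode to the same name: a single key that
# contains '__' versus two nested keys.
one_key = env_name(["a__b"])
two_keys = env_name(["a", "b"])
print(one_key == two_keys)  # True
```

The collision shows why the original underscores need an escaping rule, or why the schema would have to restrict key names. Keys such as `service.name` or `api-key` raise the same issue for characters that environment variable names cannot contain at all.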

I hope I am wrong and there is a good solution. I understand that our users are used to environment variables and it would be neat to have a way for them to override the configuration file, but because of the restrictions on the characters that are available in environment variable names it's going to be quite difficult.

jack-berg commented 8 months ago

@ocelotl have you taken a look at https://github.com/open-telemetry/opentelemetry-configuration/pull/69? I refactored the configuration schema to have better interop with an environment variable naming scheme derived from the model. The changes are essentially:

I updated the kitchen sink example to reflect the names of environment variables which would be used to target all the properties.

It definitely adds some restrictions to modeling, but they're not that bad IMO.

marcalff commented 7 months ago

[EDIT 2024-02-06]

Withdrawn:

I still think some users will use yq from user space, to edit config.yaml (I know I will), but the general solution for all SIG can not rely on yq.

opentelemetry will have to implement a merge scheme from scratch, in every SIG.

cc @jack-berg @tedsuo


Food for thought @codeboten, @ocelotl and anyone else interested in the proposal to introduce a new environment variable scheme where keys are generated from the schema:

https://docs.google.com/document/d/1yPfdf6fsxWY7onWU_PLmIIPs14H6pTYbyI7OboiODCw/edit

The solution is not without problems...

Transposing the same examples using yq instead leads to the following:

$ cat input.yaml 
resource:
  attributes:
    service.name: my-service
tracer_provider:
  processors:
    - batch:
        max_export_batch_size: 512
        exporter:
          otlp:
            endpoint: http://endpoint1:4317
            headers:
              api-key: 1234
propagators:
  composite:
    - tracecontext
    - baggage
$ cat example-1.yq 
.disabled=false |
.resource.attributes."service.name"="foo" |
.tracer_provider.processors[0].batch.exporter.otlp.headers.api-key="abcd"
OTEL_MERGE=`cat example-1.yq`
echo ${OTEL_MERGE}

docker run --rm -i -v "${PWD}":/workdir mikefarah/yq "${OTEL_MERGE}" < input.yaml > output-1.yaml

Leads to:

$ cat output-1.yaml 
resource:
  attributes:
    service.name: foo
tracer_provider:
  processors:
    - batch:
        max_export_batch_size: 512
        exporter:
          otlp:
            endpoint: http://endpoint1:4317
            headers:
              api-key: abcd
propagators:
  composite:
    - tracecontext
    - baggage
disabled: false

And likewise,

$ cat example-2.yq 
del(.propagators.composite[]) |
.propagators.composite = ["tracecontext", "baggage", "b3"]
$ cat output-2.yaml 
resource:
  attributes:
    service.name: my-service
tracer_provider:
  processors:
    - batch:
        max_export_batch_size: 512
        exporter:
          otlp:
            endpoint: http://endpoint1:4317
            headers:
              api-key: 1234
propagators:
  composite:
    - tracecontext
    - baggage
    - b3
$ cat example-3.yq 
.tracer_provider.processors[1].batch.max_export_batch_size = 1000 |
.tracer_provider.processors[1].batch.exporter.zipkin.endpoint = "http://zipkin"
$ cat output-3.yaml 
resource:
  attributes:
    service.name: my-service
tracer_provider:
  processors:
    - batch:
        max_export_batch_size: 512
        exporter:
          otlp:
            endpoint: http://endpoint1:4317
            headers:
              api-key: 1234
    - batch:
        max_export_batch_size: 1000
        exporter:
          zipkin:
            endpoint: http://zipkin
propagators:
  composite:
    - tracecontext
    - baggage
$ cat example-4.yq 
.tracer_provider.processors[0] = "noop"
$ cat output-4.yaml 
resource:
  attributes:
    service.name: my-service
tracer_provider:
  processors:
    - noop
propagators:
  composite:
    - tracecontext
    - baggage

Further reading:

Disclaimer:

I just discovered the tool now, researching alternate solutions for opentelemetry, and I have never used yq.

From the doc and what it can do, this looks like it should be investigated.

marcalff commented 7 months ago

@jack-berg

It is customary to use uppercase in environment variables names, but maybe the full space can still be used to encode a yaml path, with lowercase characters reserved for encoding.

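For example:

  • service.name can be encoded SERVICE_dot_NAME
  • api-key can be encoded API_dash_KEY
  • processors[1].batch can be encoded PROCESSORS_index_1__BATCH

Would that help?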

ocelotl commented 7 months ago

@jack-berg

It is customary to use uppercase in environment variables names, but maybe the full space can still be used to encode a yaml path, with lowercase characters reserved for encoding.

For example:

  • service.name can be encoded SERVICE_dot_NAME
  • api-key can be encoded API_dash_KEY
  • processors[1].batch can be encoded PROCESSORS_index_1__BATCH

Would that help ?

A YAML file could have a SERVICE_dot_NAME and a service.name entry, which would cause ambiguity.

jack-berg commented 7 months ago

It is customary to use uppercase in environment variables names, but maybe the full space can still be used to encode a yaml path, with lowercase characters reserved for encoding.

We could do a lot of different things with the env var syntax, but fundamentally there will always be a tradeoff:

I'm hesitant about getting too far into the solution space because we risk letting the cart get ahead of the horse. If we agree about introducing a new environment variable scheme derived from the configuration scheme (#3850), then we can explore the particulars of how such a scheme should work and whether we prioritize a simpler env var experience at the expense of the model, or vice versa. But that's a big solution space and I don't want to spend the cycles doing a proper full exploration before we have agreement that it's the general direction we want to go. I understand that some understanding of how it might work is required to reach agreement. For me, the exploration in this doc and https://github.com/open-telemetry/opentelemetry-configuration/pull/69/ are enough to get a feel for what we're generally going to be dealing with.

marcalff commented 7 months ago

A YAML file could have a SERVICE_dot_NAME and a service.name entry, which would cause ambiguity.

Currently every yaml name is encoded in uppercase, so a yaml node service_dot_name or SERVICE_dot_NAME is encoded as SERVICE_DOT_NAME. My point was that not all of the namespace is used.

SERVICE_dot_NAME can not collide with a yaml node.
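The encoding idea quoted above can be sketched in Python as follows. The function name `encode_path`, the use of `__` between path segments, and the rule that an index attaches to the preceding key are assumptions inferred from the three examples in this thread, not an agreed specification.

```python
def encode_path(path):
    """Encode a list of keys (str) and array indices (int) as an env var suffix.

    Assumed rules: keys are upper-cased, lowercase tokens act as escapes,
    and path segments are joined with a double underscore.
    """
    encoded = []
    for segment in path:
        if isinstance(segment, int):
            # Assumption: an index attaches to the key that precedes it.
            encoded[-1] += f"_index_{segment}"
        else:
            name = segment.upper()
            # Lowercase escape tokens cannot clash with upper-cased key text.
            name = name.replace(".", "_dot_").replace("-", "_dash_")
            encoded.append(name)
    return "__".join(encoded)

print(encode_path(["service.name"]))
print(encode_path(["api-key"]))
print(encode_path(["processors", 1, "batch"]))
# A node literally named service_dot_name is upper-cased in full,
# so it does not collide with the escaped form of service.name.
print(encode_path(["service_dot_name"]))
```

Under these assumptions, a key that itself contains a double underscore would still be ambiguous with the segment separator, so the sketch does not remove every modeling restriction.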

ocelotl commented 7 months ago

I wonder how important it is for our users to be able to override configuration using environment variables specifically. Maybe they just care about being able to override configuration in some way (not necessarily using environment variables)?

During the spec SIG it was suggested to allow users to have another YAML file to do the overriding, I think this sounds like a good solution.

jack-berg commented 7 months ago

During the spec SIG it was suggested to allow users to have another YAML file to do the overriding, I think this sounds like a good solution.

The idea that was suggested was to have a single environment variable which accepted JSON (since YAML relies on line breaks and indentation, which are impractical to flatten into text) which would be overlaid on top of the YAML. So something like:

tracer_provider:
  processors:
    - batch:
        exporter:
          otlp:
  limits:
    attribute_value_length_limit: 10

and

export OTEL_CONFIG_FILE_OVERRIDES='{"tracer_provider":{"limits":{"attribute_value_length_limit": 20}}}'

Would cause the .tracer_provider.limits.attribute_value_length_limit to resolve to 20.
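A minimal Python sketch of how such an overlay could be resolved is shown below. The `deep_merge` helper is hypothetical, the base configuration is written as an already-parsed dictionary to keep the example free of a YAML dependency, and replacing lists wholesale is an assumption of this sketch rather than agreed behavior.

```python
import json

def deep_merge(base, override):
    """Return a copy of base with override laid on top; dicts merge recursively."""
    if isinstance(base, dict) and isinstance(override, dict):
        merged = dict(base)
        for key, value in override.items():
            merged[key] = deep_merge(base[key], value) if key in base else value
        return merged
    # Assumption: scalars and lists are replaced wholesale, not merged.
    return override

# The YAML above, as it would look after parsing.
base_config = {
    "tracer_provider": {
        "processors": [{"batch": {"exporter": {"otlp": None}}}],
        "limits": {"attribute_value_length_limit": 10},
    }
}

# In practice this string would be read from OTEL_CONFIG_FILE_OVERRIDES.
raw_overrides = '{"tracer_provider":{"limits":{"attribute_value_length_limit": 20}}}'

resolved = deep_merge(base_config, json.loads(raw_overrides))
print(resolved["tracer_provider"]["limits"]["attribute_value_length_limit"])
```

The processors list from the base file is left untouched because the override never mentions it.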

While this accomplishes the override requirement, it's not a good user experience. It mixes YAML and JSON, increasing cognitive load, and forces the user to write flattened JSON which will quickly become unwieldy.

You might say that instead of encoding the overrides in JSON, the single override environment variable points to another config file, or maybe OTEL_CONFIG_FILE accepts a prioritized list of config sources. For example, export OTEL_CONFIG_FILE=/path/to/base-config.yaml,/path/to/overrides.yaml, where /path/to/overrides.yaml is defined as:

tracer_provider:
  limits:
    attribute_value_length_limit: 20

This would yield the same result previously discussed.

This is simple to reason about for implementers, but arguably doesn't meet the spirit of the expectation of having the ability to override. The question is do users expect to have any sort of generic environment variable mechanism, or precise environment variable tools to target specific properties. I think prior art points to the latter expectation:

brunobat commented 7 months ago

Not sure what you mean about:

The question is do users expect to have any sort of generic environment variable mechanism, or precise environment variable tools to target specific properties

Can you please provide examples?

Mind that frameworks already have their own config systems. All of them use env. vars. File based (exclusive) overrides and yaml based env. vars. values are exotic things that I've never seen in the wild.

jack-berg commented 7 months ago

Generic meaning something like: export OTEL_CONFIG_FILE_OVERRIDES='{"tracer_provider":{"limits":{"attribute_value_length_limit": 20}}}'

Overriding everything is possible, though maybe not in a way a user would expect.

Precise meaning something like export OTEL_CONF_TRACER_PROVIDER__LIMITS__ATTRIBUTE_VALUE_LENGTH_LIMIT=20.

Where we have a precise mechanism for targeting / overriding a specific property.
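As a rough Python sketch of the precise variant: the helper `apply_override`, the choice to parse values as JSON with a fallback to plain strings, and the creation of missing intermediate objects are all assumptions made for illustration. Only the variable name and the double-underscore separator come from the example above.

```python
import json

PREFIX = "OTEL_CONF_"

def apply_override(config, env_name, raw_value):
    """Set one property in config, addressed by an OTEL_CONF_* variable name."""
    # Double underscores separate path segments; single ones stay in the key.
    path = env_name[len(PREFIX):].lower().split("__")
    try:
        # Assumption: values are parsed as JSON so that "20" becomes 20.
        value = json.loads(raw_value)
    except json.JSONDecodeError:
        value = raw_value
    node = config
    for key in path[:-1]:
        # Assumption: missing intermediate objects are created on demand.
        node = node.setdefault(key, {})
    node[path[-1]] = value
    return config

config = {"tracer_provider": {"limits": {"attribute_value_length_limit": 10}}}
apply_override(
    config,
    "OTEL_CONF_TRACER_PROVIDER__LIMITS__ATTRIBUTE_VALUE_LENGTH_LIMIT",
    "20",
)
print(config["tracer_provider"]["limits"]["attribute_value_length_limit"])
```

This sketch handles object keys only; addressing array elements would need an additional convention.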

brunobat commented 7 months ago

export OTEL_CONF_TRACER_PROVIDER__LIMITS__ATTRIBUTE_VALUE_LENGTH_LIMIT=20

I would expect this one.

ocelotl commented 7 months ago

@brunobat @jack-berg so, in your experience, it seems like previous configuration frameworks have used environment variables to override configuration files. We are aware of all the limitations this approach has. Using another YAML file to do the overriding does not have any of those problems. I don't think we should favor an approach that has problems over another one that does not just because the former is pretty much "what has always been done". We can implement a new, different, better solution. I think our users would appreciate not having to deal with the limitations of environment variables and being able to use all the capabilities YAML configuration provides (including object arrays).

brunobat commented 7 months ago

I think our users would appreciate not having to deal with the limitations of environment variables

That's one perspective. Some users will not like to have yet another fancy configuration to deal with.

Don't get me wrong, I'm all against "what has always been done". New, better ways must be implemented; however, that cannot happen by forgetting the past and FORCING everyone to use whatever we think is better now.

Backwards compatibility matters.

Let me add, if you think it is hard to merge things ourselves, imagine passing that burden to the users, which is pretty much what we are talking about here.

jeremydvoss commented 7 months ago

Added my opinion to the Python-specific discussion as well:

File configuration is a new feature, not a requirement. We should not remove environment variable functionality, which is widely used in manual instrumentation and essential for auto-instrumentation, simply to add another feature. Removing environment variable functionality would be a gigantic breaking change that would severely affect all Azure customers.

Assuming I understand the wording of the proposals correctly, I strongly recommend Proposal 1. This would keep all env var defaulting behavior in the sdk, but allow for environment variable substitution in the file configuration. This is the best of all worlds:

It seems the only supposed downside to Proposal 1 is that users would have to specify in the file how they want env vars to be used. But, that seems appropriate and almost inevitable to me given the scale of file configuration (multiple of every component). This approach is the most user friendly, clear, and customizable.
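To illustrate what environment variable substitution in a config file could look like, here is a small Python sketch. The `${NAME}` placeholder syntax, the helper `substitute_env`, the variable names OTLP_ENDPOINT and API_KEY, and the choice to replace unset variables with an empty string are all assumptions made for this example. The environment is passed in as a dictionary; in practice it would be `os.environ`.

```python
import re

# Assumed placeholder syntax: ${NAME}, where NAME is a valid env var name.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")

def substitute_env(text, env):
    """Replace each ${NAME} with env[NAME]; unset names become an empty string."""
    return _PLACEHOLDER.sub(lambda match: env.get(match.group(1), ""), text)

# The config file author decides which properties are overridable.
template = (
    "endpoint: ${OTLP_ENDPOINT}\n"
    "headers:\n"
    "  api-key: ${API_KEY}\n"
)

rendered = substitute_env(
    template,
    {"OTLP_ENDPOINT": "http://endpoint1:4317", "API_KEY": "abcd"},
)
print(rendered)
```

With this approach only the properties that the file explicitly references can be changed from the environment, which is the downside discussed above.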

Proposal 2 seems like the worst of all worlds to me. It would break customers, require extensive development in each language, and leave a confusing and ill-defined ruleset for any existing and future env var configuration.

I am not sure I understand Proposal 3 completely. Is it basically that there would be a new and separate set of env vars just for file configuration overrides? Would this leave the existing env vars functionality completely intact?

Finally, what about exploring the option of providing the entire yaml/json config in the value of an env var such as OTEL_CONFIGURATION? Then literally everything could be codelessly overwritten. I've seen config options like that in cloud services.

tylerbenson commented 7 months ago

Caveat: This is a long discussion and I haven't read through everything, so I don't have all the context. I expect some of these ideas have already been discussed, so apologies for any overlap. I just wanted to present an idea and answer questions. I don't intend to join the debate.

I wanted to put forth an idea for a combination of features that I think could help simplify things and allow for future evolution based on user feedback (for example adding config file specific environment variables which is one of the options being discussed).

Proposal:

Note: this proposal does not cover what to do with standard environment variables that don't map well to the file config. For example OTEL_TRACES_EXPORTER likely won't make sense unless the exporter section is somehow missing which might cause the file config to get a default exporter config from the SDK. This is ok though as the responsibility for how to use environment variables is being given to the user based on how they construct their config file.