Should file configuration merge environment variable configuration?

jack-berg commented 1 year ago

The conversation about whether file configuration should completely ignore the sdk environment variable scheme came up in #3744, but that PR doesn't actually contain any language related to this.

The original file configuration OTEP stated:

Interpret the configuration model and return SDK TracerProvider, MeterProvider, LoggerProvider which strictly reflect the configuration object's details and ignores the opentelemetry environment variable configuration scheme.

As mentioned here, file configuration doesn't actually contain language describing this behavior. It was originally included in #3437 but was lost in the PR review shuffle - accidentally, not in response to feedback.

@tedsuo argues in favor of file configuration respecting env vars with:

The common expectation among developers is that env vars will automatically overwrite config parameters. If we do it the other way, I am concerned that until the end of time we will have a steady stream of users lodging issues about this and then becoming very frustrated when we explain that they need to modify the config file to use an env var.

The reason that these users will be upset is that they are in a situation where having to define the env vars in the config file is a non-starter. Their use case is that for some reason they either can't get at or are not allowed to modify the config file template, but they really need to override a parameter.

For reference, this is an issue that has come up on many OSS projects I have been involved with where it is common to have both operators and application developers wanting to configure the same thing. Often it's some kind of emergency situation where the person with the rights to change the config file is unavailable.

I should note that of course the opposite situation could be true, where for some reason you want to disable an env var but don't have access to it. But it's probably easier to give users the ability to disable an env var via the config file than it is to give them the ability to disable a config parameter via an env var.

@MrAlias argues in favor of ignoring env vars with:

To be fair, from experience, you're going to get people complaining either way. If you choose environment variable priority over a configuration users will complain that their deployment was altered and failed when an environment variable was set that took precedence.

However, if you make environment variables take precedence you will also need to make some pretty severe and subjective choices on how they are mapped to a config. Do the BSP environment variables apply to all batch span processors or just one? Is the sampler environment variable used for all tracer providers in the config, even ones that specify alternate samplers? Should propagators be merged or overridden? If an exporter is defined by environment, does that stop the console logging exporter used for debugging as well as all the other exporters defined in configuration?

Ultimately, I think the current changes are going to be the most appropriate. They allow users to make their own choice in precedence without making subjective choices for them on how to map things. If a user wants environment variables to take precedence, all they need to do is use the OTel environment keys in the related parts of their configuration. In doing so they will answer each of the above questions their own way.
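To make the mapping question concrete, here is a minimal Python sketch. The dict layout is a hypothetical stand-in for a parsed config model, not the real file configuration schema, and both strategy functions are invented for illustration. Only the variable name OTEL_BSP_SCHEDULE_DELAY comes from the existing env var spec.

```python
import copy

# Hypothetical parsed config model with two batch span processors.
# Key names are illustrative only, not the real file configuration schema.
CONFIG = {
    "tracer_provider": {
        "processors": [
            {"batch": {"schedule_delay": 5000, "exporter": "otlp"}},
            {"batch": {"schedule_delay": 1000, "exporter": "console"}},
        ]
    }
}


def apply_to_all(config, env):
    """Strategy 1: the env var overrides every batch span processor."""
    result = copy.deepcopy(config)
    delay = env.get("OTEL_BSP_SCHEDULE_DELAY")
    if delay is not None:
        for processor in result["tracer_provider"]["processors"]:
            if "batch" in processor:
                processor["batch"]["schedule_delay"] = int(delay)
    return result


def apply_to_first(config, env):
    """Strategy 2: the env var overrides only the first batch span processor."""
    result = copy.deepcopy(config)
    delay = env.get("OTEL_BSP_SCHEDULE_DELAY")
    if delay is not None:
        for processor in result["tracer_provider"]["processors"]:
            if "batch" in processor:
                processor["batch"]["schedule_delay"] = int(delay)
                break
    return result


def delays(config):
    """Collect the schedule delay of each batch processor, in order."""
    return [p["batch"]["schedule_delay"] for p in config["tracer_provider"]["processors"]]


env = {"OTEL_BSP_SCHEDULE_DELAY": "250"}
print(delays(apply_to_all(CONFIG, env)))    # [250, 250]
print(delays(apply_to_first(CONFIG, env)))  # [250, 1000]
```

Both strategies are defensible, and they produce different pipelines from the same inputs. That is the kind of subjective choice described above.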

@trask supports the feeling of users expecting env vars to override file configuration, but also says merging configuration from multiple sources is hard:

This is my feeling as well, especially for things like:

OTEL_SDK_DISABLED
OTEL_RESOURCE_ATTRIBUTES
OTEL_SERVICE_NAME
OTEL_LOG_LEVEL
OTEL_PROPAGATORS
OTEL_TRACES_SAMPLER_ARG

I totally understand the nightmare that is merging configuration from multiple sources though.

I wonder if we would have created many of the other env vars (e.g. OTEL_BSP_*, OTEL_BLRP_*) if we had configuration file support from the beginning? And if so, maybe we can deprecate those other env vars in favor of configuration file?

This topic came up several times during the lengthy review of the file configuration OTEP. Below are links to a number of relevant points:

https://github.com/open-telemetry/oteps/pull/225#discussion_r1116269308

Layering of config as described below would make it more difficult to reason about what my program is actually being run with.

Additionally, perhaps give them a helper config file that uses env var substitution in the right places so that they can migrate easily (and still get a warning until they get rid of env variables and move everything to the config file).

https://github.com/open-telemetry/oteps/pull/225#discussion_r1119068865

What about 'Solution 3: fail when both environment configuration and file configuration are present'?

I think we could log a warning when we detect this, but failing is too strong. Consider the implications if a user is operating in an environment they don't fully control (i.e. where an ops team configures environment variables by default which they extend / layer on top of).
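A rough sketch of the "warn, don't fail" idea, in Python. The helper names are hypothetical, and OTEL_CONFIG_FILE is used as the file variable only because that is the name used later in this thread. Treating every other OTEL_-prefixed variable as a potential conflict is also an assumption of this sketch.

```python
import logging

logger = logging.getLogger("otel.config.sketch")

CONFIG_FILE_VAR = "OTEL_CONFIG_FILE"  # name taken from this thread


def conflicting_env_vars(env):
    """Return the OTEL_* names set alongside a config file (empty if no file is set)."""
    if CONFIG_FILE_VAR not in env:
        return []
    return sorted(
        name for name in env
        if name.startswith("OTEL_") and name != CONFIG_FILE_VAR
    )


def warn_on_conflict(env):
    """Log a warning instead of failing when both sources are present."""
    names = conflicting_env_vars(env)
    if names:
        logger.warning(
            "%s is set; these env vars may be ignored: %s",
            CONFIG_FILE_VAR,
            ", ".join(names),
        )
    return names
```

Passing the environment as a plain mapping (for example `warn_on_conflict(dict(os.environ))`) keeps the check easy to test and leaves startup unaffected in environments the user doesn't fully control.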

https://github.com/open-telemetry/oteps/pull/225#discussion_r1142380977

I have a (rather strong) opinion that a setting set via env vars has higher priority (takes precedence) over a setting set via configuration file and it should be marked as a goal.

Any scheme where environment variables have priority over a config file will require some sort of standard mapping between the environment variable schema and file config scheme. IMO, it's impossible to define such a mapping which is intuitive in all cases, so better not to try.

Nothing is forcing users to use file based config - it's opt-in. If they do opt in, they're opting into the documented behavior in which the config file represents the source of truth for configuration. If they wish to customize the experience with additional layers / overrides, they have a couple of tools:

They can use the fact the Configure(config) API accepts a config model as an argument, and provide their own customizations to the model after the initial parsing of the file via Parse(file). An example of such a customization would be to interpret environment variables and apply them to the model in a way they decide makes sense.

They can use environment variable substitution to reference environment variables directly in a configuration file.
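The two tools above can be sketched roughly as follows, in Python. Everything here is illustrative: `parse`, `substitute`, and `apply_service_name` are hypothetical names, JSON stands in for the real file format to keep the sketch stdlib-only, the model layout is invented, and replacing an unset variable with an empty string is an assumption of this sketch.

```python
import json
import re

# Matches ${NAME} references in the raw file text.
_ENV_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute(text, env):
    """Replace each ${NAME} with env[NAME] (empty string if unset, by assumption)."""
    return _ENV_REF.sub(lambda m: env.get(m.group(1), ""), text)


def parse(text, env):
    """Hypothetical Parse step: substitute env references, then decode the model."""
    return json.loads(substitute(text, env))


def apply_service_name(model, env):
    """User-defined customization: let OTEL_SERVICE_NAME win over the file."""
    name = env.get("OTEL_SERVICE_NAME")
    if name:
        model.setdefault("resource", {}).setdefault("attributes", {})["service.name"] = name
    return model


FILE_TEXT = (
    '{"resource": {"attributes": {"service.name": "from-file"}}, '
    '"exporter": {"endpoint": "${MY_ENDPOINT}"}}'
)

env = {"MY_ENDPOINT": "http://collector:4318", "OTEL_SERVICE_NAME": "from-env"}
model = apply_service_name(parse(FILE_TEXT, env), env)
print(model["exporter"]["endpoint"])                    # http://collector:4318
print(model["resource"]["attributes"]["service.name"])  # from-env
```

In both cases the user, not the SDK, decides which env vars apply and where. The customized model would then be handed to whatever Configure(config) equivalent the SDK provides.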

Update 3/15/2024

The current state of this issue is:

The configuration working group proposed and executed a process to move forward after several months of failing to reach consensus
As a part of that process, the configuration maintainers reviewed proposals and have recommended moving forward with option 2.b as described in this comment
As a part of that process, people who disagree with the recommendation can / should escalate to a TC decision

Update 3/28/2024

Please see this comment updating the status of the issue:

Per @tedsuo’s request, we discussed this issue in the 3/27/24 TC meeting and have made a decision: Generally, we will follow @trask's comment, proceeding with this PR with a few changes:

Rename OTEL_CONFIG_FILE to OTEL_EXPERIMENTAL_CONFIG_FILE, reflecting the fact that the semantics of this env var's value are subject to breaking changes as the file configuration spec and schema continue to evolve.

Ensure that env vars which don’t interop with file config are deprecated when file config is ready for stabilization, reflecting that we do not want to recommend multiple competing configuration stories. This could be ensured via an explicit note in the markdown, or a blocking issue - both achieve the same effect. https://github.com/open-telemetry/opentelemetry-specification/issues/3967

Ensure that file config has an interop where platforms (i.e. Azure functions, otel operator, etc) contribute to config. We should proceed with #3948 without being prescriptive about how that mechanism works. In the TC meeting, 4 distinct solutions were discussed which had different tradeoffs and limitations. It is clear that we still need to learn more about the requirements and constraints of this use case and let the findings inform the solution. The config working group should prioritize this discussion, but an answer shouldn’t block this PR. We should open a new issue to track the requirements and discuss solutions, and ensure that we treat that issue as blocking for any sort of stabilization effort (although it should ideally be solved much sooner). https://github.com/open-telemetry/opentelemetry-specification/issues/3966

tedsuo commented 8 months ago

edit -- my reference to merged config files is in the sense that tooling exists to merge yaml config together, I presume that something to accomplish this will be part of the tool chain for managed environments

It would be good to hear from platform providers as to how feasible it is to insert files and tooling into user environments. I suspect it is too difficult or restrictive for some environments, but if that solves the problem, I would agree that it would be a fine solution.

Either way, as long as we know that this feature is technically compatible with the design being proposed, I don't think we need to hold up the config SIG while we investigate and settle on a solution for this problem.

As to choosing the env vars, if the explicit goal is platform support, that provides a clear way to make a decision, so I think it could avoid most of the bikeshedding.

ocelotl commented 8 months ago

More details here

That's option 2d by the way. And overriding YAML files with another YAML file is optional, not mandatory to implement it.

tedsuo commented 8 months ago

Thinking more on this. The issue with supporting the old env vars is that how they should be applied is ambiguous. Except, for env vars that set resources it is actually unambiguous. The resource section is always at the same path in the config file, correct? So there's no potential confusion as to how they would be applied.

If that is the case, there would be no reason why we could not continue to support all of the env vars that set resources. This also would provide a clear definition as to which env vars are supported, so there's no confusion there either. And allowing platforms to set resources via env vars makes a lot of sense: developers do not necessarily know which resources a platform has available to it. Allowing operators to add additional resources without needing to request that application developers push a new config file is also very helpful.
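As a rough illustration of why the resource case is unambiguous, here is a Python sketch that overlays the resource env vars onto a single, fixed resource section. The model layout and function names are hypothetical. The comma-separated key=value format of OTEL_RESOURCE_ATTRIBUTES and the precedence of OTEL_SERVICE_NAME over service.name follow the existing env var spec. Letting env vars win over the file is the assumption being proposed here, not settled behavior.

```python
from urllib.parse import unquote


def parse_resource_attributes(raw):
    """Parse the comma-separated key=value format of OTEL_RESOURCE_ATTRIBUTES."""
    attributes = {}
    for pair in raw.split(","):
        if "=" not in pair:
            continue  # skip empty or malformed entries
        key, value = pair.split("=", 1)
        attributes[key.strip()] = unquote(value.strip())
    return attributes


def merge_resource_env(model, env):
    """Return a new model whose resource attributes are overlaid with env values."""
    merged = dict(model.get("resource", {}).get("attributes", {}))
    merged.update(parse_resource_attributes(env.get("OTEL_RESOURCE_ATTRIBUTES", "")))
    if env.get("OTEL_SERVICE_NAME"):
        merged["service.name"] = env["OTEL_SERVICE_NAME"]
    return {**model, "resource": {**model.get("resource", {}), "attributes": merged}}


model = {"resource": {"attributes": {"service.name": "from-file", "team": "a"}}}
env = {
    "OTEL_RESOURCE_ATTRIBUTES": "cloud.region=us-east-1,team=b",
    "OTEL_SERVICE_NAME": "checkout",
}
print(merge_resource_env(model, env)["resource"]["attributes"])
```

Because there is exactly one target for the merge, none of the "which processor / which exporter" questions arise.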

@jack-berg @codeboten @lmolkova what do you think of this proposal? Resource env vars are supported directly, but everything else requires a config variable.

@ocelotl thank you for your investigation! But, I think the goal is to support configuration via config file, not just pipeline development. We do not want to keep adding additional env vars, and in general it is not clear how env vars would be applied to complex pipelines. That's part of the reason why we want to move to file-based configuration.

trask commented 8 months ago

I feel like creating a mix of 'these variables work all the time' and 'these variables work only some of the time' is worse, though?

An option could be to deprecate the env vars that don't work all of the time.

jack-berg commented 8 months ago

And allowing platforms to set resources via env vars makes a lot of sense: developers do not necessarily know which resources a platform has available to it.

Platforms should set platform specific resource attributes via resource detectors. OTEL_RESOURCE_ATTRIBUTES is the wrong tool for the job.

lmolkova commented 8 months ago

And allowing platforms to set resources via env vars makes a lot of sense: developers do not necessarily know which resources a platform has available to it.

Platforms should set platform specific resource attributes via resource detectors. OTEL_RESOURCE_ATTRIBUTES is the wrong tool for the job.

It would mean that every platform would need to implement a detector in every language and ship a package containing this detector. I suspect they won't, at least smaller ones (from Azure perspective, I doubt that we will be able to do it for every lang). It would make user experience worse - there won't be a consistent way for cloud providers/etc, no consistent way for users across languages, and there would be more packages, version conflicts, dependencies, etc.

If we expect everyone who wants to integrate with otel do something, we should provide them a convenience. Existing resource env vars do and are already used for this purpose, so why is it a wrong tool?

lmolkova commented 8 months ago

Thinking more on this. The issue with supporting the old env vars is that how they should be applied is ambiguous. Except, for env vars that set resources it is actually unambiguous. The resource section is always at the same path in the config file, correct? So there's no potential confusion as to how they would be applied.

I do support this proposal. I still think we might need a few extra env vars for exporter and propagators

AWS Lambda and X-Ray want to give priority to the X-Ray propagator
I assume providers might want to provide default exporter choice (otlp and the endpoint)

We can either investigate if more env vars should interop with the config or start with resource ones and leave room for more env vars to be added in the future to the 'interop' list if we find them necessary.

I also support @trask's proposal to deprecate some env vars. I suggest starting with those which don't have a good interop story with the config.

tedsuo commented 8 months ago

Platforms should set platform specific resource attributes via resource detectors. OTEL_RESOURCE_ATTRIBUTES is the wrong tool for the job.

Resource detectors are for handling platforms which are not opentelemetry-aware, as we don't have any other choice. It is much better for platforms to set their resources directly in a language-independent manner, free of maintenance overhead and version mismatches.

jack-berg commented 8 months ago

Existing resource env vars do and are already used for this purpose, so why is it a wrong tool?

I regret bringing this up because I think it's beside the point of this conversation. With so much debate on this particular issue, it's especially important to not get sidetracked.

The ideas that have been discussed most recently in this thread are NOT compatible with the configuration working group's recommendation. The main idea seems to be to have the existing env vars override the contents of a config file where a clear mapping is possible, and to deprecate the existing env vars which do not have a clear mapping. If this were the direction, we would not want to have starter templates which reference all the existing env vars and their defaults using the substitution syntax, since doing so would muddy the waters even more. And without this requirement, we probably would think twice about introducing the env var substitution default syntax. Additionally, the config working group recommendation only needs to define the behavior of ignoring existing env vars when OTEL_CONFIG_FILE is specified, but this idea would need to define whether or not existing env vars take precedence when parse and / or create are called, or only when OTEL_CONFIG_FILE is specified.

This is to say that the recent ideas in the thread are an entirely new proposal, rather than a small modification to the recommendation. As I've mentioned several times, I'm unsure how to move on from here. There were 6 proposals considered by the SIG. This set of ideas represents a 7th. It would be incorrect for us to go and change the recommendation after concluding the process, and furthermore I personally wouldn't want to. The courses to take from here appear to be:

Accept the recommendation and merge #3948
Discuss / refine additional proposals more, potentially looping in the spec SIG.
Escalate to a TC decision: Summarize the issue, the proposals, the new proposals, and what the working group got wrong with its recommendation.

yurishkuro commented 8 months ago

@jack-berg I think the issue with the evaluation document is that it did not provide an evaluation framework, and thus the recommendation is difficult to justify (see my rant on "pros & cons"). There is rarely an option that beats all other options on all decision dimensions, but when those criteria are not even defined clearly then the comparison will always look not convincing. On the other hand, if the decision was documented with a decision framework, then adding proposal #7 to it would be relatively straightforward and a new (or same) recommendation can be issued.

One very useful decision framework is Traffic Lights (based on quick googling this blog post does a decent job explaining it). It's not difficult to transform the existing pros & cons already collected in the doc and in this thread into a traffic lights matrix. Some of the criteria that I would consider important are:

how much extra work on maintainers a proposal creates
if a given proposal is chosen, whether it prevents other proposals from being implemented

The important aspect of traffic lights method is to make sure that people's concerns are heard and reflected in the evaluation. For example, @lmolkova is worried about existing documentation and backwards compatibility - so we should add that to the matrix and compare different options on this axis (e.g. 2b would be yellow, Ted's proposal of resource-only env vars would also be yellow).

jack-berg commented 8 months ago

The document summarized the options, but was paired with a comment which did describe an evaluation framework. It was not a traffic lights framework, but included evaluation criteria with an emphasis on the high priority criteria, a defense of the recommendation including explanation of weaknesses, and summary of why the others were rejected.

Maybe otel could / should adopt a decision making framework like traffic lights, but no such precedent exists. One problem I can think of without having used traffic lights in anger is disagreement over the color of the light, and over the relative significance / weight of each of the evaluation criteria. In the process for making this recommendation, the configuration-maintainers voted repeatedly until a winner was selected - similar to what the TC might do. How do you describe each individual's process for deciding how to vote in a summary of a recommendation?

There's an additional meta-issue with this: We described a process to collect and summarize all the different points of view, and make a decision. We advertised that process and executed it in good faith. It was a significant investment in time. We drew a line in the sand, and the process wasn't perfect but it was pretty good and organized compared to what I've encountered elsewhere in this project. Re-initializing the conversation sends a signal that processes like this have no teeth, and hurts decision making since there's no penalty for not participating (i.e. the conversation is never really over). (Note: This is not targeted at anyone, or even at this particular issue. I've noticed this with a number of issues over the years and it seems relevant now.)

lmolkova commented 8 months ago

I find the process you outline here https://github.com/open-telemetry/opentelemetry-specification/issues/3752#issuecomment-1995582317 to be awesome.

I agree with @yurishkuro that there could be more evaluation criteria. I'd add something along these lines:

existing experience with env vars should not become worse. It should not break existing integrations. (you can read it as backward compatibility or user experience, both apply)
we found a reasonable set of scenarios and prototyped them and here're the results

I think user experience is more important than "how much extra work on maintainers a proposal creates", but both have a good place in the evaluation criteria list.

If we had something like this as prereqs for the significant changes (does not degrade existing features, prototyped and evaluated, not too complex), there would be less last-minute feedback.

jackshirazi commented 8 months ago

If we had something like this as prereqs for the significant changes (does not degrade existing features, prototyped and evaluated, not too complex), there would be less last-minute feedback.

The equivalent of this was made clear in the discussions; the higher support cost from this and equivalent proposals was highlighted in the discussion agenda. The TC decided not to prioritize that sufficiently, and we have to accept that. I thought my subsequent proposal to make the default support env vars using a merge of yaml was consistent with the chosen proposal, but @jack-berg points out it isn't because the proposal wording includes templates for these. We have to accept that too (though I think this is a good compromise and we should be a little flexible about process vs best outcomes). So it's clear that to add this we need to escalate for a TC decision. For me, I feel like the people who would decide this have viewed the points here in the discussion and are not supportive of adding these defaults, so I don't see the point in such an escalation, though I'll happily support anyone who decides the effort is worthwhile.

lmolkova commented 8 months ago

I want to emphasize, if a new feature degrades/breaks existing stable experience, it should not happen. No amount of "keep things simple" or "decision has already been made" can justify breaking/degrading user experience.

tedsuo commented 8 months ago

@jack-berg having been on both sides of this process multiple times, I agree that it feels broken when a SIG ends up having to rehash everything when a proposal is brought to the community, and essentially having the entire debate over again so that the proposal can be approved.

Your (very helpful) design breakdown does a good job of listing the criteria that the design proposals are evaluated against. Because of the work done by you and the SIG, I actually think the debate happening now is helpful, but could be structured better. Let me explain why.

I've been involved in many of the major design decisions in Otel (tracing, context propagation, error handling, etc, etc). For major design decisions, it is often the case that when the proposals go public, requirements that the designers miss are brought in by community members who were not part of the internal design process. This is normal; ensuring that our designs meet all requirements is one of the reasons we have a public review process. OTel must work across many languages, runtimes, and platforms, and it must be careful about breaking compatibility. Metrics is a major example of a design that took three complete rewrites before meeting all requirements. That was unfortunate, but if we had refused to honor those late-breaking requirements, our metrics solution would have been a failure. Honestly, with something as major as a new configuration model, requirements and feedback from the wider community should be expected. Especially if the design proposes a perceived break in compatibility!

Anyways, my point is that I don't actually think that the debate we are having here is unnecessary. Right now a new requirement is being proposed – namely, that our definition of compatibility should be stricter than the definition that was used to drive the current design. Compatibility is very important, it's reasonable that we would spend more time gathering real-world examples and finding clarity on what we actually need here.

What is perhaps making this conversation difficult is that we are mixing requirement gathering with designing. Proposed requirements are getting glued together with particular solutions. That's crazy making. I agree with @yurishkuro's comments, and I recommend that we back off from talking about solutions for a bit. Let's first get agreement on what our compatibility requirements actually are. Once we actually agree on requirements, I suspect that the design solution will be fairly obvious.

If we want to elevate this decision to the TC, that's fine. But we need the TC to decide on the requirements, not the solution. Let's spend this next week getting our requirements gathering into a clean document, so that we can actually see what we are talking about. For debated requirements, we can record the different opinions on what the requirement actually is, along with real world examples. No mention of solutions until we finish this work. Perhaps the result of this process will improve how we make proposals in the future.

tigrannajaryan commented 8 months ago

Let's first get agreement on what our compatibility requirements actually are.

I am going to look into this and if necessary will take it to the TC for clarification. I see confusion and variety in opinions, which need to be clarified regardless of what we decide for config. I will post back when I have an update.

pyohannes commented 8 months ago

I think the approach proposed by the workgroup is a reasonable choice. It brings a welcome conceptual simplification, and sets us up well for things like remote SDK configuration.

Below I collected some points about features that the proposed solution will not support, as compared to our current solution. I don't see any of these as a blocker, but I want to list them here as part of this discussion, as it seems some people consider them as "implicit requirements" in the context of backward compatibility.

It's not possible anymore to override implicit defaults.

The loss of this ability surprised some users to whom I talked about the proposed configuration approach. Currently, in .NET for example, users can add exporters with default values, which then are overwritten by environment variables. For example, .AddOtlpExporter() is called on the tracer provider, and then OTEL_EXPORTER_OTLP_ENDPOINT is set. As far as I can see, this is a use case that will not be supported by the proposed model: all defaults that should be overwritten must be specified explicitly. While I don't think this is necessarily a bad change, it is a behavioral change that will surprise some users.
I'm not sure if the template approach can cover all currently supported environment variables.

Is it possible to have a template that honors all existing environment variables? Or will this be a best-effort approach? It seems to me that environment variables using key-value pairs and lists cannot be used with this approach. I'm also not sure how variables like OTEL_TRACES_EXPORTER can be supported.
We don't have guaranteed consistency anymore in environment variable overrides.

Although we'll provide templates, nothing will stop users from using ${OTLP_ENDPOINT} in one place, and ${OTEL_OTLP_ENDPOINT} in another. The consistent set of environment variables we have now is hard to maintain, however, it's a nice-to-have from a user's point of view: it gives a consistent experience across languages, and it avoids unpleasant surprises.
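To illustrate the consistency concern, here is a small Python sketch that lists the variable names a template references and flags the ones outside a known set. The template text, its key names, and both helper functions are hypothetical. The regex assumes `${NAME}` style references, optionally followed by a default.

```python
import re

# Captures the NAME part of ${NAME} or ${NAME:-default} references.
_ENV_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)")


def referenced_vars(text):
    """Return the sorted, de-duplicated variable names referenced in a template."""
    return sorted(set(_ENV_REF.findall(text)))


def unexpected_vars(text, known):
    """Return referenced names that are not in the known (standard) set."""
    return [name for name in referenced_vars(text) if name not in known]


# Hypothetical template where one setting is wired to three different names.
TEMPLATE = """
traces_endpoint: ${OTLP_ENDPOINT}
metrics_endpoint: ${OTEL_OTLP_ENDPOINT}
logs_endpoint: ${OTEL_EXPORTER_OTLP_ENDPOINT}
"""

print(unexpected_vars(TEMPLATE, {"OTEL_EXPORTER_OTLP_ENDPOINT"}))
# ['OTEL_OTLP_ENDPOINT', 'OTLP_ENDPOINT']
```

A check like this could live in a linter, but nothing in the proposed model would enforce it, which is the point being made above.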

jack-berg commented 8 months ago

The loss of this ability surprised some users to whom I talked about the proposed configuration approach. Currently, in .NET for example, users can add exporters with default values, which then are overwritten by environment variables. For example, .AddOtlpExporter() is called on the tracer provider, and then OTEL_EXPORTER_OTLP_ENDPOINT is set. As far as I can see, this is a use case that will not be supported by the proposed model: all defaults that should be overwritten must be specified explicitly.

I'm not sure how this maps to the normal .NET programmatic config process. I know in .NET the pattern involves a combination of programmatic config with elements that implicitly read from env vars. File config implies more of a one-liner config process, where the user calls something like OpenTelemetrySdk.initialize() (or equivalent) which detects that OTEL_CONFIG_FILE is set, parses it, creates SDK components from the model, and returns those SDK components to the caller. I'm sure OpenTelemetry .NET will find a way to balance the API so that a user can start with a config file and optionally layer on additional config programmatically, but it's really a different paradigm. Where the current env var based config almost necessitates some programmatic config because so many options are missing, the file config model aims to be a near exhaustive representation of what can be done programmatically. Maybe some users will still want to combine programmatic config with file config, but that's kind of missing the point.
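The one-liner flow described above might look something like this Python sketch. It is not any SDK's actual API: `initialize`, `create_from_model`, and `create_from_env` are hypothetical, the returned dicts stand in for real SDK components, JSON stands in for the file format, and the model layout is invented.

```python
import json


def create_from_model(model):
    """Hypothetical Create step: describe components built purely from the file model."""
    return {"source": "file", "service_name": model.get("service_name", "unknown_service")}


def create_from_env(env):
    """Fallback: the existing env var based configuration path."""
    return {"source": "env", "service_name": env.get("OTEL_SERVICE_NAME", "unknown_service")}


def initialize(env, read_text):
    """If OTEL_CONFIG_FILE is set, the file is the source of truth; otherwise use env vars."""
    path = env.get("OTEL_CONFIG_FILE")
    if path:
        return create_from_model(json.loads(read_text(path)))
    return create_from_env(env)


# Usage with an in-memory "file system" so the sketch runs anywhere.
files = {"/etc/otel.json": '{"service_name": "from-file"}'}
env = {"OTEL_CONFIG_FILE": "/etc/otel.json", "OTEL_SERVICE_NAME": "from-env"}
print(initialize(env, files.__getitem__))                           # file wins
print(initialize({"OTEL_SERVICE_NAME": "from-env"}, files.__getitem__))  # env path
```

The key property is that the two paths never mix: once the file variable is set, other env vars only matter if the file references them.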

Is it possible to have a template that honors all existing environment variables?

No, it would be a best effort approach. #3948 represents the recommendation, and has an associated PR in opentelemetry-configuration: https://github.com/open-telemetry/opentelemetry-configuration/pull/76/files. See the proposed starter template comments for a list of env vars which don't map well and would be ignored.

We don't have guaranteed consistency anymore in environment variable overrides.

Yes, this is correct.

jack-berg commented 8 months ago

A point I haven't heard made yet with respect to compatibility and the env vars:

The goal of the env var spec is to standardize names of env vars where there is commonality between implementations. SDKs are not required to implement the env vars, and how they are implemented is left intentionally open ended:

The goal of this specification is to unify the environment variable names between different OpenTelemetry implementations.

Implementations MAY choose to allow configuration via the environment variables in this specification, but are not required to.

Environment variables MAY be handled (implemented) directly by a component, in the SDK, or in a separate component (e.g. environment-based autoconfiguration component).

This certainly leaves room for the introduction of a more prescriptive (and possibly net-new) component for handling file based configuration. I'm not sure how to read the intentionally loose language and conclude that we are restricted from introducing something new, quite different, and opinionated.

lmolkova commented 8 months ago

Environment variables are marked as stable. SDKs that implement them (virtually all) are mostly marked stable. SDKs can't stop supporting env vars without major version update.

Users that set env vars should be able to keep setting them and if we deprecate some, we'd need to provide at least some back-compat support for them anyway.

Please let's stop prioritizing new non-existent features over existing and popular ones.

jack-berg commented 8 months ago

SDKs can't stop supporting env vars without major version update.

I'm not advocating for stopping supporting env vars. Just providing an alternative door (door b). I see no reason to not support door a indefinitely - at least in opentelemetry-java where we strongly oppose revving the major version.

Please let's stop prioritizing new non-existent features over existing and popular ones.

Popular due to lack of alternative. The single most common type of issue I respond to in opentelemetry-java is asking for things not supported by the env vars. My response is always that new env vars need to be added to the spec, but that there is a moratorium in place making it difficult to add new ones. Just look how many times #2891 has been linked to. The user story is painful when env vars don't exist to express what you want, which occurs quite often. The single most popular issue (61 up votes at time of writing) in java instrumentation is about a lack of expressiveness in the env var syntax for describing non-trivial sampling situations. We wrote a dedicated view file config tool in opentelemetry-java to stop the bleeding and provide desperately needed configuration of things like explicit bucket boundaries, reducing cardinality, and dropping unneeded metrics. Based on the number of times I refer people to it answering issues, it's quite popular.

If file config turns out to not be popular / useful, then people won't set OTEL_CONFIG_FILE and there's no problem with conflicting env vars to worry about 😁.

lmolkova commented 8 months ago

If we stabilized something imperfect in the past, we can't just say we do something else now. We have to find a way to update gracefully and keep all the good things that we had.

So I'm suggesting to keep both doors open at the same time in https://github.com/open-telemetry/opentelemetry-specification/pull/3948#issuecomment-2015572791.

We can and should review existing env vars and fix/deprecate some of them. In the current proposal there is no attempt to fix env vars; there is an implicit attempt to drop them eventually (or make sure nobody uses them) in an ungraceful manner.

trask commented 7 months ago

Without taking a position on whether the existing env vars should or should not interop:

I think we should deprecate any env vars that do not interop with yaml config.

Why?

I think OpenTelemetry should have a clear recommendation for users on how to configure SDK + Instrumentation.

We know that yaml config is required to support some popular user requests, namely metric views and attribute-based sampling (the 4th most upvoted issue across all of OpenTelemetry).

So, if our recommendation is to "start with env vars", then we know that we are steering a lot of users down a one-way path.

I think it will be a much better user experience for everyone to start directly with yaml and not need to rewind and go down a different path later.

SDKs of course will have to support the deprecated env vars (without yaml interop) at least until their next major bump.

But by deprecating the env vars that do not interop with yaml config, we give a clear signal to users about the path they should take when onboarding to OpenTelemetry.

jack-berg commented 7 months ago

I think we should deprecate any env vars that do not interop with yaml config.

I'm not opposed to that, but we should consider the timing:

File config is still a very experimental idea. This is in large part because of how contentious the PRs have been (note that these PRs were for the most part just restating things that had already been approved in the original OTEP):

https://github.com/open-telemetry/oteps/pull/225 - 1.5 months to merge
https://github.com/open-telemetry/opentelemetry-specification/pull/3360 - 9 days to merge
https://github.com/open-telemetry/opentelemetry-specification/pull/3437 - 6 months to merge
https://github.com/open-telemetry/opentelemetry-specification/pull/3744 - 2 months to merge
https://github.com/open-telemetry/opentelemetry-specification/pull/3802 - 1 month to merge
https://github.com/open-telemetry/opentelemetry-specification/issues/3752 - 3 months and ongoing

There's still a lot of work to be done, especially:

Build prototypes in a variety of languages to prove concepts
Build out the configuration model schema
Figure out how stability guarantees work with the schema, i.e. what types of things can be changed in patch, minor, and major releases

Based on the history of getting things done for file config, and of getting similar schemas like opentelemetry-proto from experimental to stable, I'm disappointed to say that I don't see a stable file config spec as realistic in the short / medium term future.

Deprecating env vars significantly before file config is set to be stable sends a bad signal to the user: env vars are stable but deprecated, but the replacement is experimental without a target date for stability. I'd like to see env vars and file config coexist without deprecating env vars until there's a realistic prospect of marking file config stable.

tigrannajaryan commented 7 months ago

+1 to eventually deprecating some or all old env vars, ideally to have only one way of configuring the SDK. This of course can only happen sometime after the new way of configuring has a stable spec and is widely implemented by SDKs.

jack-berg commented 7 months ago

Hi All - Please see this comment updating the status of the issue:

Per @tedsuo’s request, we discussed this issue in the 3/27/24 TC meeting and have made a decision: Generally, we will follow @trask's comment, proceeding with this PR with a few changes:

Rename OTEL_CONFIG_FILE to OTEL_EXPERIMENTAL_CONFIG_FILE, reflecting the fact that the semantics around how the value of the env var is interpreted are subject to breaking changes as the file configuration spec and schema continue to evolve.

Ensure that env vars which don’t interop with file config are deprecated when file config is ready for stabilization, reflecting that we do not want to recommend multiple competing configuration stories. This could be ensured via an explicit note in the markdown, or a blocking issue - both achieve the same effect. https://github.com/open-telemetry/opentelemetry-specification/issues/3967

Ensure that file config has an interop mechanism through which platforms (e.g. Azure Functions, the otel operator, etc.) can contribute to config. We should proceed with #3948 without being prescriptive about how that mechanism works. In the TC meeting, four distinct solutions were discussed, each with different tradeoffs and limitations. It is clear that we still need to learn more about the requirements and constraints of this use case and let the findings inform the solution. The config working group should prioritize this discussion, but an answer shouldn’t block this PR. We should open a new issue to track the requirements and discuss solutions, and ensure that we treat that issue as blocking for any sort of stabilization effort (although it should ideally be solved much sooner). https://github.com/open-telemetry/opentelemetry-specification/issues/3966
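
To make the first point concrete, here is a minimal sketch of how an SDK might choose between the two configuration "doors" discussed earlier in the thread. It is illustrative only: the function name `select_config_source` and the returned dictionary shape are hypothetical, and the no-merge behavior (when the file env var is set, other `OTEL_*` env vars are ignored) is an assumption drawn from the original OTEP wording, not something this decision finalizes.

```python
import os
from typing import Mapping, Optional

# Env var name from the decision above. Everything else here is hypothetical.
CONFIG_FILE_ENV_VAR = "OTEL_EXPERIMENTAL_CONFIG_FILE"


def select_config_source(env: Optional[Mapping[str, str]] = None) -> dict:
    """Pick exactly one configuration source for SDK setup.

    Assumption for illustration: if the config file env var is set, the file
    is the sole source and all other OTEL_* env vars are ignored (no merging).
    """
    env = os.environ if env is None else env
    path = env.get(CONFIG_FILE_ENV_VAR, "").strip()
    if path:
        # Record which env vars would be ignored so the SDK could warn about them.
        ignored = sorted(
            key for key in env
            if key.startswith("OTEL_") and key != CONFIG_FILE_ENV_VAR
        )
        return {"source": "file", "path": path, "ignored_env_vars": ignored}
    # No file configured: fall back to the existing env var scheme.
    settings = {key: value for key, value in env.items() if key.startswith("OTEL_")}
    return {"source": "env", "settings": settings}
```

Returning the list of ignored env vars is a design choice for the sketch: it would let an implementation log a warning when a user sets both, which is the conflict case this issue is concerned with.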
