open-telemetry / opentelemetry-specification

Specifications for OpenTelemetry
https://opentelemetry.io
Apache License 2.0

[Terminology] Provide a term for a language-specific solution that adds otel to an application without the need of changing the code of the application itself. #4129

Open svrnm opened 2 weeks ago

svrnm commented 2 weeks ago

As a follow-up to https://github.com/open-telemetry/community/issues/2165 I want to give it another try to discuss and clarify the terminology around a language-specific solution that adds otel to an application without the need of changing the code of the application itself.

Goal

For the OpenTelemetry documentation we need a term that describes any solution an end user can add to their application, without changing application code, to make that application emit telemetry otel-style. Such a solution adds the SDK, auxiliary pieces (exporters, sampler, resource detectors, language-specific config helpers, etc.) and instrumentation libraries to the application. This term will be the umbrella term in the navigation and title for this page, so it includes the existing solutions for .NET, Python, Go, Java, PHP and JavaScript and any future solutions that will be added. Some of those solutions add otel at runtime (like the Java agent or the eBPF Go solution), others at compile time (like the Spring Boot Starter or Go instrgen).

This term should be easy for an end user to understand and should not be used for anything else. The final definition should therefore also contain examples and counter-examples of how to use the term. For example, it should not be used to describe instrumentation libraries or the process by which the OpenTelemetry Operator injects OpenTelemetry into an application, so it might be necessary to define some of those related terms as well.

Current Situation

https://github.com/open-telemetry/community/issues/2165 and others contain a lot of history on this, but I will try to summarize: We have five terms used across the ecosystem: "Automatic Instrumentation", "Instrumentation", "Zero-Code Instrumentation", "(Instrumentation) Agent" and "Distro". There might be more, but those are the ones I am aware of. All of them have their downsides, and I try to summarize each of them below.

Automatic Instrumentation

This, in its capitalized form, has so far been the commonly used term to describe what we are looking for. So on paper there is a difference between "Automatic Instrumentation" and "automatically instrumenting" something. We have plenty of examples where we use the lower-case version when not referring to a language-specific solution to add otel to an application, even in the spec, e.g.:

https://github.com/open-telemetry/opentelemetry-specification/blob/ad987368ec9b967414f0d0b218b0c8f944a1f333/specification/logs/README.md?plain=1#L427
https://github.com/open-telemetry/opentelemetry-specification/blob/ad987368ec9b967414f0d0b218b0c8f944a1f333/specification/trace/exceptions.md?plain=1#L20

There are many more examples, but the two most problematic are when one uses the term to talk about an instrumentation library, or when one uses it to describe a mechanism to accomplish automatic instrumentation.

Related to the last one, there is an official definition for automatic instrumentation in the spec already, but it talks about telemetry collection methods:

Refers to telemetry collection methods that do not require the end-user to modify application's source code. Methods vary by programming language, and examples include code manipulation (during compilation or at runtime), monkey patching, or running eBPF programs.

For me, this means that "code manipulation", "monkey patching" or "running ebpf programs" are "telemetry collection methods" and by that are what should be called "automatic instrumentation". This is not saying that "Automatic Instrumentation refers to language-specific solutions that add telemetry collection to an application without requiring the end user to modify source code."
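To make the distinction concrete, here is a minimal Python sketch of monkey patching as a mechanism, i.e. a "telemetry collection method" in the sense of the spec definition. It is illustrative only: `PaymentService`, `patch_method` and `recorded_spans` are hypothetical names that do not come from any OpenTelemetry package, and the in-memory list merely stands in for a real SDK and exporter.

```python
import functools
import time


# Hypothetical application code: it contains no telemetry logic at all.
class PaymentService:
    def charge(self, amount):
        return f"charged {amount}"


# Hypothetical in-memory sink standing in for a real SDK/exporter.
recorded_spans = []


def patch_method(cls, method_name):
    """Replace cls.method_name with a wrapper that records a span-like dict."""
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return original(*args, **kwargs)
        finally:
            # Record timing whether the call succeeded or raised.
            recorded_spans.append({
                "name": f"{cls.__name__}.{method_name}",
                "duration_s": time.perf_counter() - start,
            })

    setattr(cls, method_name, wrapper)


# The patch is applied from the outside; PaymentService's source is untouched.
patch_method(PaymentService, "charge")

result = PaymentService().charge(10)
```

The patching function is only the mechanism. The umbrella term this issue asks for would describe the packaged solution that applies such mechanisms and also brings the SDK and auxiliary pieces with it.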

Instrumentation

We host multiple language-specific solutions to add otel to an application in repositories that we call opentelemetry-<language>-instrumentation, which is highly confusing for end users exploring our repositories, since they might expect to find instrumentation libraries in those repos (which they do for Java, but in most cases those are hosted in "contrib").

Zero-Code Instrumentation

Following this discussion and this issue we decided to go with "Zero-Code Instrumentation" for the documentation to solve the problem stated above (the need for an umbrella term). The main driver behind this was that "automatic instrumentation" has the problems outlined above.

Interestingly (and unfortunately) I recently saw a few examples where it was used, similarly to automatic instrumentation, to describe something different, e.g. "This instrumentation library allows you to add opentelemetry to your library zero-code-instrumentation-style".

(Instrumentation) Agent

There is a long history of people objecting to the use of "Agent"; I can dig up some of that history if required. But a major blocker is that not all solutions are "agents" in the sense of an "APM Agent", e.g. instrgen or the Spring Boot Starter are not agents.

Distro

Python uses the term "distro" to describe their solution to instrument applications without code changes. Distro/Distribution is indeed a term that also will need to be defined more clearly, but this is out of scope.

Next Steps

This discussion is at high risk of turning into bikeshedding without any proper outcome:

[!IMPORTANT] **We (the people writing documentation) and others (like the people doing presentations, blog posts, trainings or certifications) need this term, and the longer we wait to fix it, the worse the problem gets.**

So, I kindly ask for the following:

So, without settling on a specific term yet, here is what I propose:

  1. We provide <term a>, an umbrella term that describes a language-specific solution to add all components needed for emitting telemetry to an application without the need to change the code of the application itself. The components needed are at least the SDK, but auxiliary pieces (exporters, sampler, resource detectors, language-specific config helpers, etc.) and instrumentation libraries can be included as well.
  2. Such a solution <term a> may ask the end user to write configuration (like the Spring Boot Starter) or other "code-like" additions, but the key piece is that the application code itself remains untouched.
  3. <term a> must not be used in other contexts, like instrumentation libraries, or to describe a mechanism that is used to accomplish the goal (like eBPF, byte code injection, etc.), or the process of injecting such a solution through a k8s operator (or other tools).
  4. There are ways to distinguish certain kinds of <term a>, e.g. "compile time" (Spring Boot Starter, instrgen, code injection in general) and "runtime" (Java agent, eBPF).
  5. If we settle on a <term a> that is already used in the ecosystem, we make sure that proper replacements can be provided.
  6. Another <term b> is required to describe telemetry collection methods that are used to build instrumentation libraries. This term should be clearly different from <term a>.

theletterf commented 2 weeks ago

Thanks for the excellent overview, @svrnm !

I'd like to understand first what's the scenario in the OTel roadmap for all future instrumentation: Are all instrumentations going toward automatic or zero-code? Is that even possible for all?

If we're going toward zero-code as default

This scenario requires that all OpenTelemetry instrumentation is an automatic/autonomous/zero-code experience by default. If that is the direction OTel is heading in, no special term would be required, and we should instead treat the opposite case (instrumentation that requires writing custom code) as the exception and come up with a term for it. I know we've recently gone away from that in the docs, so I guess that's not the future scenario?

If zero-code will always be a plus limited to runtimes

In that case, I'd rather go with automatic instrumentation, for two main reasons: 1) as a term, it's prevalent among vendors, and 2) it's quite clear semantically, even though the actual mechanism can differ. I see two issues with zero-code: 1) it sounds a bit like a marketing term, and 2) it's not always accurate (for example, one might need to at least edit a configuration file or a require statement somewhere, as is the case for PHP).

Just my two cents.

svrnm commented 1 week ago

@open-telemetry/technical-committee can you please help steer this discussion? The goal is to eventually have the terms defined in the glossary, so that we can use them in the docs and other community writings.