open-telemetry / semantic-conventions

Defines standards for generating consistent, accessible telemetry across a variety of domains
Apache License 2.0
272 stars 174 forks source link

Should metric attributes be required to be namespaced? #51

Closed trask closed 1 year ago

trask commented 1 year ago

Summarizing reasons for not introducing this requirement:

Summarizing reasons for introducing this requirement:

I will update these summaries based on any comments made below.

bertysentry commented 1 year ago

Additional cons:

Additional pros:

Thank you for tracking this, @trask!

jsuereth commented 1 year ago

As an FYI - TC met and I'll be putting together a proposal (on this bug) on an "Option C".

The strawman is that all attributes are required to have namespaces. When an attribute namespace is the same as a "metric" namesapce, the attribute namespace will be dropped on the metric.

The idea here is we get the "best of both worlds" of reserved namespaces, global attribute registry, using the same "attribute bundle" for instrumentation across signals, but we also can produce metrics that look like the rest of the worlds metrics in addition to allowing tooling to understand a consistent set of rules when namespaces disappear.

I'll work on fleshing this out with concerns and details as a future comment. Once "Option C" is fleshed out, we'll be voting on a path forward.

trask commented 1 year ago

I'm curious if "Option C" might influence some of our metric naming choices.

For example, system.processes.count has attribute state, which would implicitly become system.processes.state.

It could be better(?) in this case to name the metric system.process.count, and then the attribute state would implicitly become system.process.state.

bertysentry commented 1 year ago

Having to choose for each attribute whether it should be namespaced or not will generate tons of discussions very difficult to arbitrate.

I propose an option D: Allow aliases for metric attributes.

Each attribute has a "principal" name, plus an array of aliases. When querying the metric, one could use the main name or any of the aliases.

First (naive) implementations could be to simply store each alias as a separate attribute.

Pros:

Cons:

joaopgrassi commented 1 year ago

Might be worth mentioning, I'm trying to move the existing metrics from markdown to yaml #73 and I bumped into what I think is a limitation in the tooling and/or violation of our recommendations

All names that are part of OpenTelemetry semantic conventions SHOULD be part of a namespace

Identical namespaces or names in all these areas MUST have identical meanings. For example the http.request.method span attribute name denotes exactly the same concept as the http.request.method metric attribute, has the same data type and the same set of possible values (in both cases it records the value of the HTTP protocol's request method as a string).

There's several attributes in the system metrics conventions today (e.g., state) that are used in several metrics and their values are not the same between metrics.

I started moving things into a YAML file using attribute_groups as done for the JVM metrics, but then the attributes get namespaced, such as system.cpu.state or system.memory.state, meaning it's not the same output as our current conventions.

Here's what I mean by using attribute_groups

- id: system.cpu
    prefix: system.cpu
    type: attribute_group
    brief: Describes System CPU metric attributes
    attributes:
      - id: state
        type:
          allow_custom_values: true
          members:
            - id: idle
              value: 'idle'
            - id: user
              value: 'user'
            - id: system
              value: 'system'
            - id: interrupt
              value: 'interrupt'
        brief: The state of the CPU
        examples: ["idle", "interrupt"]

 - id: system.memory
    prefix: system.memory
    type: attribute_group
    brief: Describes System Memory metric attributes
    attributes:
      - id: state
        type:
          allow_custom_values: true
          members:
            - id: used
              value: 'used'
            - id: free
              value: 'free'
            - id: cached
              value: 'cached'
        brief: The memory state
        examples: ["free", "cached"]

One might be able to get around the limitation by duplicating the attribute on each metric, changing it's enum values but that does not seem like a good idea to me. Essentially I can't share the attribute in different metrics without having it namespaced.

Oberon00 commented 1 year ago

The question is also: How should code generators name the constants for these? Usually I think they are based on the full name. And some metric attributes (like the ones shared with tracing) are indeed namespaced and unique. So would the code generators need to apply a heuristic like "does not contain dots" and then prefix with the group ID (which is otherwise only used for internal cross referencing and not usually exposed in generated code/markdon AFAIK)?

So it seems less a limitation of the tooling, but an ad-hoc concept accidentally introduced in current metric conventions for which it is unclear how it should be supported by tooling (if at all).

jsuereth commented 1 year ago

Option C: Metrics have implicit namespaces

This proposal requires all attributes in OpenTelemetry Semantic conventions to be namespaced, and available in a global registry. This provides many benefits, including:

Additionally, there is a desire from existing metric users for "small" or "short" attributes, as is the convention for metrics-only systems today. For example, rather than jvm.memory.state, metrics users would prefer an attribute in their backend to be just state

Metrics Implicit Namespace

To achieve this, Metrics will implicitly create a namespace based on their name. For example, a metric called process.cpu.utilization would implicitly have a namespace of process.cpu. Additionally we add the following rules:

Expected use cases

jack-berg commented 1 year ago

For the reasons mentioned here I still prefer requiring namespaces for all attributes.

I think the "option c" proposal above is a decent compromise, but that the benefits of attribute short names do not justify the complexity of the extra rules.

jsuereth commented 1 year ago

@jack-berg While I proposed Option C as a compromise, my own opinion aligns with yours.

joaopgrassi commented 1 year ago

To achieve this, Metrics will implicitly create a namespace based on their name. For example, a metric called process.cpu.utilization would implicitly have a namespace of process.cpu

Just if anyone wants to see, I have a draft PR to move the system metrics to YAML and before option C was proposed, it was the output I got with our current semconv tooling

https://github.com/dynatrace-oss-contrib/semantic-conventions/blob/feat/system-metrics-yaml/specification/metrics/semantic_conventions/system-metrics.md#metric-systemcputime

For ex: the system.cpu.* metrics have a state attribute. But system.memory.* and system.paging.* metrics also have that attribute.

The outcome is that the state attribute is created under those three namespaces (system.cpu.state, system.memory.state, system.paging.state)

Now a technicality I mentioned above: If we want Option C but also want the markdown rendering for the above attribute to be simply state, then I think we need changes in the tooling, because I couldn't find a way to do it. Maybe there is and I just don't know (which screams we need better docs for the semconv tool :D)

trask commented 1 year ago

as discussed in the specification meeting, I believe this issue should be decided by the @open-telemetry/technical-committee since there are strong pros and cons to these approaches (including Option C outlined by @jsuereth), and we need a decision (one way or the other) before we can feature-freeze the JVM runtime metrics

yurishkuro commented 1 year ago

I realize the following is not immediately actionable given the current state of OTEL. The fundamental issue we're dealing with here is that OTEL's semantic conventions bundle semantics with naming. Here's a contrive example of why this is bad: assume we want to capture events containing CPU measurements into an SQL table. Which table schema would you choose?

CREATE TABLE PROCESS_CPU (
  utilization INT NOT NULL,
  state       VARCHAR(50) NOT NULL,
  idle_time   INT NOT NULL
);

or

CREATE TABLE PROCESS_CPU (
  process_cpu_utilization INT NOT NULL,
  process_cpu_state       VARCHAR(50) NOT NULL,
  process_cpu_idle_time   INT NOT NULL
);

I think it's safe to say most engineers would pick the first option (the basic DRY principle; Go linter even has a similar anti-stutter rule). But then how do we distinguish if the state columns in this table and in a different table refer to the same thing or not? OTEL semantic conventions cannot help any more at this point, but the solution does exist - by capturing additional metadata about the columns in the table schema. I went into details of such approach in my talk last year https://www.shkuro.com/talks/2022-10-27-schema-first-application-telemetry/

bertysentry commented 1 year ago

I'm with @yurishkuro on this. I was suggesting to allow aliases for attributes names, and such aliases could be part of such attributes metadata.

In terms of implementation though, it's a bit tricky. Developers who are implementing OpenTelemetry to instrument their application and emit metrics would need to follow strictly the semantic conventions that specify such aliases.

For example, suppose the semantic conventions for system.cpu commend to include system.cpu.state AND its alias state. When emitting the system.cpu.utilization metric, the developer would need to make sure to include not just the "state" attribute, but the ["state", "system.cpu.state"] attribute.

tigrannajaryan commented 1 year ago

A TC vote was requested on this, however it is not clear to me what the option "require namespace" is. What namespace? Any namespace? Is it OK to place all attributes in namespace all.? What really is the requirement? That all attributes are spread into more than one namespace? How many? Should namespaces in some way match the metric name namespace?

Are we instead perhaps trying to say that a metric attribute considered in isolation from the metric should have a meaningful name, such that the interpretation of that meaning does not require additional knowledge of a context in which it is used (i.e. the metric name)? So, state does not match this requirement (state of what?). cpu.state probably matches the requirement since we know what a CPU is and can define what a state of a CPU is.

Let's look at another example. We currently have a cpu attribute in system.cpu.time metric. What does cpu attribute describe? It is supposed to be the integer number of the cpu. Arguably, probably a clearer name would be cpu_number. Is this good enough? Seems like a namespace is superfluous in this case. Should we still place this in system.cpu. namespace so that it becomes system.cpu.cpu_number?

My suggested definition is of course still ambiguous. Let's look at another example: a port. We know what a port is right? It's that number that is associated with a network socket (to be clearer, we would probably want to call this port_number instead of port). So, no other context needed? No, it turns out it can be a client port or a server port. So, server.port is namespaced enough? But what if we decide we want for all http connections to record 4 port numbers: the client, the server, and the incoming port on the proxy and outgoing port on the proxy. Should we use client.src.port, proxy.dest.port, proxy.src.port, server.dest.port? What's the framework for making a decision about what namespace (how deeply nested) to use?

I find that the definition of "require namespace" rule is lacking the necessary precision to apply it. It would be great if you can come up with a proper definition (or maybe there is one that I do not see, please point me to it). Without such precise definition I don't think that option is ready for the TC to vote on.

jack-berg commented 1 year ago

@tigrannajaryan from my perspective the question is merely whether metric attributes are exempt from the namespacing requirements we already impose on attributes elsewhere.

Technically, the attribute naming spec is stable and is quite clear on this already. It starts with:

This section applies to Resource, Span, Log, and Metric attribute names (also known as the "attribute keys"). For brevity within this section when we use the term "name" without an adjective it is implied to mean "attribute name".

Then follows with:

Names SHOULD follow these rules: Use namespacing to avoid name clashes.

Put simply, option A just reinforces what the spec already says. The other options aim to carve out some sort of exception for metrics. I don't think we need more precision with the attribute namespacing rule (at this moment in time) since trace and log semantic convention conversations have been able to proceed without it.

tigrannajaryan commented 1 year ago

Thanks @jack-berg

That's very useful.

I think we need clarify what we consider a clash, i.e. a clash with what? It appears that another way to think about "no exception for metrics" approach is to say that we want globally unique attribute names across all signals, such that no metric attribute can be the same as a span attribute or a log attribute unless the attribute has the exact same meaning. I would expect that we make this clarification in the spec should we decide that yes, we would like metric attributes to "require a namespace".

jack-berg commented 1 year ago

The TC has voted on this issue with results as follows:

Therefore, we should proceed with: "Option A: Metric attributes must have namespaces"

Closing this issue and we can separately proceed with the changes required to adopt (e.g. #3431, semantic-conventions#61)

bertysentry commented 1 year ago

Thank you for organizing the vote on this issue! Can we keep the discussion going on other options? @yurishkuro mentioned using metadata to convey this information. I suggested an option D, allowing aliases for attributes. But these were discarded in the discussion 😥

tigrannajaryan commented 1 year ago

Can we keep the discussion going on other options?

TC votes are for decision making. The decision is made to go with "Option A: Metric attributes must have namespaces".

jsuereth commented 1 year ago

@bertysentry I don't think selecting Option A negates the possibility of something like an Option C or D appearing in the future. However, the details of Option D are too open ended to consider voting. As you saw, Option C vs. Option A was a close call, but the fundamental agreement was that from a Semantic Conventions standpoint, metric attributes will have namespaces and unified descriptions/meaning between signals.

On a metric-backend-specific case, allowing aliases for well-known metric attributes is totally fine, but outside the scope of OpenTelmetry. Users are also welcome to shorten metric names via processors if it is their desire.

bertysentry commented 1 year ago

Thank you, @jsuereth, for the reply. I'm glad there is still room for discussion on improving this. IMO the required namespace in metric attributes is doomed to failure (for various reasons already detailed before). I'm concerned that too few implementations will follow these strict conventions, and that's why I truly hope enhancements will be considered, to ensure a wide adoption of these conventions. 🙏

Oberon00 commented 1 year ago

I think the main decision here was that they are conceptually namespaced and that's how we will describe them in semantic conventions. On this basis, any transmission optimizations can still be built.