open-telemetry / semantic-conventions

Defines standards for generating consistent, accessible telemetry across a variety of domains
Apache License 2.0
220 stars 141 forks

Analyze naming conventions for (monotonic) counter metrics #260

Open trask opened 10 months ago

trask commented 10 months ago

Related to #211 and #212

Here are existing (monotonic) counter metric names:

.io

.time

.*_time

trask commented 10 months ago

An open question for me is whether we want to rename the first group to be something like:

the main(?) advantage of doing this is that it opens up the (implicit) namespaces for metric attributes on these metrics, e.g.

mateuszrzeszutek commented 10 months ago

An open question for me is whether we want to rename the first group to be something like:

  • faas.coldstart.count
  • faas.error.count
  • faas.invocation.count
  • faas.timeout.count

Aren't we already using the .count suffix for UpDownCounters representing the current state of the system? E.g. jvm.cpu.count, jvm.class.count

jack-berg commented 10 months ago

Aren't we already using the .count suffix for UpDownCounters representing the current state of the system?

Yes, but this is in line with the current advice for UpDownCounters. Counters don't follow that advice, which IMO means that the naming of counters is inconsistent and lacks guidance.

Metric names follow the general form: {NAMESPACE}.{DESCRIPTIVE_NOUN}

Where {NAMESPACE} is hierarchical, delimited by "."

And where {DESCRIPTIVE_NOUN} is a noun describing what values the metric captures, relative to the namespace.
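
To make the split concrete, here is a minimal Python sketch. It assumes the descriptive noun is simply the last dot-delimited segment and everything before it is the namespace; the helper name split_metric_name is hypothetical and not part of any OpenTelemetry API.

```python
from __future__ import annotations


def split_metric_name(name: str) -> tuple[str, str]:
    """Split a metric name into (namespace, descriptive_noun).

    Assumption for illustration only: the descriptive noun is the final
    dot-delimited segment, and the rest is the hierarchical namespace.
    """
    namespace, sep, noun = name.rpartition(".")
    if not sep or not namespace or not noun:
        raise ValueError(f"expected '<namespace>.<noun>', got {name!r}")
    return namespace, noun


print(split_metric_name("jvm.class.count"))       # ('jvm.class', 'count')
print(split_metric_name("system.paging.faults"))  # ('system.paging', 'faults')
```

Under this reading, "count" and "faults" are both descriptive nouns, which is the kind of variation a fixed dictionary of nouns would narrow down.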

My hope is that we could have a well-defined set of {DESCRIPTIVE_NOUN} options that we can pick from for almost all situations. We have the start of that dictionary in the instrument naming section, but there are some notable exceptions. For example, duration is omitted, count is omitted (even though it's recommended for UpDownCounters), and we say the following, which makes counters the wild west:

Other instruments that do not fit the above descriptions may be named more freely. For example, system.paging.faults and system.network.packets. Units do not need to be specified in the names since they are included during instrument creation, but can be added if there is ambiguity.

jack-berg commented 10 months ago

Oh just saw #211 where this is already being discussed.

trask commented 10 months ago

yeah, this issue is specifically to explore, as you said above, "the wild west of counter naming"

trask commented 10 months ago

a couple more on their way in #163:

jack-berg commented 10 months ago

Any reason these shouldn't all just have a suffix *.count?

I would think the counterargument would be that the Prometheus compatibility document states that monotonic sums have the _total suffix added, and obviously *_count_total is weird. But we should be able to adjust that rule to say: strip "count", then add "total".
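
A rough Python sketch of what such an adjusted rule might look like. It assumes dots are mapped to underscores and ignores units and invalid-character handling, which real exporters also deal with; the function name to_prometheus_counter_name is hypothetical and does not correspond to an existing exporter API.

```python
def to_prometheus_counter_name(otel_name: str) -> str:
    """Map a monotonic counter name to a Prometheus-style name.

    Sketch of the proposed rule: strip a trailing "count", then add "total".
    Assumption: "." becomes "_"; unit suffixes and character sanitization
    are deliberately left out.
    """
    name = otel_name.replace(".", "_")
    if name.endswith("_count"):
        # Drop the suffix so the result is not "*_count_total".
        name = name[: -len("_count")]
    if not name.endswith("_total"):
        name += "_total"
    return name


print(to_prometheus_counter_name("faas.invocation.count"))  # faas_invocation_total
print(to_prometheus_counter_name("system.paging.faults"))   # system_paging_faults_total
```

With this rule, a counter named with the *.count suffix and one named more freely would both end up with a plain _total suffix on the Prometheus side.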

joaopgrassi commented 4 months ago

This was closed by mistake by the stale bot. Re-opening.