Open trask opened 10 months ago
An open question for me is whether we want to rename the first group to be something like:
faas.coldstart.count
faas.error.count
faas.invocation.count
faas.timeout.count
the main(?) advantage of doing this is that it opens up the (implicit) namespaces for metric attributes on these metrics, e.g.
faas.coldstart.*
faas.error.*
faas.invocation.*
faas.timeout.*
An open question for me is whether we want to rename the first group to be something like:
faas.coldstart.count
faas.error.count
faas.invocation.count
faas.timeout.count
Aren't we already using the .count
suffix for UpDownCounters representing the current state of the system? E.g. jvm.cpu.count
, jvm.class.count
Aren't we already using the .count suffix for UpDownCounters representing the current state of the system?
Yes, but this is in line with the current advice for UpDownCounters. Counters don't follow that but IMO it means that naming of counters is inconsistent and lacks guidance.
Metric names follow the general form: {NAMESPACE}.{DESCRIPTIVE_NOUN}
Where {NAMESPACE}
hierarchical, delimited by .
.
And where {DESCRIPTIVE_NOUN}
is a noun describing what values the metric captures, relative to the namespace.
My hope is that we could have a well defined set of {DESCRIPTIVE_NOUN}
options that we can pick from for almost all situations. We have the start of that dictionary in the instrument naming section, but there are some notable exceptions. For example, duration
is omitted, count
is omitted (even tho its recommended for up down counters), and we say the following which makes counters the wild west:
Other instruments that do not fit the above descriptions may be named more freely. For example, system.paging.faults and system.network.packets. Units do not need to be specified in the names since they are included during instrument creation, but can be added if there is ambiguity.
Oh just saw #211 where this is already being discussed.
yeah, this issue is specifically to explore, as you said above, "the wild west of counter naming"
a couple more on their way in #163:
messaging.publish.messages
messaging.receive.messages
Any reason these shouldn't all just have a suffix *.count
?
I would think the counter argument would be that the prometheus compatibility document states that monotonic sums have the _total
suffix added, and obviously *_count_total
is weird. But we should be able to adjust that rule to say strip "count", and add "total".
This was closed by mistake by the stale bot. Re-opening
Related to #211 and #212
Here are existing (monotonic) counter metric names:
faas.coldstarts
faas.errors
faas.invocations
faas.timeouts
db.client.connection.timeouts
messaging.kafka.messages
messaging.kafka.requests.failed
messaging.kafka.isr.operations
messaging.kafka.leader.elections
messaging.kafka.leader.unclean-elections
hw.energy
hw.errors
hw.host.energy
hw.network.packets
hw.tape_drive.operations
hw.*.errors
process.context_switches
process.paging.faults
jvm.classes.loaded
jvm.classes.unloaded
system.paging.faults
system.paging.operations
system.disk.operations
system.network.dropped
system.network.packets
system.network.errors
.io
messaging.kafka.network.io
hw.gpu.io
hw.network.io
process.disk.io
process.network.io
system.disk.io
system.network.io
.time
process.cpu.time
jvm.cpu.time
system.cpu.time
.*_time
system.disk.io_time
system.disk.operation_time