open-telemetry / semantic-conventions

Defines standards for generating consistent, accessible telemetry across a variety of domains
Apache License 2.0

Proposal: don't pluralize metric namespaces #212

Closed trask closed 11 months ago

trask commented 1 year ago

There are two reasons behind this proposal:

This would affect these metric names:

It may also affect these metrics if we decide to add .count (or .usage) to them:

jsuereth commented 1 year ago

I think a year ago we decided to move in this direction. See the updated pluralization rules in the spec: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/semantic_conventions/README.md#use-count-instead-of-pluralization

We should just clean up the remaining ambiguity around this topic in the specification.
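The rule linked above can be illustrated with a small sketch (hypothetical helper names, not spec tooling): a pluralized namespace such as `system.processes` becomes a singular namespace plus a `.count` suffix.

```python
# Illustrative sketch only (hypothetical helpers, not spec tooling): shows how
# a pluralized metric name maps to the "use .count instead of pluralization"
# style, e.g. "system.processes" -> "system.process.count".

def singularize(segment: str) -> str:
    """Naive English singularization; only covers the simple plural forms
    that appear in the metric names discussed in this thread."""
    if segment.endswith("sses"):          # "processes" -> "process", "classes" -> "class"
        return segment[:-2]
    if segment.endswith("s") and not segment.endswith("ss"):
        return segment[:-1]               # "connections" -> "connection"
    return segment

def to_count_style(pluralized: str) -> str:
    """Rewrite a pluralized name like "system.processes" into the
    singular-namespace-plus-.count form: "system.process.count"."""
    parts = [singularize(p) for p in pluralized.split(".")]
    return ".".join(parts) + ".count"

print(to_count_style("system.processes"))    # system.process.count
print(to_count_style("jvm.gc.collections"))  # jvm.gc.collection.count
```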

tigrannajaryan commented 1 year ago

Can you please list all impacted metric names? Or maybe if it is easier just prepare a draft PR with all name changes. I think seeing all names would help better understand the impact.

trask commented 1 year ago

@tigrannajaryan I've closed the initial PR, which had expanded the scope to both metric namespaces and metric names, and opened a fresh PR limited to the scope of this issue, i.e. metric namespaces only (and not metric names).

This issue affects the following metric namespaces:

dmitryax commented 1 year ago

Renaming `system.processes.count` (Total number of processes in each state) to `system.process.count` sounds OK to me, but renaming `system.processes.created` (Total number of created processes) to `system.process.created` makes the name less descriptive and a bit confusing IMO. Maybe we should consider a different name if we want to avoid pluralized namespaces.

I'd also suggest bringing in the metrics currently emitted by Collector receivers but not yet defined in the semantic conventions. There must be plenty of them with pluralized namespaces, and maybe they can influence this decision. I can do that exercise if it makes sense.

trask commented 1 year ago

> I'd also suggest bringing in the metrics currently emitted by Collector receivers but not yet defined in the semantic conventions. There must be plenty of them with pluralized namespaces, and maybe they can influence this decision. I can do that exercise if it makes sense.

@dmitryax yes, I think that would be great!

dmitryax commented 1 year ago

I checked the Collector receivers. Most of the exposed metrics don't have pluralized namespaces, but there are still a few that do. I'll list all of them here with their descriptions for full context:

Kubelet stats receiver:
- `container.pids.count`: Number of pids in the container's cgroup
- `container.pids.limit`: Maximum number of pids in the container's cgroup

Elasticsearch receiver:
- `elasticsearch.node.operations.current`: Number of query operations currently running
- `elasticsearch.node.operations.completed`: The number of operations completed by a node.
- `elasticsearch.node.operations.time`: Time spent on operations by a node.
- `elasticsearch.node.operations.get.completed`: The number of hits and misses resulting from GET operations.
- `elasticsearch.node.operations.get.time`: The time spent on hits and misses resulting from GET operations.
- `elasticsearch.node.shards.size`: The size of the shards assigned to this node.
- `elasticsearch.node.shards.data_set.size`: Total data set size of all shards assigned to the node. This includes the size of shards not stored fully on the node, such as the cache for partially mounted indices.
- `elasticsearch.node.shards.reserved.size`: A prediction of how much larger the shard stores on this node will eventually grow due to ongoing peer recoveries, restoring snapshots, and similar activities. A value of -1 indicates that this is not available.
- `elasticsearch.node.thread_pool.tasks.queued`: The number of queued tasks in the thread pool.
- `elasticsearch.node.thread_pool.tasks.finished`: The number of tasks finished by the thread pool.
- `jvm.classes.loaded`: The number of loaded classes
- `jvm.gc.collections.count`: The total number of garbage collections that have occurred
- `jvm.gc.collections.elapsed`: The approximate accumulated collection elapsed time
- `elasticsearch.node.ingest.documents.current`: Total number of documents currently being ingested.
- `elasticsearch.node.ingest.operations.failed`: Total number of failed ingest operations during the lifetime of this node.
- `elasticsearch.node.pipeline.ingest.documents.preprocessed`: Number of documents preprocessed by the ingest pipeline.
- `elasticsearch.node.pipeline.ingest.operations.failed`: Total number of failed operations for the ingest pipeline.
- `elasticsearch.node.pipeline.ingest.documents.current`: Total number of documents currently being ingested by a pipeline.
- `elasticsearch.node.segments.memory`: Size of memory for segment object of a node.
- `elasticsearch.index.operations.completed`: The number of operations completed for an index.
- `elasticsearch.index.operations.time`: Time spent on operations for an index.
- `elasticsearch.index.shards.size`: The size of the shards assigned to this index.
- `elasticsearch.index.operations.merge.size`: The total size of merged segments for an index.
- `elasticsearch.index.operations.merge.docs_count`: The total number of documents in merge operations for an index.
- `elasticsearch.index.segments.count`: Number of segments of an index.
- `elasticsearch.index.segments.size`: Size of segments of an index.
- `elasticsearch.index.segments.memory`: Size of memory for segment object of an index.

Expvar receiver:
- `process.runtime.memstats.total_alloc`: Cumulative bytes allocated for heap objects.
- `process.runtime.memstats.sys`: Total bytes of memory obtained from the OS.
- `process.runtime.memstats.lookups`: Number of pointer lookups performed by the runtime.
- `process.runtime.memstats.mallocs`: Cumulative count of heap objects allocated.
- `process.runtime.memstats.frees`: Cumulative count of heap objects freed.
- `process.runtime.memstats.heap_alloc`: Bytes of allocated heap objects.
- `process.runtime.memstats.heap_sys`: Bytes of heap memory obtained by the OS.
- `process.runtime.memstats.heap_idle`: As defined by https://pkg.go.dev/runtime#MemStats
- `process.runtime.memstats.heap_inuse`: Bytes in in-use spans.
- `process.runtime.memstats.heap_released`: Bytes of physical memory returned to the OS.
- `process.runtime.memstats.heap_objects`: Number of allocated heap objects.
- `process.runtime.memstats.stack_inuse`: Bytes in stack spans.
- `process.runtime.memstats.stack_sys`: Bytes of stack memory obtained from the OS.
- `process.runtime.memstats.mspan_inuse`: Bytes of memory obtained from the OS for mspan structures.
- `process.runtime.memstats.mcache_inuse`: Bytes of allocated mcache structures.
- `process.runtime.memstats.mcache_sys`: Bytes of memory obtained from the OS for mcache structures.
- `process.runtime.memstats.buck_hash_sys`: Bytes of memory in profiling bucket hash tables.
- `process.runtime.memstats.gc_sys`: Bytes of memory in garbage collection metadata.
- `process.runtime.memstats.other_sys`: Bytes of memory in miscellaneous off-heap runtime allocations.
- `process.runtime.memstats.next_gc`: The target heap size of the next GC cycle.
- `process.runtime.memstats.pause_total`: The cumulative nanoseconds in GC stop-the-world pauses since the program started.
- `process.runtime.memstats.last_pause`: The most recent stop-the-world pause time.
- `process.runtime.memstats.num_gc`: Number of completed GC cycles.
- `process.runtime.memstats.num_forced_gc`: Number of GC cycles that were forced by the application calling the GC function.
- `process.runtime.memstats.gc_cpu_fraction`: The fraction of this program's available CPU time used by the GC since the program started.

Flink metrics receiver:
- `flink.jvm.gc.collections.count`: The total number of collections that have occurred.
- `flink.jvm.gc.collections.time`: The total time spent performing garbage collection.

HAProxy receiver:
- `haproxy.connections.rate`: Number of connections over the last elapsed second (frontend). Corresponds to HAProxy's `conn_rate` metric.
- `haproxy.connections.total`: Cumulative number of connections (frontend). Corresponds to HAProxy's `conn_tot` metric.
- `haproxy.connections.errors`: Number of requests that encountered an error trying to connect to a backend server. The backend stat is the sum of the stat. Corresponds to HAProxy's `econ` metric
- `haproxy.connections.retries`: Number of times a connection to a server was retried. Corresponds to HAProxy's `wretr` metric.
- `haproxy.sessions.count`: Current sessions. Corresponds to HAProxy's `scur` metric.
- `haproxy.sessions.total`: Cumulative number of sessions. Corresponds to HAProxy's `stot` metric.
- `haproxy.sessions.average`: Average total session time in ms over the last 1024 requests. Corresponds to HAProxy's `ttime` metric.
- `haproxy.sessions.rate`: Number of sessions per second over last elapsed second. Corresponds to HAProxy's `rate` metric.
- `haproxy.requests.denied`: Requests denied because of security concerns. Corresponds to HAProxy's `dreq` metric
- `haproxy.requests.errors`: Cumulative number of request errors. Corresponds to HAProxy's `ereq` metric.
- `haproxy.requests.redispatched`: Number of times a request was redispatched to another server. Corresponds to HAProxy's `wredis` metric.
- `haproxy.requests.total`: Total number of HTTP requests received. Corresponds to HAProxy's `req_tot`, `hrsp_1xx`, `hrsp_2xx`, `hrsp_3xx`, `hrsp_4xx`, `hrsp_5xx` and `hrsp_other` metrics.
- `haproxy.requests.queued`: Current queued requests. For the backend this reports the number queued without a server assigned. Corresponds to HAProxy's `qcur` metric.
- `haproxy.requests.rate`: HTTP requests per second over last elapsed second. Corresponds to HAProxy's `req_rate` metric.

Host metrics receiver:
- `system.filesystem.inodes.usage`: FileSystem inodes used.
- `system.processes.created`: Total number of created processes.
- `system.processes.count`: Total number of processes in each state.

Memcached receiver:
- `memcached.connections.current`: The current number of open connections.
- `memcached.connections.total`: Total number of connections opened since the server started running.

OracleDB receiver:
- `oracledb.sessions.usage`: Count of active sessions.
- `oracledb.sessions.limit`: Maximum limit of active sessions, -1 if unlimited.
- `oracledb.processes.usage`: Current count of active processes.
- `oracledb.processes.limit`: Maximum limit of active processes, -1 if unlimited.
- `oracledb.enqueue_locks.usage`: Current count of active enqueue locks.
- `oracledb.enqueue_locks.limit`: Maximum limit of active enqueue locks, -1 if unlimited.
- `oracledb.dml_locks.usage`: Current count of active DML (Data Manipulation Language) locks.
- `oracledb.dml_locks.limit`: Maximum limit of active DML (Data Manipulation Language) locks, -1 if unlimited.
- `oracledb.enqueue_resources.usage`: Current count of active enqueue resources.
- `oracledb.enqueue_resources.limit`: Maximum limit of active enqueue resources, -1 if unlimited.
- `oracledb.transactions.usage`: Current count of active transactions.
- `oracledb.transactions.limit`: Maximum limit of active transactions, -1 if unlimited.

PostgreSQL receiver:
- `postgresql.bgwriter.buffers.allocated`: Number of buffers allocated.
- `postgresql.bgwriter.buffers.writes`: Number of buffers written.

Redis receiver:
- `redis.clients.connected`: Number of client connections (excluding connections from replicas)
- `redis.clients.max_input_buffer`: Biggest input buffer among current client connections
- `redis.clients.max_output_buffer`: Longest output list among current client connections
- `redis.clients.blocked`: Number of clients pending on a blocking call
- `redis.keys.expired`: Total number of key expiration events
- `redis.keys.evicted`: Number of evicted keys due to maxmemory limit
- `redis.connections.received`: Total number of connections accepted by the server
- `redis.connections.rejected`: Number of connections rejected because of maxclients limit
- `redis.commands.processed`: Total number of commands processed by the server

SAP HANA receiver:
- `saphana.service.memory.compactors.allocated`: The part of the memory pool that can potentially (if unpinned) be freed during a memory shortage.
- `saphana.service.memory.compactors.freeable`: The memory that can be freed during a memory shortage.

I believe most of them can easily be renamed to avoid pluralized namespaces. However, if we just remove the trailing `s`, some metrics will get less descriptive names, e.g. `container.pid.limit` or `redis.client.connected`. Do we have any other guidelines that can be applied when renaming these kinds of metrics?
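The concern above can be made concrete with a quick sketch (hypothetical helper, not Collector code): mechanically stripping the plural from each namespace segment produces valid but sometimes less descriptive names, and even needs an allowlist for segments like `redis` that merely end in `s`.

```python
# Illustrative sketch (hypothetical helper, not Collector code): naively
# de-pluralize every namespace segment and show how some results lose meaning.

# Segments that end in "s" but are not plurals (proper nouns, etc.); a real
# renaming pass would need a list like this, which is part of the problem.
NOT_PLURAL = {"redis"}

def strip_plural(segment: str) -> str:
    # Naive rule: good enough for the names listed above, nothing more.
    if segment in NOT_PLURAL:
        return segment
    if segment.endswith("sses"):                     # "processes" -> "process"
        return segment[:-2]
    if segment.endswith("s") and not segment.endswith("ss"):
        return segment[:-1]                          # "pids" -> "pid"
    return segment

def depluralize_namespace(metric: str) -> str:
    """De-pluralize only the namespace, leaving the final name segment alone."""
    *namespace, name = metric.split(".")
    return ".".join([strip_plural(p) for p in namespace] + [name])

for metric in ("container.pids.limit",      # -> container.pid.limit (reads like one pid)
               "redis.clients.connected",   # -> redis.client.connected (less clear)
               "system.processes.created"): # -> system.process.created
    print(metric, "->", depluralize_namespace(metric))
```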

cc @djaglowski FYI

TylerHelmuth commented 1 year ago

Another scenario to consider: https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/24905#issuecomment-1688903953.

We'd like to add new metrics to the kubeletstatsreceiver whose names include a section clarifying which conceptual "limit", either requests or limits, is being used to calculate utilization. The Kubernetes terms for the values we're using are plural, and for the best name-recognition experience we'd like to keep `requests` and `limits` in these metric names.