open-telemetry / semantic-conventions

Defines standards for generating consistent, accessible telemetry across a variety of domains
Apache License 2.0

Define semantic conventions for k8s metrics #1032

Open ChrsMark opened 1 month ago

ChrsMark commented 1 month ago

Area(s)

area:k8s

Is your change request related to a problem? Please describe.

At the moment there are no Semantic Conventions for k8s metrics.

Describe the solution you'd like

Even if we cannot yet consider the k8s metrics stable, we can start adding the metrics that are not controversial to make some progress here. This issue aims to collect the k8s metrics that already exist in the Collector and to keep track of any related work. Below I'm providing an initial list of metrics coming from the kubeletstats and k8scluster receivers. Note that these are subject to change over time, so we should go back to the Collector to verify the current state.

cc: @open-telemetry/semconv-k8s-approvers

Describe alternatives you've considered

No response

Additional context

Some of the metrics below come from namespaces other than k8s.*. I have left them in intentionally so that they are taken into account as well.

kubeletstats metrics

k8s.node.cpu.usage
k8s.node.cpu.utilization
k8s.node.cpu.time
k8s.node.memory.available
k8s.node.memory.usage
k8s.node.memory.rss
k8s.node.memory.working_set
k8s.node.memory.page_faults
k8s.node.memory.major_page_faults
k8s.node.filesystem.available
k8s.node.filesystem.capacity
k8s.node.filesystem.usage
k8s.node.network.io
k8s.node.network.errors
k8s.node.uptime
k8s.pod.cpu.usage
k8s.pod.cpu.utilization: Deprecated
k8s.pod.cpu.time
k8s.pod.memory.available
k8s.pod.memory.usage
k8s.pod.cpu_limit_utilization
k8s.pod.cpu_request_utilization
k8s.pod.memory_limit_utilization
k8s.pod.memory_request_utilization
k8s.pod.memory.rss
k8s.pod.memory.working_set
k8s.pod.memory.page_faults
k8s.pod.memory.major_page_faults
k8s.pod.filesystem.available
k8s.pod.filesystem.capacity
k8s.pod.filesystem.usage
k8s.pod.network.io
k8s.pod.network.errors
k8s.pod.uptime
container.cpu.usage: https://github.com/open-telemetry/semantic-conventions/pull/1128
container.cpu.utilization: Deprecated
container.cpu.time: https://github.com/open-telemetry/semantic-conventions/pull/282
container.memory.available
container.memory.usage: ✅ https://github.com/open-telemetry/semantic-conventions/pull/282
k8s.container.cpu_limit_utilization
k8s.container.cpu_request_utilization
k8s.container.memory_limit_utilization
k8s.container.memory_request_utilization
container.memory.rss
container.memory.working_set
container.memory.page_faults
container.memory.major_page_faults
container.filesystem.available
container.filesystem.capacity
container.filesystem.usage
container.uptime
k8s.volume.available
k8s.volume.capacity
k8s.volume.inodes
k8s.volume.inodes.free
k8s.volume.inodes.used

k8scluster metrics

k8s.container.cpu_request
k8s.container.cpu_limit
k8s.container.memory_request
k8s.container.memory_limit
k8s.container.storage_request
k8s.container.storage_limit
k8s.container.ephemeralstorage_request
k8s.container.ephemeralstorage_limit
k8s.container.restarts
k8s.container.ready
k8s.pod.phase
k8s.pod.status_reason
k8s.deployment.desired
k8s.deployment.available
k8s.cronjob.active_jobs
k8s.daemonset.current_scheduled_nodes
k8s.daemonset.desired_scheduled_nodes
k8s.daemonset.misscheduled_nodes
k8s.daemonset.ready_nodes
k8s.hpa.max_replicas
k8s.hpa.min_replicas
k8s.hpa.current_replicas
k8s.hpa.desired_replicas
k8s.job.active_pods
k8s.job.desired_successful_pods
k8s.job.failed_pods
k8s.job.max_parallel_pods
k8s.job.successful_pods
k8s.namespace.phase
k8s.replicaset.desired
k8s.replicaset.available
k8s.replication_controller.desired
k8s.replication_controller.available
k8s.resource_quota.hard_limit
k8s.resource_quota.used
k8s.statefulset.desired_pods
k8s.statefulset.ready_pods
k8s.statefulset.current_pods
k8s.statefulset.updated_pods
openshift.clusterquota.limit
openshift.clusterquota.used
openshift.appliedclusterquota.limit
openshift.appliedclusterquota.used
k8s.node.condition
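
To make the note about metrics outside k8s.* easier to act on, here is a minimal sketch (not part of any receiver or of the semconv tooling) that groups metric names by their root namespace. The METRICS sample and the group_by_root_namespace helper are hypothetical names used only for illustration, and the sample is a small subset of the lists above.

```python
from collections import defaultdict

# A small sample of the metric names listed in this issue (not the full list).
METRICS = [
    "k8s.node.cpu.usage",
    "k8s.pod.memory.working_set",
    "container.cpu.time",
    "container.memory.usage",
    "k8s.container.cpu_limit",
    "openshift.clusterquota.limit",
]


def group_by_root_namespace(names):
    """Group metric names by the segment before the first dot."""
    groups = defaultdict(list)
    for name in names:
        root = name.split(".", 1)[0]
        groups[root].append(name)
    return dict(groups)


groups = group_by_root_namespace(METRICS)

# Metrics whose root namespace is not k8s, i.e. the ones that need to be
# coordinated with other semantic-convention areas.
outside_k8s = sorted(
    name
    for root, names in groups.items()
    if root != "k8s"
    for name in names
)

for name in outside_k8s:
    print(name)
```

Running the same grouping over the full lists would show how many of the metrics fall under container.* and openshift.* rather than k8s.*.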

Related issues

TBA

TylerHelmuth commented 1 month ago

I love the idea of moving forward with this work. According to the collector end-user survey, k8s and the collector are a big part of our end users' stacks, so moving the related semconvs forward is a great idea.

sirianni commented 1 month ago

In general, my team has been happy with the metrics collected by kubeletstatsreceiver and how they are modeled. The team is, however, struggling significantly with the "state" metrics that come from k8sclusterreceiver. We are coming from a Datadog background.