open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

[receiver/kubeletstats] inconsistency while collecting memory usage #32739

zied-chekir opened this issue 2 months ago (status: Open)

zied-chekir commented 2 months ago

Component(s)

receiver/kubeletstats

What happened?

Description

I'm currently collecting the memory usage metric of every node with a DaemonSet collector using the kubeletstats receiver, but I've noticed an inconsistency in the memory usage values. When I compare the values scraped by the receiver with those reported by 'kubectl top nodes', there is a notable difference: the command reports around 11GB of memory usage, while the sum of the receiver's values is approximately 20GB, almost double the expected value. I'm not sure whether this is a bug in the receiver or something else.

node 1: (screenshot: node1)

node 2: (screenshot: node2)

node 3: (screenshot: node3)

node 4: (screenshot: node4)

all nodes with "kubectl top nodes": (screenshot: nodes_metrics)

Expected Result

Cluster memory usage should be around 11GB (the sum of all nodes' memory usage).

Actual Result

Cluster memory usage is around 20GB.

Collector version

0.98.0

Environment information

Environment

kubernetes: v1.29.2

OpenTelemetry Collector configuration

receivers:
  kubeletstats:
    auth_type: "serviceAccount"
    endpoint: https://${env:K8S_NODE_NAME}:10250
    metric_groups:
      - pod
      - node
....
service:
  pipelines:
    metrics/kublet:
      receivers: [kubeletstats]
      processors: [k8sattributes, memory_limiter, batch]
      exporters: [otlp/metrics_k8s_kublet]
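
For context (not part of the reported configuration): the K8S_NODE_NAME variable referenced in the endpoint is typically injected into each DaemonSet pod via the Kubernetes Downward API, so that every collector instance scrapes its own node's kubelet. A minimal pod-spec sketch, assuming a standard DaemonSet deployment:

env:
  # Populate K8S_NODE_NAME with the name of the node this pod is scheduled on
  - name: K8S_NODE_NAME
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeName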

Log output

No response

Additional context

No response

github-actions[bot] commented 2 months ago

Pinging code owners:

jinja2 commented 2 months ago

kubectl top node displays the working_set memory in the MEMORY column (the top command gets its metrics from the metrics-server, hence the link to the metrics-server implementation). The collector emits the metric k8s.node.memory.working_set for working set usage. If you compare the top output with this collector metric, they should be the same. The working set memory usage can be less than the total usage, since the former does not include the cache.
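
One way to make the comparison apples-to-apples is to collect only the working-set metric. The sketch below is an assumption based on the receiver's per-metric toggles (not verified against the reporter's 0.98.0 setup): it disables k8s.node.memory.usage, which includes cached memory, and keeps k8s.node.memory.working_set, which is what kubectl top nodes reports.

receivers:
  kubeletstats:
    auth_type: "serviceAccount"
    endpoint: https://${env:K8S_NODE_NAME}:10250
    metric_groups:
      - node
    metrics:
      # Total memory usage, includes cached pages; disabled here so only the
      # working-set value is emitted for the comparison
      k8s.node.memory.usage:
        enabled: false
      # Working set memory; this is what 'kubectl top nodes' shows in MEMORY
      k8s.node.memory.working_set:
        enabled: true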

github-actions[bot] commented 6 days ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.