open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

Not getting all status metrics from otel collector processes scraper #34841

Open Praveen2099 opened 2 months ago

Praveen2099 commented 2 months ago

Component(s)

receiver/hostmetrics

What happened?

Hi, we’re not getting all the process states from the OpenTelemetry Collector. We only see the following metrics:

system_processes_count{nodename="nodename", source="otel_hostmetrics", status="blocked"} 0
system_processes_count{nodename="nodename", source="otel_hostmetrics", status="running"} 20
system_processes_count{nodename="nodename", source="otel_hostmetrics", status="sleeping"} 49
system_processes_count{nodename="nodename", source="otel_hostmetrics", status="unknown"} 1371

We’re missing other states such as zombies, idle, and locked. Here is the configuration for the receivers:

receivers:
   hostmetrics:
     collection_interval: 10s
     scrapers:
       process:
       processes:

Can someone explain where the OpenTelemetry Collector scraper retrieves process state metrics from? The process state counts from the OpenTelemetry metrics do not match the output of ps -ef or ps -eo state= | sort | uniq -c:

1 R
4 S

For the same node, the metrics count from collectd is also different at the same point in time. Here are the collectd metrics:

collectd_processes_ps_state{processes="blocked",instance="nodename"} 0 1724657676710
collectd_processes_ps_state{processes="paging",instance="nodename"} 0 1724657676710
collectd_processes_ps_state{processes="running",instance="nodename"} 8 1724657676710
collectd_processes_ps_state{processes="sleeping",instance="nodename"} 2 1724657676710
collectd_processes_ps_state{processes="stopped",instance="nodename"} 0 1724657676710
collectd_processes_ps_state{processes="zombies",instance="nodename"} 0 1724657676710

Can someone help us get the correct metrics from the OpenTelemetry Collector and suggest a way to validate them manually?

Collector version

V0.106.0

Environment information

Environment

OS: (e.g., "Ubuntu 20.04")
Compiler (if manually compiled): (e.g., "go 14.2")

OpenTelemetry Collector configuration

receivers:
   hostmetrics: 
     collection_interval: 10s
     scrapers:
       process:
       processes:

Log output

No response

Additional context

No response

github-actions[bot] commented 2 months ago

Pinging code owners:

Praveen2099 commented 2 months ago

Hi @HaxBaba123 @seeronline, thanks for the response. Sorry, I didn't understand. Can you please elaborate on what exactly I need to do?

flyinghead commented 2 months ago

This is a malicious file. Don't download: https://www.virustotal.com/gui/file/b127de888f09ce23937c12b7fccfa47a8f48312b0e43eb59b6243f665c6d366a

rogercoll commented 2 months ago

Can someone explain where the OpenTelemetry Collector scraper retrieves process state metrics from?

In GNU/Linux OSes it retrieves process values from the /proc filesystem.

Do you see any errors or warnings in the collector logs related to the process scraper? The collector needs read access to the /proc filesystem in order to provide metrics for all processes.
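For manual validation, here is a minimal Python sketch that counts processes by the raw one-letter state found in /proc/<pid>/stat, which should be comparable to the output of ps -eo state= | sort | uniq -c. It is not the collector's implementation: the function names parse_state and count_states are made up for illustration, and the script does not try to reproduce how the collector maps state letters to status labels such as "sleeping" or "unknown". Run it as the same user as the collector so that the same /proc permissions apply.

```python
import os
from collections import Counter


def parse_state(stat_line: str) -> str:
    """Return the one-letter state from the contents of /proc/<pid>/stat.

    The command name (second field) is wrapped in parentheses and may
    itself contain spaces or parentheses, so the state is taken as the
    first token after the LAST ')'.
    """
    after_comm = stat_line[stat_line.rindex(")") + 1:]
    return after_comm.split()[0]


def count_states(proc_root: str = "/proc") -> Counter:
    """Count processes per state letter under proc_root (hypothetical helper)."""
    counts = Counter()
    for entry in os.listdir(proc_root):
        # Only numeric directory names correspond to process IDs.
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                counts[parse_state(f.read())] += 1
        except OSError:
            # The process exited between listing and reading, or the
            # stat file is not readable by the current user; skip it.
            continue
    return counts


if __name__ == "__main__":
    # Print "<count> <state>" lines, similar to `uniq -c` output.
    for state, n in sorted(count_states().items()):
        print(n, state)
```

If the totals from this script match ps but not the collector's metrics, that would suggest the collector is reading a different /proc (for example, inside a container) or lacks read access to it.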