prometheus / procfs

procfs provides functions to retrieve system, kernel and process metrics from the pseudo-filesystem proc.
Apache License 2.0

Bug Fix: SystemCpufreq fails when any core is offline #497

Closed taherkk closed 1 year ago

taherkk commented 1 year ago

The SystemCpufreq function does not check whether a core is online before collecting its metrics. When a core is offline, the function fails and a couple of metrics are not collected.

Refer to https://github.com/prometheus/node_exporter/issues/2577 for more details.

taherkk commented 1 year ago

I have read the README and the contribution steps and have tried my best to follow them. This is my first open source contribution, so please guide me if I have missed something.

taherkk commented 1 year ago

Tests are failing because, on Linux, each directory under .../cpu[0-9] except cpu0 has a file named online. It contains an integer specifying whether the core is online. I added the file manually in my local setup, and the tests now pass.

Please help me add that file to the fixtures.

Please review @pgier

discordianfish commented 1 year ago

You also need to sign off your commit; see the failing DCO check above.

taherkk commented 1 year ago

Signed off my commit. Not sure why the lint step is failing.

discordianfish commented 1 year ago

Here is the lint error:

#!/bin/bash -eo pipefail
git diff --exit-code
diff --git a/testdata/fixtures.ttar b/testdata/fixtures.ttar
index cb49947..b80316a 100644
--- a/testdata/fixtures.ttar
+++ b/testdata/fixtures.ttar
@@ -12543,11 +12543,6 @@ Mode: 444
 Directory: fixtures/sys/devices/system/cpu/cpu1
 Mode: 775
 # ttar - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-Path: fixtures/sys/devices/system/cpu/cpu1/online
-Lines: 1
-1
-Mode: 544
-# ttar - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
 Directory: fixtures/sys/devices/system/cpu/cpu1/cpufreq
 Mode: 775
 # ttar - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
@@ -12606,6 +12601,11 @@ Lines: 1
 <unsupported>
 Mode: 664
 # ttar - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
+Path: fixtures/sys/devices/system/cpu/cpu1/online
+Lines: 1
+1
+Mode: 544
+# ttar - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
 Directory: fixtures/sys/devices/system/cpu/cpu1/thermal_throttle
 Mode: 755
 # ttar - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Exited with code exit status 1
CircleCI received exit code 1

You probably didn't run make fixtures to create the ttar file

taherkk commented 1 year ago

I'm exploring the idea of checking for offline CPUs in /sys/devices/system/cpu/offline.

Sharing the approach here: get the CPUs from the file, expand the ranges, and filter out CPUs that are not physically present. (Though I have a hexa-core CPU, the offline file shows a range of 12-15, which are not physically present.)

taherkk commented 1 year ago

@pgier, I have updated the code according to the approach mentioned earlier.

AdarshdeepCheema commented 1 year ago

@taherkk I think it is causing https://github.com/prometheus/procfs/issues/530. Can you please confirm? FYI @pgier @discordianfish

taherkk commented 1 year ago

Raised PR #534 for it, @pgier.