nextcloud / serverinfo

📊 A monitoring app which creates a server info dashboard for admins
GNU Affero General Public License v3.0

Add new mechanism to get thermal notification #551

Open ostasevych opened 7 months ago

ostasevych commented 7 months ago

Currently serverinfo reads temperatures from /sys/class/thermal/thermal_zone*/temp. However, some AMD motherboards and chipsets do not expose the data there, but through hwmon instead. For example, my HP MicroServer exposes its temperature data in these files:

k10temp:
temp1 /sys/devices/pci0000:00/0000:00:18.3/hwmon/hwmon3/temp1_input

w83795adg-i2c-1-2f:
temp1 /sys/devices/pci0000:00/0000:00:14.0/i2c-1/1-002f/temp1_input
temp2 /sys/devices/pci0000:00/0000:00:14.0/i2c-1/1-002f/temp2_input
temp5 /sys/devices/pci0000:00/0000:00:14.0/i2c-1/1-002f/temp5_input

jc42-i2c-0-18
temp1 /sys/devices/pci0000:00/0000:00:14.0/i2c-0/0-0018/hwmon/hwmon0/temp1_input

jc42-i2c-0-19
temp1 /sys/devices/pci0000:00/0000:00:14.0/i2c-0/0-0019/hwmon/hwmon0/temp1_input 

And the output of find:

# find /sys -name "temp*_input"
/sys/devices/pci0000:00/0000:00:18.3/hwmon/hwmon3/temp1_input
/sys/devices/pci0000:00/0000:00:14.0/i2c-1/1-002f/temp1_input
/sys/devices/pci0000:00/0000:00:14.0/i2c-1/1-002f/temp5_input
/sys/devices/pci0000:00/0000:00:14.0/i2c-1/1-002f/temp2_input
/sys/devices/pci0000:00/0000:00:14.0/i2c-0/0-0019/hwmon/hwmon1/temp1_input
/sys/devices/pci0000:00/0000:00:14.0/i2c-0/0-0018/hwmon/hwmon0/temp1_input

lm-sensors also reports good data.

Is it possible to read the data in a more universal way, e.g. from the hwmon class rather than from the thermal_zone class?

Read more here https://github.com/Mellanox/mlxsw/wiki/Temperature-and-Fan-Control
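
A minimal sketch of what reading from the hwmon class could look like, assuming the layout shown in the listings above. The function name `listHwmonTemperatures`, the `$base` parameter, and the array keys are hypothetical and not part of serverinfo. Because the w83795adg files listed above sit directly in the i2c device directory rather than under hwmon/hwmonN, the sketch also looks in each hwmon entry's device/ subdirectory; that such a link exists on a given system is an assumption.

```php
<?php
/**
 * Hypothetical helper: collect every temp*_input found under a hwmon class directory.
 * sysfs reports millidegrees Celsius, so raw values are divided by 1000.
 */
function listHwmonTemperatures(string $base = '/sys/class/hwmon'): array {
    $result = [];
    foreach (glob($base . '/hwmon*') ?: [] as $device) {
        $dir = $device;
        $inputs = glob($dir . '/temp*_input') ?: [];
        if ($inputs === []) {
            // Assumption: some drivers keep their attributes on the parent device.
            $dir = $device . '/device';
            $inputs = glob($dir . '/temp*_input') ?: [];
        }
        // Fall back to the directory name (e.g. "hwmon3") if no name file is readable.
        $name = is_readable($dir . '/name')
            ? trim((string)file_get_contents($dir . '/name'))
            : basename($device);
        foreach ($inputs as $input) {
            $raw = @file_get_contents($input);
            if ($raw === false) {
                continue; // sensor not readable, skip it
            }
            $result[] = [
                'chip' => $name,
                'channel' => basename($input, '_input'), // e.g. "temp1"
                'temp' => (int)trim($raw) / 1000,
            ];
        }
    }
    return $result;
}

// Print whatever the current machine exposes (prints nothing if there is no hwmon class).
foreach (listHwmonTemperatures() as $sensor) {
    printf("%s %s: %+.2f°C\n", $sensor['chip'], $sensor['channel'], $sensor['temp']);
}
```

Passing the base directory as a parameter keeps the function testable against a fake directory tree, which helps when the target hardware is not at hand.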

ostasevych commented 7 months ago

UPD: Here is my quick and dirty hack of the function getThermalZones() in lib/OperatingSystems/DefaultOs.php. It checks whether thermal zones are present and, if they are not, reads the data from hwmon instead.

public function getThermalZones(): array {
    $result = [];
    // is_dir() does not expand wildcards, so detect thermal zones with glob().
    $thermalZones = glob('/sys/class/thermal/thermal_zone*') ?: [];
    if ($thermalZones !== []) {
        foreach ($thermalZones as $thermalZone) {
            $tzone = [];
            try {
                $tzone['hash'] = md5($thermalZone);
                $tzone['type'] = $this->readContent($thermalZone . '/type');
                // sysfs reports millidegrees Celsius
                $tzone['temp'] = (float)((int)($this->readContent($thermalZone . '/temp')) / 1000);
                if ($tzone['temp'] > 0) { $tzone['temp'] = '+' . $tzone['temp']; }
            } catch (RuntimeException $e) {
                continue;
            }
            $result[] = $tzone;
        }
    } else {
        // No thermal zones found: fall back to the first temperature input of each hwmon device.
        $hwmonDevices = glob('/sys/class/hwmon/hwmon*') ?: [];
        foreach ($hwmonDevices as $hwmonDevice) {
            $tzone = [];
            try {
                $tzone['hash'] = md5($hwmonDevice);
                $tzone['type'] = $this->readContent($hwmonDevice . '/name');
                $tzone['temp'] = (float)((int)($this->readContent($hwmonDevice . '/temp1_input')) / 1000);
            } catch (RuntimeException $e) {
                continue;
            }
            $result[] = $tzone;
        }
    }
    return $result;
}
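
One detail worth keeping in mind when detecting thermal zones: PHP's is_dir() tests a literal path and does not expand wildcards, so a path ending in `*` is only reported as a directory if an entry with that literal name exists. The sketch below shows the difference on a throwaway directory; the helper name `hasMatchingDir` is hypothetical.

```php
<?php
/** Hypothetical helper: true if at least one directory matches the glob pattern. */
function hasMatchingDir(string $pattern): bool {
    // GLOB_ONLYDIR restricts the matches to directories.
    return (glob($pattern, GLOB_ONLYDIR) ?: []) !== [];
}

// Build a throwaway directory that mimics /sys/class/thermal.
$base = sys_get_temp_dir() . '/thermal_demo_' . uniqid();
mkdir($base . '/thermal_zone0', 0777, true);

var_dump(is_dir($base . '/thermal_zone*'));         // bool(false): wildcard is taken literally
var_dump(hasMatchingDir($base . '/thermal_zone*')); // bool(true): glob() expands the pattern
```

With a check like this, the thermal_zone branch runs only when at least one zone exists, and the hwmon fallback is used otherwise.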

The data are not so comfortable to interpret:

image

sensors gives the following data:

jc42-i2c-0-18
Adapter: SMBus PIIX4 adapter port 0 at 0b00
RAM1 Temp:    +13.75°C  (low  =  +0.0°C)
                       (high = +60.0°C, hyst = +54.0°C)
                       (crit = +70.0°C, hyst = +64.0°C)

jc42-i2c-0-19
Adapter: SMBus PIIX4 adapter port 0 at 0b00
RAM2 Temp:    +13.5°C  (low  =  +0.0°C)
                       (high = +60.0°C, hyst = +54.0°C)
                       (crit = +70.0°C, hyst = +64.0°C)

k10temp-pci-00c3
Adapter: PCI adapter
CPU Core Temp:  +24.75°C  (high = +70.0°C)
                         (crit = +100.0°C, hyst = +95.0°C)

And these data are completely missing:

w83795adg-i2c-1-2f
CPU Temp:     +26.0°C  (high = +109.0°C, hyst = +109.0°C)
                       (crit = +109.0°C, hyst = +109.0°C)  sensor = thermal diode
NB Temp:      +29.0°C  (high = +105.0°C, hyst = +105.0°C)
                       (crit = +105.0°C, hyst = +105.0°C)  sensor = thermal diode
MB Temp:       +4.5°C  (high = +39.0°C, hyst = +39.0°C)
                       (crit = +44.0°C, hyst = +44.0°C)  sensor = thermistor

kesselb commented 7 months ago

Hey,

Using /sys/class/hwmon/hwmon looks okay to me.

> The data are not so comfortable to interpret:

Each device under /sys/class/hwmon (for example /sys/class/hwmon/hwmon1) is a "driver" and can have many sensors.

I guess you want something like below to read all sensors.

Index: lib/OperatingSystems/Linux.php
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/lib/OperatingSystems/Linux.php b/lib/OperatingSystems/Linux.php
--- a/lib/OperatingSystems/Linux.php    (revision 268a3601683d8d1d0605ba3d1c17b44afab007e2)
+++ b/lib/OperatingSystems/Linux.php    (date 1704898019944)
@@ -232,6 +232,18 @@
    public function getThermalZones(): array {
        $data = [];

+       $drivers = glob('/sys/class/hwmon/hwmon*');
+       foreach ($drivers as $driver) {
+           $name = $this->readContent($driver . '/name');
+
+           $zones = glob($driver . '/temp*_label');
+           foreach ($zones as $zone) {
+               $type = $name . ' ' . $this->readContent($zone);
+               $temp = (int)$this->readContent(str_replace('_label', '_input', $zone)) / 1000;
+               $data[] = new ThermalZone(md5($zone), $type, $temp);
+           }
+       }
+
        $zones = glob('/sys/class/thermal/thermal_zone*');
        if ($zones === false) {
            return $data;

image
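
The patch above only picks up channels that have a temp*_label file. To my knowledge the kernel's hwmon sysfs interface treats labels as optional, so a variant could iterate over temp*_input instead and fall back to the channel name when no label exists. The sketch below is standalone and returns plain arrays rather than serverinfo's ThermalZone objects; `readHwmonSensors`, `$base`, and the array keys are hypothetical.

```php
<?php
/**
 * Hypothetical variant of the hwmon loop: iterate over temp*_input and use the
 * matching temp*_label only when the driver provides one.
 */
function readHwmonSensors(string $base = '/sys/class/hwmon'): array {
    $data = [];
    foreach (glob($base . '/hwmon*') ?: [] as $driver) {
        // An unreadable name file simply yields an empty chip name.
        $name = trim((string)@file_get_contents($driver . '/name'));
        foreach (glob($driver . '/temp*_input') ?: [] as $input) {
            $raw = @file_get_contents($input);
            if ($raw === false) {
                continue; // unreadable sensor, skip it
            }
            $channel = basename($input, '_input'); // e.g. "temp1"
            $labelFile = $driver . '/' . $channel . '_label';
            $label = is_readable($labelFile)
                ? trim((string)file_get_contents($labelFile))
                : $channel;
            $data[] = [
                'hash' => md5($input),
                'type' => trim($name . ' ' . $label),
                'temp' => (int)trim($raw) / 1000, // millidegrees to degrees Celsius
            ];
        }
    }
    return $data;
}

// Example: list the sensors of the current machine, if any.
foreach (readHwmonSensors() as $sensor) {
    printf("%s: %.2f\n", $sensor['type'], $sensor['temp']);
}
```

Keying on temp*_input means an unlabeled channel still shows up, at the cost of a less descriptive name such as "k10temp temp1".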

ostasevych commented 7 months ago

$data = [];

+     $drivers = glob('/sys/class/hwmon/hwmon*');
+     foreach ($drivers as $driver) {
+         $name = $this->readContent($driver . '/name');
+
+         $zones = glob($driver . '/temp*_label');
+         foreach ($zones as $zone) {
+             $type = $name . ' ' . $this->readContent($zone);
+             $temp = (int)$this->readContent(str_replace('_label', '_input', $zone)) / 1000;
+             $data[] = new ThermalZone(md5($zone), $type, $temp);
+         }
+     }

Oh, that is nice!

@kesselb Daniel, could you please post the full text of the patched getThermalZones function for NC v27 here?