spacelama / R710-Fan-Control

A fork of R710-IPMI-TEMP from NoLooseEnds/Scripts, generalised to a fan control daemon for Dell Poweredge servers. Has reported to work on R710s, R520, R730. Allows more flexible control of the fan throttling vs the vanilla iDrac control which tends to ramp the fans up to full velocity the moment you add non-Dell hardware.
42 stars 10 forks source link

Suggestion: use "sensors" and "hddtemp" command options to simplify sensor readings #1

Open FreelancerJay opened 2 years ago

FreelancerJay commented 2 years ago

sensors has a -j option to output it's readings in json structures, and they are pretty similar for parsing. I have 2 types of NVMe drives in my system (2 U.2 drives and a bunch of M.2) and they as well as the CPU packages and cores all report their temperatures as floats. The key for each item's temperature is "temp#_input" with # being the number sensor on the particular device, starting at 1. So for example, all my NVMe drives have a single sensor returned there, in an array called "composite", itself in an array called "nvme-pci-####", with the #### being the PCI address. The CPUs and their cores have the sensor in an array called either "package id #" or "core #" (# being which number CPU or Core it is) which are in a array called "core temp-isa-000#" (with # being which CPU it is). Package ID, CPU number and Core number all start at 0 and increment in number. On each processor, the key "temp1_input" always seems to be the Package temperature, while all the individual core temperatures are "temp2_input" and onwards. This pattern holds on my R730 and my 11th gen intel NUC

Example output:

   "nvme-pci-8100":{
      "Adapter": "PCI adapter",
      "Composite":{
         "temp1_input": 33.850,
         "temp1_max": 74.850,
         "temp1_min": -0.150,
         "temp1_crit": 79.850,
         "temp1_alarm": 0.000
      }
   },
   "nvme-pci-0500":{
      "Adapter": "PCI adapter",
      "Composite":{
         "temp1_input": 34.850,
         "temp1_max": 74.850,
         "temp1_min": -0.150,
         "temp1_crit": 79.850,
         "temp1_alarm": 0.000
      }
   },
   "coretemp-isa-0001":{
      "Adapter": "ISA adapter",
      "Package id 1":{
         "temp1_input": 36.000,
         "temp1_max": 90.000,
         "temp1_crit": 100.000,
         "temp1_crit_alarm": 0.000
      },
      "Core 0":{
         "temp2_input": 30.000,
         "temp2_max": 90.000,
         "temp2_crit": 100.000,
         "temp2_crit_alarm": 0.000
      },
      "Core 1":{
         "temp3_input": 30.000,
         "temp3_max": 90.000,
         "temp3_crit": 100.000,
         "temp3_crit_alarm": 0.000
      },
      "Core 2":{
         "temp4_input": 30.000,
         "temp4_max": 90.000,
         "temp4_crit": 100.000,
         "temp4_crit_alarm": 0.000
      },
      "Core 3":{
         "temp5_input": 30.000,
         "temp5_max": 90.000,
         "temp5_crit": 100.000,
         "temp5_crit_alarm": 0.000
      },
      "Core 4":{
         "temp6_input": 31.000,
         "temp6_max": 90.000,
         "temp6_crit": 100.000,
         "temp6_crit_alarm": 0.000
      }

So we just need the value for key "temp1_input" inside any "coretemp-isa-" array for the overall CPU temp, and any value for key "temp[2...100]_input" from any "coretemp-isa-" array for the separate cores. Because we know they are values with decimals, it's pretty safe to cast them as floats without needing to do any further regex to get rid of dots, degrees symbols, letters and so on.

hddtemp can report all the temps from SATA and "I think" SAS drives as just a list of integer numbers by using the -n option.

Example output from my R730:

# hddtemp -n /dev/sd?
24
25
24
24
24
25
25
26
26
27
33
32
/dev/sdm: USB:  drive supported, but it doesn't have a temperature sensor.
/dev/sdn: DELL: S.M.A.R.T. not available

Pretty simple, as you go to each line if it casts successfully to an int, keep it, if it doesn't, drop it. Or evaluate each line for starting with a "/" and if it does, skip that reading, if it doesn't, cast to an int and store in an array to be evaluated. Either way means less regex to filter unwanted values, and less thinking ahead for what sort of characters might pop up that we don't want. Two of the drives in the output above actually report floats in smartctl for their temperatures to 2 decimal places, but hddtemp seems to either round the temp or just drop the decimal places, so it seems we can rely on ints for it.

Hopefully can lead to cleaner, easier to maintain code, and less issues like we had with the differences of the raw output of these commands on different OS's, like our little snafu with the difference between proxmox 6 and 7 using unicode for the degrees symbol or not.

Given time I might be able to create a version like this and send you a pull, but I genuinely know nothing about perl so no promises!

spacelama commented 1 year ago

Comment copied back from reddit before reddit goes away...

I'd start with "use JSON;": https://metacpan.org/pod/JSON

Without having dealt with it recently, it'll come back as a hash, I presume, 
and then you'll want to use keys (perdoc -f keys gives good docco on how
to typically use keys) on the hash to search for items starting with
"Package id"/"Core", "temp" or whatever. Since you're not really interested
in the intermediate objects (who cares if it's core 2 or core 31?), might just
want to convert to arrays and flatten.

I also had previously chucked my own hddtemp in /usr/local/bin (and documented in the README), that skips past sleeping drives unlike the ancient unmaintained hddtemp contained in most distributions. Unfortunately, I took a shortcut to maintain MVP backwards compatibility with the OS's hddtemp, and only made it output the one format forcing "°C", and didn't put in a switch to say "-n". Easy to fix, but something else to remember.