henrygd / beszel

Lightweight server monitoring hub with historical data, docker stats, and alerts.
MIT License

Lost temperature chart #167

Closed JayceeB1 closed 1 month ago

JayceeB1 commented 2 months ago

Hi!

Using Beszel 0.3 with Ubuntu 24 / CasaOS 0.4.11, Intel(R) N100 (4c / 4t)

The temperature chart suddenly disappeared. I tried reinstalling, but nothing works. Any ideas?

Thanks

henrygd commented 2 months ago

Do your system_stats records in PocketBase have a t property?

If not, it's a problem with the agent reading sensors. It gets the values from these directories, so make sure there are files showing up in at least one of them.

ls /sys/class/hwmon/hwmon*/temp*_input
ls /sys/class/hwmon/hwmon*/device/temp*_input
ls /sys/class/thermal/thermal_zone*/

Also check that the sensors command shows sensors, and restart the agent if you haven't.

JayceeB1 commented 2 months ago

No t section in system stats: { "cpu": 1.32, "d": 467.35, "dp": 7.49, "dr": 0, "du": 33.21, "dw": 0.05, "m": 15.4, "mb": 12.45, "mp": 16.67, "mu": 2.57, "nr": 0, "ns": 0, "s": 4, "su": 1.06 }

Already restarted the agent. I'll check the files later.

Thanks for the support 🫡

timmyhbk commented 1 month ago

Hello, I am having the same problem.

Hub: v0.4.0 Agent: v0.4.0 OS: Ubuntu 24.04 / Docker 27.3.1

The system_stats records in PocketBase don't have a t property.

{
  "cpu": 0.6,
  "d": 454.37,
  "dp": 1.72,
  "dr": 0,
  "du": 7.43,
  "dw": 0.01,
  "m": 7.48,
  "mb": 0.49,
  "mp": 4.53,
  "mu": 0.34,
  "nr": 0,
  "ns": 0,
  "s": 4
}

The sensors command does return information.

iwlwifi_1-virtual-0
Adapter: Virtual device
temp1:            N/A

acpitz-acpi-0
Adapter: ACPI interface
temp1:        +27.8°C

coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +40.0°C  (high = +105.0°C, crit = +105.0°C)
Core 0:        +39.0°C  (high = +105.0°C, crit = +105.0°C)
Core 1:        +39.0°C  (high = +105.0°C, crit = +105.0°C)
Core 2:        +39.0°C  (high = +105.0°C, crit = +105.0°C)
Core 3:        +39.0°C  (high = +105.0°C, crit = +105.0°C)

nvme-pci-0400
Adapter: PCI adapter
Composite:    +39.9°C  (low  =  -0.1°C, high = +71.8°C)
                       (crit = +89.8°C)

If not, it's a problem with the agent reading sensors. It gets the values from these directories, so make sure there are files showing up in at least one of them.

ls /sys/class/hwmon/hwmon*/temp*_input
ls /sys/class/hwmon/hwmon*/device/temp*_input
ls /sys/class/thermal/thermal_zone*/

$ ls /sys/class/hwmon/hwmon*/temp*_input
/sys/class/hwmon/hwmon0/temp1_input  /sys/class/hwmon/hwmon2/temp1_input  /sys/class/hwmon/hwmon3/temp2_input  /sys/class/hwmon/hwmon3/temp4_input
/sys/class/hwmon/hwmon1/temp1_input  /sys/class/hwmon/hwmon3/temp1_input  /sys/class/hwmon/hwmon3/temp3_input  /sys/class/hwmon/hwmon3/temp5_input
$ ls /sys/class/hwmon/hwmon*/device/temp*_input
ls: cannot access '/sys/class/hwmon/hwmon*/device/temp*_input': No such file or directory
$ ls /sys/class/thermal/thermal_zone*/
/sys/class/thermal/thermal_zone0/:
available_policies  cdev0_weight      cdev1_weight      cdev2_weight      cdev3_weight      cdev4_weight  hwmon0           k_i   mode    power      sustainable_power  trip_point_0_temp  trip_point_1_temp  trip_point_2_temp  trip_point_3_temp  trip_point_4_temp  trip_point_5_temp  uevent
cdev0               cdev1             cdev2             cdev3             cdev4             device        integral_cutoff  k_po  offset  slope      temp               trip_point_0_type  trip_point_1_type  trip_point_2_type  trip_point_3_type  trip_point_4_type  trip_point_5_type
cdev0_trip_point    cdev1_trip_point  cdev2_trip_point  cdev3_trip_point  cdev4_trip_point  emul_temp     k_d              k_pu  policy  subsystem  trip_point_0_hyst  trip_point_1_hyst  trip_point_2_hyst  trip_point_3_hyst  trip_point_4_hyst  trip_point_5_hyst  type

/sys/class/thermal/thermal_zone1/:
available_policies  emul_temp  integral_cutoff  k_d  k_i  k_po  k_pu  mode  offset  policy  power  slope  subsystem  sustainable_power  temp  type  uevent

/sys/class/thermal/thermal_zone2/:
available_policies  integral_cutoff  k_i   k_pu  offset  power  subsystem          temp               trip_point_0_temp  trip_point_1_hyst  trip_point_1_type  trip_point_2_temp  trip_point_3_hyst  trip_point_3_type  trip_point_4_temp  trip_point_5_hyst  trip_point_5_type  uevent
emul_temp           k_d              k_po  mode  policy  slope  sustainable_power  trip_point_0_hyst  trip_point_0_type  trip_point_1_temp  trip_point_2_hyst  trip_point_2_type  trip_point_3_temp  trip_point_4_hyst  trip_point_4_type  trip_point_5_temp  type

/sys/class/thermal/thermal_zone3/:
available_policies  k_d   k_pu    policy  subsystem          trip_point_0_hyst  trip_point_1_hyst  trip_point_2_hyst  trip_point_3_hyst  trip_point_4_hyst  trip_point_5_hyst  trip_point_6_hyst  trip_point_7_hyst  type
emul_temp           k_i   mode    power   sustainable_power  trip_point_0_temp  trip_point_1_temp  trip_point_2_temp  trip_point_3_temp  trip_point_4_temp  trip_point_5_temp  trip_point_6_temp  trip_point_7_temp  uevent
integral_cutoff     k_po  offset  slope   temp               trip_point_0_type  trip_point_1_type  trip_point_2_type  trip_point_3_type  trip_point_4_type  trip_point_5_type  trip_point_6_type  trip_point_7_type

/sys/class/thermal/thermal_zone4/:
available_policies  emul_temp  integral_cutoff  k_d  k_i  k_po  k_pu  mode  offset  policy  power  slope  subsystem  sustainable_power  temp  trip_point_0_hyst  trip_point_0_temp  trip_point_0_type  type  uevent

/sys/class/thermal/thermal_zone5/:
available_policies  integral_cutoff  k_po  offset  slope              temp               trip_point_0_type  trip_point_1_type  trip_point_2_type  trip_point_3_type  trip_point_4_type  trip_point_5_type  trip_point_6_type  trip_point_7_type
emul_temp           k_d              k_pu  policy  subsystem          trip_point_0_hyst  trip_point_1_hyst  trip_point_2_hyst  trip_point_3_hyst  trip_point_4_hyst  trip_point_5_hyst  trip_point_6_hyst  trip_point_7_hyst  type
hwmon2              k_i              mode  power   sustainable_power  trip_point_0_temp  trip_point_1_temp  trip_point_2_temp  trip_point_3_temp  trip_point_4_temp  trip_point_5_temp  trip_point_6_temp  trip_point_7_temp  uevent

/sys/class/thermal/thermal_zone6/:
available_policies  emul_temp  integral_cutoff  k_d  k_i  k_po  k_pu  mode  offset  policy  power  slope  subsystem  sustainable_power  temp  trip_point_0_hyst  trip_point_0_temp  trip_point_0_type  trip_point_1_hyst  trip_point_1_temp  trip_point_1_type  type  uevent

Is there any way to fix this? Thanks!

henrygd commented 1 month ago

@timmyhbk Strange. Did you also have temperatures at one point and lose them, or did the temperatures never come in at all?

I'll try to add a debug log level to the agent in the next release to help with situations like this.

timmyhbk commented 1 month ago

Sorry for not explaining it clearly. This machine was just installed today and has never displayed the temperature.

I have another PVE machine that has been used for a while, and it also has never displayed the temperature (installed directly as a binary on Proxmox).

JayceeB1 commented 1 month ago

On my side, I had the temperature at first.

Rhyn commented 1 month ago

I just added three systems to Beszel: two running OpenWrt 23.05 and one running Ubuntu 22.04. Thermal readings only work on one of them.

Notably, the third system, the one that has thermals, is the only one without any relevant files in the hwmon subdirectories. It is also the only one that throws no error when iterating through all of those files, apart from "no such files".

I hope this helps with finding and fixing the problem.

JayceeB1 commented 1 month ago

Maybe an Ubuntu update broke something?

kernelkaribou commented 1 month ago

I'm seeing the same issue as timmyhbk: both ls /sys/class/thermal/thermal_zone*/ and ls /sys/class/hwmon/hwmon*/temp*_input show results, and sensors also shows results.

sensors
iwlwifi_1-virtual-0
Adapter: Virtual device
temp1:            N/A  

nvme-pci-0600
Adapter: PCI adapter
Composite:    +46.9°C  (low  = -273.1°C, high = +81.8°C)
                       (crit = +84.8°C)
Sensor 1:     +46.9°C  (low  = -273.1°C, high = +65261.8°C)
Sensor 2:     +50.9°C  (low  = -273.1°C, high = +65261.8°C)

acpitz-acpi-0
Adapter: ACPI interface
temp1:        +16.8°C  (crit = +20.8°C)
temp2:        +27.8°C  (crit = +105.0°C)

coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +32.0°C  (high = +80.0°C, crit = +100.0°C)
Core 0:        +31.0°C  (high = +80.0°C, crit = +100.0°C)
Core 4:        +27.0°C  (high = +80.0°C, crit = +100.0°C)
Core 8:        +27.0°C  (high = +80.0°C, crit = +100.0°C)
Core 12:       +29.0°C  (high = +80.0°C, crit = +100.0°C)
Core 16:       +28.0°C  (high = +80.0°C, crit = +100.0°C)
Core 20:       +29.0°C  (high = +80.0°C, crit = +100.0°C)
Core 24:       +28.0°C  (high = +80.0°C, crit = +100.0°C)
Core 28:       +27.0°C  (high = +80.0°C, crit = +100.0°C)
Core 32:       +28.0°C  (high = +80.0°C, crit = +100.0°C)
Core 33:       +28.0°C  (high = +80.0°C, crit = +100.0°C)
Core 34:       +29.0°C  (high = +80.0°C, crit = +100.0°C)
Core 35:       +29.0°C  (high = +80.0°C, crit = +100.0°C)
Core 36:       +29.0°C  (high = +80.0°C, crit = +100.0°C)
Core 37:       +29.0°C  (high = +80.0°C, crit = +100.0°C)
Core 38:       +29.0°C  (high = +80.0°C, crit = +100.0°C)
Core 39:       +29.0°C  (high = +80.0°C, crit = +100.0°C)

nvme-pci-0100
Adapter: PCI adapter
Composite:    +43.9°C  (low  = -273.1°C, high = +81.8°C)
                       (crit = +84.8°C)
Sensor 1:     +43.9°C  (low  = -273.1°C, high = +65261.8°C)
Sensor 2:     +47.9°C  (low  = -273.1°C, high = +65261.8°C)

nvme-pci-0200
Adapter: PCI adapter
Composite:    +46.9°C  (low  = -273.1°C, high = +81.8°C)
                       (crit = +84.8°C)
Sensor 1:     +46.9°C  (low  = -273.1°C, high = +65261.8°C)
Sensor 2:     +50.9°C  (low  = -273.1°C, high = +65261.8°C)

Running on Debian, in docker with agent 0.4.0. I have never had temperatures.

lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux 12 (bookworm)
Release:        12
Codename:       bookworm

No 't' stats either

{
  "cpu": 1.13,
  "d": 914.38,
  "dp": 1.79,
  "dr": 0,
  "du": 15.53,
  "dw": 0.03,
  "efs": {
    "pool01": {
      "d": 899.19,
      "du": 37.53,
      "r": 0,
      "w": 0
    },
    "pool02": {
      "d": 16623.97,
      "du": 9205.86,
      "r": 0,
      "w": 0
    },
    "sdb1": {
      "d": 915.82,
      "du": 3.66,
      "r": 0,
      "w": 0.03
    },
    "sdc1": {
      "d": 3666.45,
      "du": 306.96,
      "r": 0,
      "w": 0
    }
  },
  "m": 62.59,
  "mb": 17.4,
  "mp": 61.94,
  "mu": 38.76,
  "nr": 0.01,
  "ns": 0,
  "s": 0.95,
  "su": 0.01
}

henrygd commented 1 month ago

The sensor data is collected using the gopsutil sensors package.

There will be a LOG_LEVEL=debug option for the agent in the next release -- likely tomorrow or Sunday -- that will print all sensor data and any errors retrieving it.

If you want to troubleshoot on your own before then, you can try this code and report back.

package main

import (
    "context"
    "fmt"

    "github.com/shirou/gopsutil/v4/sensors"
)

func main() {
    temperatures, err := sensors.TemperaturesWithContext(context.Background())
    if err != nil {
        panic(err)
    }
    for _, temp := range temperatures {
        fmt.Printf("%s: %.1f\n", temp.SensorKey, temp.Temperature)
    }
}

Rhyn commented 1 month ago

This was my first time building Go, so it took a bit of tinkering to test, and I may have done something wrong. Nevertheless, the result is not what I expected:

# go mod init local/sensors
go: creating new go.mod: module local/sensors
go: to add module requirements and sums:
        go mod tidy
# go mod tidy
go: finding module for package github.com/shirou/gopsutil/v4/sensors
go: found github.com/shirou/gopsutil/v4/sensors in github.com/shirou/gopsutil/v4 v4.24.8
# go build
# go build
# ./sensors
panic: Number of warnings: 1

goroutine 1 [running]:
main.main()
        /tmp/test/main.go:13 +0x145

Is there anything else I could try or do differently?

henrygd commented 1 month ago

@Rhyn Thanks, that's helpful. This should print the error:

package main

import (
    "context"
    "fmt"

    "github.com/shirou/gopsutil/v4/sensors"
)

func main() {
    temperatures, err := sensors.TemperaturesWithContext(context.Background())
    if err != nil {
        err.(*sensors.Warnings).Verbose = true
        panic(err)
    }
    for _, temp := range temperatures {
        fmt.Printf("%s: %.1f\n", temp.SensorKey, temp.Temperature)
    }
}

Rhyn commented 1 month ago

# ./sensors
panic:  Error 0: read /sys/class/hwmon/hwmon3/temp1_input: no data available

goroutine 1 [running]:
main.main()
        /tmp/test/main.go:14 +0x156

It's the same error that cat gives me.

henrygd commented 1 month ago

Thanks, can you change panic(err) to fmt.Println(err)?

I think it should actually still populate the sensors that work and log it below if we don't panic.

May be an easy fix.

Rhyn commented 1 month ago

# ./sensors
        Error 0: read /sys/class/hwmon/hwmon3/temp1_input: no data available

acpitz: 27.8
nvme_composite: 38.9
nvme_sensor_1: 38.9
nvme_sensor_2: 38.9
coretemp_package_id_0: 42.0
coretemp_core_0: 42.0
coretemp_core_1: 42.0
coretemp_core_2: 42.0
coretemp_core_3: 41.0
coretemp_core_4: 41.0
coretemp_core_5: 41.0

Looks like an easy fix indeed :)

henrygd commented 1 month ago

This should be fixed now in 0.5.0.

If they still don't come in for you, let me know and I'll reopen the issue.

I also added a SENSORS environment variable to whitelist select sensors if you want.