intel / thermal_daemon

Thermal daemon for IA
GNU General Public License v2.0

Incorrect detection of intel-rapl #401

Closed: isomer closed this issue 1 month ago

isomer commented 1 year ago

I have a Lenovo T16 Gen 1 (Intel) running Debian with kernel 6.3.0-1-amd64 (uname -a: Linux heatwave 6.3.0-1-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.3.7-1 (2023-06-12) x86_64 GNU/Linux).

Thermald appears to be looking in /sys/class/powercap/intel-rapl/intel-rapl:0/* for zones, which seems to be one level too deep. I suspect it should be looking in /sys/class/powercap/intel-rapl/*/ for zones. It also doesn't appear to notice the /sys/class/powercap/intel-rapl/intel-rapl:1/ subtree.
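
To make the layout in question concrete, here is a minimal Python sketch that walks a powercap tree and lists every zone directory that carries a name file, at any depth. This is not thermald's code; the function name list_rapl_zones and the assumption that zone directories all start with "intel-rapl:" are illustrative choices based only on the listing in this report.

```python
import os

def list_rapl_zones(base="/sys/class/powercap/intel-rapl"):
    """Return (relative_path, zone_name) pairs for every zone directory
    under base that contains a readable 'name' file."""
    zones = []
    # os.walk is top-down and does not follow symlinks by default.
    for root, dirs, files in os.walk(base):
        # Assumption: only intel-rapl:N and intel-rapl:N:M are zones, so
        # prune everything else (power, device, subsystem, ...).
        dirs[:] = sorted(d for d in dirs if d.startswith("intel-rapl:"))
        if root != base and "name" in files:
            try:
                with open(os.path.join(root, "name")) as f:
                    zones.append((os.path.relpath(root, base), f.read().strip()))
            except OSError:
                pass  # unreadable zone: skip it rather than fail
    return zones

if __name__ == "__main__":
    for path, name in list_rapl_zones():
        print(f"{path}: {name}")
```

Going by the names in the debug log below, on this machine it would be expected to report intel-rapl:0 as package-0 with sub-zones core and uncore, and intel-rapl:1 as psys.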

Thermald at startup reports:

$ sudo thermald --no-daemon --loglevel=debug 
[1689749970][DEBUG]RAPL sysfs present 
[1689749970][DEBUG]RAPL base path /sys/class/powercap/intel-rapl/
[1689749970][DEBUG]RAPL domain dir uevent
[1689749970][DEBUG] /sys/class/powercap/intel-rapl/uevent/name doesn't exist
[1689749970][DEBUG]RAPL domain dir intel-rapl:1
[1689749970][DEBUG]name psys
[1689749970][DEBUG]RAPL domain dir enabled
[1689749970][DEBUG] /sys/class/powercap/intel-rapl/enabled/name doesn't exist
[1689749970][DEBUG]RAPL domain dir power
[1689749970][DEBUG] /sys/class/powercap/intel-rapl/power/name doesn't exist
[1689749970][DEBUG]RAPL domain dir intel-rapl:0
[1689749970][DEBUG]name package-0
[1689749970][DEBUG]RAPL base path /sys/class/powercap/intel-rapl/intel-rapl:0/
[1689749970][DEBUG]RAPL domain dir uevent
[1689749970][DEBUG] /sys/class/powercap/intel-rapl/intel-rapl:0/uevent/name doesn't exist
[1689749970][DEBUG]RAPL domain dir energy_uj
[1689749970][DEBUG] /sys/class/powercap/intel-rapl/intel-rapl:0/energy_uj/name doesn't exist
[1689749970][DEBUG]RAPL domain dir intel-rapl:0:0
[1689749970][DEBUG]name core
[1689749970][DEBUG]RAPL domain dir enabled
[1689749970][DEBUG] /sys/class/powercap/intel-rapl/intel-rapl:0/enabled/name doesn't exist
[1689749970][DEBUG]RAPL domain dir constraint_1_max_power_uw
[1689749970][DEBUG] /sys/class/powercap/intel-rapl/intel-rapl:0/constraint_1_max_power_uw/name doesn't exist
[1689749970][DEBUG]RAPL domain dir constraint_2_time_window_us
[1689749970][DEBUG] /sys/class/powercap/intel-rapl/intel-rapl:0/constraint_2_time_window_us/name doesn't exist
[1689749970][DEBUG]RAPL domain dir power
[1689749970][DEBUG] /sys/class/powercap/intel-rapl/intel-rapl:0/power/name doesn't exist
[1689749970][DEBUG]RAPL domain dir device
[1689749970][DEBUG] /sys/class/powercap/intel-rapl/intel-rapl:0/device/name doesn't exist
[1689749970][DEBUG]RAPL domain dir constraint_2_power_limit_uw
[1689749970][DEBUG] /sys/class/powercap/intel-rapl/intel-rapl:0/constraint_2_power_limit_uw/name doesn't exist
[1689749970][DEBUG]RAPL domain dir constraint_1_time_window_us
[1689749970][DEBUG] /sys/class/powercap/intel-rapl/intel-rapl:0/constraint_1_time_window_us/name doesn't exist
[1689749970][DEBUG]RAPL domain dir constraint_2_max_power_uw
[1689749970][DEBUG] /sys/class/powercap/intel-rapl/intel-rapl:0/constraint_2_max_power_uw/name doesn't exist
[1689749970][DEBUG]RAPL domain dir constraint_2_name
[1689749970][DEBUG] /sys/class/powercap/intel-rapl/intel-rapl:0/constraint_2_name/name doesn't exist
[1689749970][DEBUG]RAPL domain dir constraint_1_power_limit_uw
[1689749970][DEBUG] /sys/class/powercap/intel-rapl/intel-rapl:0/constraint_1_power_limit_uw/name doesn't exist
[1689749970][DEBUG]RAPL domain dir intel-rapl:0:1
[1689749970][DEBUG]name uncore
[1689749970][DEBUG]RAPL domain dir constraint_0_time_window_us
[1689749970][DEBUG] /sys/class/powercap/intel-rapl/intel-rapl:0/constraint_0_time_window_us/name doesn't exist
[1689749970][DEBUG]RAPL domain dir subsystem
[1689749970][DEBUG] /sys/class/powercap/intel-rapl/intel-rapl:0/subsystem/name doesn't exist
[1689749970][DEBUG]RAPL domain dir constraint_1_name
[1689749970][DEBUG] /sys/class/powercap/intel-rapl/intel-rapl:0/constraint_1_name/name doesn't exist
[1689749970][DEBUG]RAPL domain dir constraint_0_power_limit_uw
[1689749970][DEBUG] /sys/class/powercap/intel-rapl/intel-rapl:0/constraint_0_power_limit_uw/name doesn't exist
[1689749970][DEBUG]RAPL domain dir constraint_0_name
[1689749970][DEBUG] /sys/class/powercap/intel-rapl/intel-rapl:0/constraint_0_name/name doesn't exist
[1689749970][DEBUG]RAPL domain dir name
[1689749970][DEBUG] /sys/class/powercap/intel-rapl/intel-rapl:0/name/name doesn't exist
[1689749970][DEBUG]RAPL domain dir constraint_0_max_power_uw
[1689749970][DEBUG] /sys/class/powercap/intel-rapl/intel-rapl:0/constraint_0_max_power_uw/name doesn't exist
[1689749970][DEBUG]RAPL domain dir max_energy_range_uj
[1689749970][DEBUG] /sys/class/powercap/intel-rapl/intel-rapl:0/max_energy_range_uj/name doesn't exist
[1689749970][INFO]RAPL domain count 0
[1689749970][DEBUG]RAPL domain dir subsystem
[1689749970][DEBUG] /sys/class/powercap/intel-rapl/subsystem/name doesn't exist
[1689749970][INFO]RAPL domain count 1
[1689749970][MSG]32 CPUID levels; family:model:stepping 0x6:9a:3 (6:154:3)
[1689749970][WARN][/sys/devices/platform/thinkpad_acpi/dytc_lapmode] present: Thermald can't run on this platform
[1689749970][INFO]INT3400 Base path is /sys/bus/acpi/devices/INTC1041:00/physical_node/uuids/
[1689749970][INFO] failed to GET COUNT on /dev/acpi_thermal_rel
[1689749970][DEBUG]TRT count 1 ...
[1689749970][DEBUG]TRT 0: SRC TCPU:     [1689749970][DEBUG]TRT 0: TGT TCPU:     [1689749970][DEBUG]TRT 0: INF 18:       [1689749970][DEBUG]TRT 0: SMPL 50:
[1689749970][DEBUG]uuid: UNKNOWN
[1689749970][INFO]Passive 1 UUID is not present, hence ignore _TRT, as it may have junk!!
[1689749970][MSG]Config file //etc/thermald/thermal-conf.xml does not exist
[1689749970][MSG]Unsupported cpu model, use thermal-conf.xml file or run with --ignore-cpuid-check 
[1689749970][MSG]THD engine init failed
[1689749970][INFO]Running on a vanilla kernel
[1689749970][MSG]Polling mode is enabled: 4
[1689749970][INFO]Current user preference is 1
[1689749970][DEBUG]Start main loop
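
As a reading aid for the log above: the "doesn't exist" lines appear to be printed for every entry that was probed and had no name file. The sketch below filters a saved debug log down to the entries that were followed by a name line. The regular expressions are assumptions based only on the message format visible in this log, and found_domains is a hypothetical helper, not part of thermald.

```python
import re

# Patterns inferred from the debug output in this report (an assumption).
DIR_RE = re.compile(r"\[DEBUG\]RAPL domain dir (\S+)")
NAME_RE = re.compile(r"\[DEBUG\]name (\S+)")

def found_domains(log_text):
    """Pair each 'RAPL domain dir X' line with the 'name Y' line that
    directly follows it; entries followed by anything else are dropped."""
    found = []
    pending = None  # directory entry waiting for its name line
    for line in log_text.splitlines():
        m = DIR_RE.search(line)
        if m:
            pending = m.group(1)
            continue
        m = NAME_RE.search(line)
        if m and pending is not None:
            found.append((pending, m.group(1)))
        pending = None
    return found
```

Applied to the log above, this pairs intel-rapl:1 with psys, intel-rapl:0 with package-0, intel-rapl:0:0 with core, and intel-rapl:0:1 with uncore.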

Subtree:

$ ls -laR /sys/class/powercap/intel-rapl/
/sys/class/powercap/intel-rapl/:
total 0
drwxr-xr-x 5 root root    0 Jun 27 22:42 .
drwxr-xr-x 4 root root    0 Jun 27 22:42 ..
-rw-r--r-- 1 root root 4096 Jul 18 13:10 enabled
drwxr-xr-x 5 root root    0 Jun 27 22:42 intel-rapl:0
drwxr-xr-x 3 root root    0 Jun 27 22:42 intel-rapl:1
drwxr-xr-x 2 root root    0 Jul  4 20:49 power
lrwxrwxrwx 1 root root    0 Jul 18 13:10 subsystem -> ../../../../class/powercap
-rw-r--r-- 1 root root 4096 Jul 18 13:10 uevent

'/sys/class/powercap/intel-rapl/intel-rapl:0':
total 0
drwxr-xr-x 5 root root    0 Jun 27 22:42 .
drwxr-xr-x 5 root root    0 Jun 27 22:42 ..
-r--r--r-- 1 root root 4096 Jul 18 13:10 constraint_0_max_power_uw
-r--r--r-- 1 root root 4096 Jul 18 13:10 constraint_0_name
-rw-r--r-- 1 root root 4096 Jul 18 13:10 constraint_0_power_limit_uw
-rw-r--r-- 1 root root 4096 Jul 18 13:10 constraint_0_time_window_us
-r--r--r-- 1 root root 4096 Jul 18 13:10 constraint_1_max_power_uw
-r--r--r-- 1 root root 4096 Jul 18 13:10 constraint_1_name
-rw-r--r-- 1 root root 4096 Jul 18 13:10 constraint_1_power_limit_uw
-rw-r--r-- 1 root root 4096 Jul 18 13:10 constraint_1_time_window_us
-r--r--r-- 1 root root 4096 Jul 18 13:10 constraint_2_max_power_uw
-r--r--r-- 1 root root 4096 Jul 18 13:10 constraint_2_name
-rw-r--r-- 1 root root 4096 Jul 18 13:10 constraint_2_power_limit_uw
-rw-r--r-- 1 root root 4096 Jul 18 13:10 constraint_2_time_window_us
lrwxrwxrwx 1 root root    0 Jul 18 13:10 device -> ../../intel-rapl
-rw-r--r-- 1 root root 4096 Jul 18 13:10 enabled
-r-------- 1 root root 4096 Jul 18 08:42 energy_uj
drwxr-xr-x 3 root root    0 Jun 27 22:42 intel-rapl:0:0
drwxr-xr-x 3 root root    0 Jun 27 22:42 intel-rapl:0:1
-r--r--r-- 1 root root 4096 Jul 18 13:10 max_energy_range_uj
-r--r--r-- 1 root root 4096 Jul 18 13:10 name
drwxr-xr-x 2 root root    0 Jun 29 09:56 power
lrwxrwxrwx 1 root root    0 Jul 18 13:10 subsystem -> ../../../../../class/powercap
-rw-r--r-- 1 root root 4096 Jul 18 13:10 uevent

'/sys/class/powercap/intel-rapl/intel-rapl:0/intel-rapl:0:0':
total 0
drwxr-xr-x 3 root root    0 Jun 27 22:42 .
drwxr-xr-x 5 root root    0 Jun 27 22:42 ..
-r--r--r-- 1 root root 4096 Jul 19 08:03 constraint_0_max_power_uw
-r--r--r-- 1 root root 4096 Jul 19 08:03 constraint_0_name
-rw-r--r-- 1 root root 4096 Jul 19 08:03 constraint_0_power_limit_uw
-rw-r--r-- 1 root root 4096 Jul 19 08:03 constraint_0_time_window_us
lrwxrwxrwx 1 root root    0 Jul 19 08:03 device -> ../../intel-rapl:0
-rw-r--r-- 1 root root 4096 Jul 19 08:03 enabled
-r-------- 1 root root 4096 Jul 19 08:03 energy_uj
-r--r--r-- 1 root root 4096 Jul 19 08:03 max_energy_range_uj
-r--r--r-- 1 root root 4096 Jul 18 13:10 name
drwxr-xr-x 2 root root    0 Jul 19 08:03 power
lrwxrwxrwx 1 root root    0 Jun 27 22:43 subsystem -> ../../../../../../class/powercap
-rw-r--r-- 1 root root 4096 Jul 19 08:03 uevent

'/sys/class/powercap/intel-rapl/intel-rapl:0/intel-rapl:0:0/power':
total 0
drwxr-xr-x 2 root root    0 Jul 19 08:03 .
drwxr-xr-x 3 root root    0 Jun 27 22:42 ..
-rw-r--r-- 1 root root 4096 Jul 19 08:03 async
-rw-r--r-- 1 root root 4096 Jul 19 08:03 autosuspend_delay_ms
-rw-r--r-- 1 root root 4096 Jul 19 08:03 control
-r--r--r-- 1 root root 4096 Jul 19 08:03 runtime_active_kids
-r--r--r-- 1 root root 4096 Jul 19 08:03 runtime_active_time
-r--r--r-- 1 root root 4096 Jul 19 08:03 runtime_enabled
-r--r--r-- 1 root root 4096 Jul 19 08:03 runtime_status
-r--r--r-- 1 root root 4096 Jul 19 08:03 runtime_suspended_time
-r--r--r-- 1 root root 4096 Jul 19 08:03 runtime_usage

'/sys/class/powercap/intel-rapl/intel-rapl:0/intel-rapl:0:1':
total 0
drwxr-xr-x 3 root root    0 Jun 27 22:42 .
drwxr-xr-x 5 root root    0 Jun 27 22:42 ..
-r--r--r-- 1 root root 4096 Jul 19 08:03 constraint_0_max_power_uw
-r--r--r-- 1 root root 4096 Jul 19 08:03 constraint_0_name
-rw-r--r-- 1 root root 4096 Jul 19 08:03 constraint_0_power_limit_uw
-rw-r--r-- 1 root root 4096 Jul 19 08:03 constraint_0_time_window_us
lrwxrwxrwx 1 root root    0 Jul 19 08:03 device -> ../../intel-rapl:0
-rw-r--r-- 1 root root 4096 Jul 19 08:03 enabled
-r-------- 1 root root 4096 Jul 19 08:03 energy_uj
-r--r--r-- 1 root root 4096 Jul 19 08:03 max_energy_range_uj
-r--r--r-- 1 root root 4096 Jul 18 13:10 name
drwxr-xr-x 2 root root    0 Jul 19 08:03 power
lrwxrwxrwx 1 root root    0 Jun 27 22:43 subsystem -> ../../../../../../class/powercap
-rw-r--r-- 1 root root 4096 Jul 19 08:03 uevent

'/sys/class/powercap/intel-rapl/intel-rapl:0/intel-rapl:0:1/power':
total 0
drwxr-xr-x 2 root root    0 Jul 19 08:03 .
drwxr-xr-x 3 root root    0 Jun 27 22:42 ..
-rw-r--r-- 1 root root 4096 Jul 19 08:03 async
-rw-r--r-- 1 root root 4096 Jul 19 08:03 autosuspend_delay_ms
-rw-r--r-- 1 root root 4096 Jul 19 08:03 control
-r--r--r-- 1 root root 4096 Jul 19 08:03 runtime_active_kids
-r--r--r-- 1 root root 4096 Jul 19 08:03 runtime_active_time
-r--r--r-- 1 root root 4096 Jul 19 08:03 runtime_enabled
-r--r--r-- 1 root root 4096 Jul 19 08:03 runtime_status
-r--r--r-- 1 root root 4096 Jul 19 08:03 runtime_suspended_time
-r--r--r-- 1 root root 4096 Jul 19 08:03 runtime_usage

'/sys/class/powercap/intel-rapl/intel-rapl:0/power':
total 0
drwxr-xr-x 2 root root    0 Jun 29 09:56 .
drwxr-xr-x 5 root root    0 Jun 27 22:42 ..
-rw-r--r-- 1 root root 4096 Jul 19 08:03 async
-rw-r--r-- 1 root root 4096 Jul 19 08:03 autosuspend_delay_ms
-rw-r--r-- 1 root root 4096 Jul 19 08:03 control
-r--r--r-- 1 root root 4096 Jul 19 08:03 runtime_active_kids
-r--r--r-- 1 root root 4096 Jul 19 08:03 runtime_active_time
-r--r--r-- 1 root root 4096 Jul 19 08:03 runtime_enabled
-r--r--r-- 1 root root 4096 Jul 19 08:03 runtime_status
-r--r--r-- 1 root root 4096 Jul 19 08:03 runtime_suspended_time
-r--r--r-- 1 root root 4096 Jul 19 08:03 runtime_usage

'/sys/class/powercap/intel-rapl/intel-rapl:1':
total 0
drwxr-xr-x 3 root root    0 Jun 27 22:42 .
drwxr-xr-x 5 root root    0 Jun 27 22:42 ..
-r--r--r-- 1 root root 4096 Jul 19 08:03 constraint_0_max_power_uw
-r--r--r-- 1 root root 4096 Jul 19 08:03 constraint_0_name
-rw-r--r-- 1 root root 4096 Jul 19 08:03 constraint_0_power_limit_uw
-rw-r--r-- 1 root root 4096 Jul 19 08:03 constraint_0_time_window_us
-r--r--r-- 1 root root 4096 Jul 19 08:03 constraint_1_max_power_uw
-r--r--r-- 1 root root 4096 Jul 19 08:03 constraint_1_name
-rw-r--r-- 1 root root 4096 Jul 19 08:03 constraint_1_power_limit_uw
-rw-r--r-- 1 root root 4096 Jul 19 08:03 constraint_1_time_window_us
lrwxrwxrwx 1 root root    0 Jul 19 08:03 device -> ../../intel-rapl
-rw-r--r-- 1 root root 4096 Jul 19 08:03 enabled
-r-------- 1 root root 4096 Jul 19 08:03 energy_uj
-r--r--r-- 1 root root 4096 Jul 19 08:03 max_energy_range_uj
-r--r--r-- 1 root root 4096 Jul  4 20:49 name
drwxr-xr-x 2 root root    0 Jul 19 08:03 power
lrwxrwxrwx 1 root root    0 Jun 27 22:43 subsystem -> ../../../../../class/powercap
-rw-r--r-- 1 root root 4096 Jul 19 08:03 uevent

'/sys/class/powercap/intel-rapl/intel-rapl:1/power':
total 0
drwxr-xr-x 2 root root    0 Jul 19 08:03 .
drwxr-xr-x 3 root root    0 Jun 27 22:42 ..
-rw-r--r-- 1 root root 4096 Jul 19 08:03 async
-rw-r--r-- 1 root root 4096 Jul 19 08:03 autosuspend_delay_ms
-rw-r--r-- 1 root root 4096 Jul 19 08:03 control
-r--r--r-- 1 root root 4096 Jul 19 08:03 runtime_active_kids
-r--r--r-- 1 root root 4096 Jul 19 08:03 runtime_active_time
-r--r--r-- 1 root root 4096 Jul 19 08:03 runtime_enabled
-r--r--r-- 1 root root 4096 Jul 19 08:03 runtime_status
-r--r--r-- 1 root root 4096 Jul 19 08:03 runtime_suspended_time
-r--r--r-- 1 root root 4096 Jul 19 08:03 runtime_usage

/sys/class/powercap/intel-rapl/power:
total 0
drwxr-xr-x 2 root root    0 Jul  4 20:49 .
drwxr-xr-x 5 root root    0 Jun 27 22:42 ..
-rw-r--r-- 1 root root 4096 Jul 19 08:03 async
-rw-r--r-- 1 root root 4096 Jul 19 08:03 autosuspend_delay_ms
-rw-r--r-- 1 root root 4096 Jul 19 08:03 control
-r--r--r-- 1 root root 4096 Jul 19 08:03 runtime_active_kids
-r--r--r-- 1 root root 4096 Jul 19 08:03 runtime_active_time
-r--r--r-- 1 root root 4096 Jul 19 08:03 runtime_enabled
-r--r--r-- 1 root root 4096 Jul 19 08:03 runtime_status
-r--r--r-- 1 root root 4096 Jul 19 08:03 runtime_suspended_time
-r--r--r-- 1 root root 4096 Jul 19 08:03 runtime_usage

spandruvada commented 1 year ago

It will find the correct one; it is just traversing all folders. Please attach the output of:

sudo thermald --no-daemon --loglevel=debug --adaptive --ignore-cpuid-check