influxdata / telegraf

Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.
https://influxdata.com/telegraf
MIT License

Metrics missing from intel_powerstat #13098

Closed azw71 closed 11 months ago

azw71 commented 1 year ago

Relevant telegraf.conf

[[inputs.intel_powerstat]]
  package_metrics = ["current_power_consumption", "current_dram_power_consumption", "thermal_design_power", "max_turbo_frequency", "cpu_base_frequency"]
  cpu_metrics = ["cpu_frequency", "cpu_busy_frequency", "cpu_temperature", "cpu_c0_state_residency", "cpu_c1_state_residency", "cpu_c6_state_residency"]

Logs from Telegraf

2023-04-17T13:06:13Z I! Loading config file: /etc/telegraf/telegraf.conf
2023-04-17T13:06:13Z I! Starting Telegraf 1.26.1
2023-04-17T13:06:13Z I! Available plugins: 235 inputs, 9 aggregators, 27 processors, 22 parsers, 57 outputs, 2 secret-stores
2023-04-17T13:06:13Z I! Loaded inputs: cpu disk diskio exec intel_powerstat kernel mem net processes sensors smart swap system
2023-04-17T13:06:13Z I! Loaded aggregators:
2023-04-17T13:06:13Z I! Loaded processors:
2023-04-17T13:06:13Z I! Loaded secretstores:
2023-04-17T13:06:13Z W! Outputs are not used in testing mode!
2023-04-17T13:06:13Z I! Tags enabled: host=fatblock
2023-04-17T13:06:13Z D! [agent] Initializing plugins
2023-04-17T13:06:13Z D! [agent] Starting service inputs
> powerstat_package,active_cores=1,host=fatblock,package_id=0 max_turbo_frequency_mhz=4200i 1681736773000000000
> powerstat_package,active_cores=2,host=fatblock,package_id=0 max_turbo_frequency_mhz=4100i 1681736773000000000
> powerstat_package,active_cores=3,host=fatblock,package_id=0 max_turbo_frequency_mhz=4100i 1681736773000000000
> powerstat_package,active_cores=4,host=fatblock,package_id=0 max_turbo_frequency_mhz=4000i 1681736773000000000
> powerstat_package,host=fatblock,package_id=0 cpu_base_frequency_mhz=3600i 1681736773000000000
> powerstat_package,host=fatblock,package_id=0 thermal_design_power_watts=65 1681736773000000000
> powerstat_core,core_id=2,cpu_id=2,host=fatblock,package_id=0 cpu_frequency_mhz=4012.05 1681736773000000000
> powerstat_core,core_id=0,cpu_id=0,host=fatblock,package_id=0 cpu_frequency_mhz=4012.04 1681736773000000000
> powerstat_core,core_id=2,cpu_id=2,host=fatblock,package_id=0 cpu_temperature_celsius=42i 1681736773000000000
> powerstat_core,core_id=3,cpu_id=3,host=fatblock,package_id=0 cpu_frequency_mhz=4000 1681736773000000000
> powerstat_core,core_id=1,cpu_id=1,host=fatblock,package_id=0 cpu_frequency_mhz=4012.52 1681736773000000000
> powerstat_core,core_id=0,cpu_id=0,host=fatblock,package_id=0 cpu_temperature_celsius=38i 1681736773000000000
> powerstat_core,core_id=3,cpu_id=3,host=fatblock,package_id=0 cpu_temperature_celsius=38i 1681736773000000000
> powerstat_core,core_id=1,cpu_id=1,host=fatblock,package_id=0 cpu_temperature_celsius=37i 1681736773000000000
2023-04-17T13:06:13Z D! [agent] Stopping service inputs
2023-04-17T13:06:13Z D! [agent] Input channel closed
2023-04-17T13:06:13Z D! [agent] Stopped Successfully

System info

Telegraf 1.26.1, Debian Bullseye with Kernel 6.1

Docker

No response

Steps to reproduce

  1. Run Telegraf with the intel_powerstat plugin enabled
  2. ...

Expected behavior

powerstat plugin should deliver power consumption metrics and information about c0/c1/c6 residency

Actual behavior

Metrics are missing without additional info or error
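
One way to see exactly which configured metrics produced no output is to compare the metric names from telegraf.conf with the field names in the test-mode output. The sketch below is a hypothetical helper, not part of Telegraf. It assumes that each field name starts with the configured metric name (as with `thermal_design_power` and `thermal_design_power_watts` in the log above) and that field values contain no spaces or commas.

```python
from typing import Iterable, List, Set


def emitted_fields(output_lines: Iterable[str]) -> Set[str]:
    """Collect field names from test-mode output lines that start with '> '."""
    fields: Set[str] = set()
    for line in output_lines:
        if not line.startswith("> "):
            continue  # ordinary log line, not a metric
        parts = line[2:].split(" ")
        if len(parts) < 2:
            continue  # no field section present
        # parts[0] is "measurement,tags", parts[1] is "field=value,field=value"
        for pair in parts[1].split(","):
            name, _, _ = pair.partition("=")
            fields.add(name)
    return fields


def missing_metrics(configured: Iterable[str], output_lines: Iterable[str]) -> List[str]:
    """Return configured metric names for which no field was emitted."""
    fields = emitted_fields(output_lines)
    # Assumption: a field belongs to a metric if its name starts with the metric name.
    return [m for m in configured if not any(f.startswith(m) for f in fields)]


# Two metric lines and one log line taken from the output above.
sample_output = [
    "2023-04-17T13:06:13Z D! [agent] Starting service inputs",
    "> powerstat_package,host=fatblock,package_id=0 thermal_design_power_watts=65 1681736773000000000",
    "> powerstat_core,core_id=2,cpu_id=2,host=fatblock,package_id=0 cpu_frequency_mhz=4012.05 1681736773000000000",
]
configured = [
    "thermal_design_power",
    "current_power_consumption",
    "cpu_frequency",
    "cpu_c0_state_residency",
]
missing = missing_metrics(configured, sample_output)
print(missing)
```

For the sample lines, the helper reports `current_power_consumption` and `cpu_c0_state_residency` as missing, which matches the metrics this report is about.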

Additional info

Please document supported CPUs for uncore metrics, see https://github.com/torvalds/linux/blob/master/drivers/platform/x86/intel/uncore-frequency/uncore-frequency.c

cat /proc/cpuinfo

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 158
model name : Intel(R) Core(TM) i3-9100 CPU @ 3.60GHz
stepping : 11
microcode : 0xf0
cpu MHz : 1100.005
cache size : 6144 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 4
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d arch_capabilities
vmx flags : vnmi preemption_timer invvpid ept_x_only ept_ad ept_1gb flexpriority tsc_offset vtpr mtf vapic ept vpid unrestricted_guest ple pml ept_mode_based_exec ...

azw71 commented 1 year ago

andi@fatblock:/tmp$ lsmod | egrep -i "msr|rapl"
msr 16384 0
intel_rapl_msr 20480 0
intel_rapl_common 32768 1 intel_rapl_msr
rapl 20480 0

andi@fatblock:/tmp$ uname -a
Linux fatblock 6.1.0-0.deb11.5-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.12-1~bpo11+1 (2023-03-05) x86_64 GNU/Linux

p-zak commented 1 year ago

> powerstat plugin should deliver power consumption metrics and information about c0/c1/c6 residency

@azw71 All metrics collected by the Intel PowerStat plugin are gathered at fixed intervals. Metrics that report processor C-state residency or power are calculated over the elapsed interval. When it starts measuring, the plugin skips the first iteration of any metric that is based on the delta from a previous value. (https://github.com/influxdata/telegraf/tree/master/plugins/inputs/intel_powerstat#metrics)

I see that you ran Telegraf in testing mode, which means that only one iteration of metric gathering was performed. Please run Telegraf in normal mode and check whether all metrics are gathered properly.
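
To illustrate why a single gathering iteration cannot yield delta-based metrics, here is a minimal sketch in Python. It is not the plugin's actual implementation (Telegraf is written in Go); the class name `DeltaMetric` and the counter readings are hypothetical.

```python
from typing import Optional


class DeltaMetric:
    """Turns a monotonically increasing counter into a per-interval rate."""

    def __init__(self) -> None:
        self._prev_value: Optional[float] = None
        self._prev_time: Optional[float] = None

    def update(self, value: float, timestamp: float) -> Optional[float]:
        """Return the rate since the previous sample, or None if unavailable."""
        prev_value, prev_time = self._prev_value, self._prev_time
        self._prev_value, self._prev_time = value, timestamp
        if prev_value is None or prev_time is None:
            return None  # first sample: nothing to subtract from, so nothing is emitted
        elapsed = timestamp - prev_time
        if elapsed <= 0:
            return None  # guard against a zero or negative interval
        return (value - prev_value) / elapsed


# Hypothetical energy counter readings in joules, sampled 10 seconds apart.
power = DeltaMetric()
first = power.update(value=1000.0, timestamp=0.0)
second = power.update(value=1250.0, timestamp=10.0)
print(first, second)
```

The first call returns `None` because there is no earlier reading, which corresponds to the single iteration performed in testing mode. The second call returns 25.0, that is, 250 joules over 10 seconds, or 25 watts.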

> Please document supported CPUs for uncore metrics, see https://github.com/torvalds/linux/blob/master/drivers/platform/x86/intel/uncore-frequency/uncore-frequency.c

I believe that uncore metrics are documented in https://github.com/influxdata/telegraf/blob/master/plugins/inputs/intel_powerstat/README.md. Please let me know if something is missing.

azw71 commented 1 year ago

Thanks for your explanation. The missing measured values are indeed available after restarting Telegraf.

My note about the documentation refers to the fact that the intel_uncore_frequency module cannot be used on my CPU: loading it with modprobe fails with the message "no such device". The linked kernel source code lists a few Xeon CPUs, and they differ from the CPUs mentioned in the powerstat module.

p-zak commented 1 year ago

Yes, you are right.

The intel-uncore-frequency module can only be loaded for these models:

| Model number | Processor name |
|---|---|
| 0x55 | Intel Skylake-X |
| 0x6A | Intel IceLake-X |
| 0x6C | Intel IceLake-D |
| 0x47 | Intel Broadwell-G |
| 0x4F | Intel Broadwell-X |
| 0x56 | Intel Broadwell-D |
| 0x8F | Intel Sapphire Rapids X |
| 0xCF | Intel Emerald Rapids X |
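
As a rough check against the list above, the sketch below reads the `model` line from /proc/cpuinfo-style text and looks it up among these model numbers. The function names are hypothetical, and the check compares the model number only; it assumes an Intel family 6 CPU and does not verify vendor or family.

```python
import re
from typing import Optional

# Model numbers copied from the list above.
UNCORE_FREQUENCY_MODELS = {
    0x55: "Intel Skylake-X",
    0x6A: "Intel IceLake-X",
    0x6C: "Intel IceLake-D",
    0x47: "Intel Broadwell-G",
    0x4F: "Intel Broadwell-X",
    0x56: "Intel Broadwell-D",
    0x8F: "Intel Sapphire Rapids X",
    0xCF: "Intel Emerald Rapids X",
}


def cpu_model(cpuinfo_text: str) -> Optional[int]:
    """Extract the decimal 'model' value; the 'model name' line does not match."""
    match = re.search(r"^model[ \t]*:[ \t]*(\d+)[ \t]*$", cpuinfo_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def supports_uncore_frequency(cpuinfo_text: str) -> bool:
    """Return True if the CPU model number appears in the list above."""
    model = cpu_model(cpuinfo_text)
    return model is not None and model in UNCORE_FREQUENCY_MODELS


# Values from the /proc/cpuinfo output posted in this issue (model 158 is 0x9E).
reporter_cpuinfo = (
    "vendor_id\t: GenuineIntel\n"
    "cpu family\t: 6\n"
    "model\t\t: 158\n"
    "model name\t: Intel(R) Core(TM) i3-9100 CPU @ 3.60GHz\n"
)
reporter_supported = supports_uncore_frequency(reporter_cpuinfo)
print(reporter_supported)
```

For the i3-9100 from this issue, the lookup fails because model 158 (0x9E) is not in the list, which is consistent with modprobe reporting "no such device". On a real system the text could be read from /proc/cpuinfo instead of the inline sample.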

The plugin will be updated in the upcoming months, and this information will be added to the README and/or the code.

Thanks for finding this!

powersj commented 11 months ago

@p-zak given that the user's issue was resolved by not running in test mode, is this issue being kept open only to document the supported models?

p-zak commented 11 months ago

@powersj Exactly, it can be closed after changes to README are delivered.