powerapi-ng / hwpc-sensor

Hardware Performance Counters monitoring agent for containers.
BSD 3-Clause "New" or "Revised" License

No data collected in 'core'-group #44

Closed TheMehli closed 10 months ago

TheMehli commented 10 months ago

Hello,

I am trying to set up the HWPC Sensor with SmartWatts. As a first step, I set both up according to the PowerAPI documentation. However, the output of the HWPC Sensor doesn't match what I expect: data from the 'rapl' and 'msr' groups is collected and passed on to MongoDB just fine, but there is no data from the 'core' group.

This is my config file for the Sensor:

{
  "name": "sensor",
  "verbose": true,
  "frequency": 500,
  "output": {
    "type": "mongodb",
    "uri": "mongodb://127.0.0.1",
    "database": "mongo_destination",
    "collection": "report_0"
  },
  "system": {
    "rapl": {
      "events": ["RAPL_ENERGY_PKG"],
      "monitoring_type": "MONITOR_ALL_CPU_PER_SOCKET"
    },
    "msr": {
      "events": ["TSC", "APERF", "MPERF"]
    }
  },
  "container": {
    "core": {
      "events": [
        "CPU_CLK_UNHALTED:REF_P",
        "CPU_CLK_UNHALTED:THREAD_P",
        "LLC_MISSES",
        "INSTRUCTIONS_RETIRED"
      ]
    }
  }
}

My output has the same format as the example given here: https://powerapi.org/reference/reports/reports/#hwpc-report. However, I am only getting the report for the second timestamp, which contains the 'rapl' and 'msr' groups.
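
One way to confirm which groups actually reach the database is to list the group names seen per target. The following is a minimal sketch, assuming each stored report is a dictionary with `target` and `groups` fields as in the linked HWPC report example. The function name `groups_per_target` and the sample data are hypothetical, and counter values are left out.

```python
from collections import defaultdict


def groups_per_target(reports):
    """Map each target name to the set of event groups seen in its reports."""
    seen = defaultdict(set)
    for report in reports:
        # "target" and "groups" are assumed field names, taken from the
        # documented HWPC report example rather than from the sensor source.
        target = report.get("target", "<unknown>")
        seen[target].update(report.get("groups", {}).keys())
    return dict(seen)


# Hypothetical sample shaped like the situation described above:
# only the 'rapl' and 'msr' groups are present, 'core' is absent.
sample_reports = [
    {"timestamp": "2023-12-01T20:42:50", "sensor": "sensor", "target": "all",
     "groups": {"rapl": {}, "msr": {}}},
    {"timestamp": "2023-12-01T20:42:51", "sensor": "sensor", "target": "all",
     "groups": {"rapl": {}, "msr": {}}},
]

summary = groups_per_target(sample_reports)
for target, groups in summary.items():
    print(target, sorted(groups))
```

With pymongo installed, the documents could instead come from something like `MongoClient("mongodb://127.0.0.1")["mongo_destination"]["report_0"].find()`. If no target ever lists `core`, the problem lies on the sensor side rather than in SmartWatts.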

I am running on Debian 12 with an Intel i5-10210U.

Is there a way I can get this data? SmartWatts doesn't generate any reports, and I suspect these missing data points are the reason.

Thank you.

roda82 commented 10 months ago

Hello,

Can you provide us with your configuration file for SmartWatts?

Regards,

TheMehli commented 10 months ago

Hello,

Sure:

{
  "verbose": true,
  "stream": true,
  "input": {
    "puller": {
      "model": "HWPCReport",
      "type": "mongodb",
      "uri": "mongodb://127.0.0.1",
      "db": "mongo_destination",
      "collection": "report_0"
    }
  },
  "output": {
    "pusher_power": {
      "type": "influxdb2",
      "uri": "127.0.0.1",
      "port": 8086,
      "db": "test_results",
      "org": "admin",
      "token": "myToken"
    }
  },
  "cpu-base-freq": 1600,
  "cpu-error-threshold": 2.0,
  "disable-dram-formula": true,
  "sensor-reports-frequency": 500
}

roda82 commented 10 months ago

Thank you for the configuration. Can you set PowerReport as the model in the pusher_power output and test again?
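
The suggestion above amounts to adding a `model` key to the `pusher_power` entry. The following is a minimal sketch of a check for that, assuming every entry under `input` and `output` is expected to carry a `model` field. The helper name `entries_without_model` is hypothetical and not part of PowerAPI, and the configuration is trimmed to the relevant keys.

```python
def entries_without_model(config):
    """Return 'section.name' for each input/output entry lacking a 'model' key."""
    missing = []
    for section in ("input", "output"):
        for name, entry in config.get(section, {}).items():
            if "model" not in entry:
                missing.append(f"{section}.{name}")
    return missing


# Trimmed copy of the SmartWatts configuration posted earlier in the thread.
config = {
    "input": {"puller": {"model": "HWPCReport", "type": "mongodb"}},
    "output": {"pusher_power": {"type": "influxdb2"}},
}

before_fix = entries_without_model(config)
print(before_fix)  # ['output.pusher_power']

# Apply the suggested change and check again.
config["output"]["pusher_power"]["model"] = "PowerReport"
after_fix = entries_without_model(config)
print(after_fix)  # []
```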

TheMehli commented 10 months ago

Thank you, this does seem to have done the trick. However, I am still confused about why I am not getting any data from the core group. Is this expected behavior, and does it affect the accuracy of my Power Reports?

roda82 commented 10 months ago

I just installed Debian 12 on an Intel i7-10610U, which belongs to the same processor family as yours. I'm using kernel version 6.1.0-13-amd64. I am able to see core group values for targets other than all. Did you create a cgroup and associate the process that you want to monitor with it? Can you please show the logs of the sensor?

roda82 commented 10 months ago

Can you also confirm that you enabled cgroup v1?
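
The cgroup question can be checked from the filesystem. The following is a minimal sketch, assuming the sensor reads the v1 `perf_event` hierarchy under `/sys/fs/cgroup/perf_event` (an assumption about its default path, not confirmed in this thread). The function name `cgroup_layout` is hypothetical. It relies on two kernel conventions: a unified v2 mount exposes a `cgroup.controllers` file at its root, while a v1 or hybrid setup has one directory per controller.

```python
from pathlib import Path


def cgroup_layout(root="/sys/fs/cgroup"):
    """Classify the cgroup mount as 'v1', 'v2', or 'unknown'."""
    root = Path(root)
    # A v1 (or hybrid) setup mounts each controller in its own directory.
    if (root / "perf_event").is_dir():
        return "v1"
    # A unified v2 mount has a cgroup.controllers file at its root.
    if (root / "cgroup.controllers").is_file():
        return "v2"
    return "unknown"


print(cgroup_layout())
```

A result of `v2` would be consistent with the symptom in this issue. On systemd-based distributions, booting with the kernel parameter `systemd.unified_cgroup_hierarchy=0` is a common way to fall back to the v1 hierarchies.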

TheMehli commented 10 months ago

I was not aware that I had to enable cgroup v1 in order to get core group values. I am now getting core group values for target all and for all running Docker containers, without associating any processes with a cgroup. Thank you. In case you are still interested in the sensor log, here it is:

I: 23-12-01 20:42:50 build: version unknown (rev: unknown)
I: 23-12-01 20:42:50 uname: Linux 6.5.0-0.deb12.1-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.5.3-1~bpo12+1 (2023-10-08) x86_64
I: 23-12-01 20:42:50 pmu: found ix86arch 'Intel X86 architectural PMU' having 7 events, 7 counters (4 general, 3 fixed)
I: 23-12-01 20:42:50 pmu: found perf 'perf_events generic PMU' having 210 events, 0 counters (0 general, 0 fixed)
I: 23-12-01 20:42:50 pmu: found rapl 'Intel RAPL' having 4 events, 3 counters (0 general, 3 fixed)
I: 23-12-01 20:42:50 pmu: found perf_raw 'perf_events raw PMU' having 1 events, 0 counters (0 general, 0 fixed)
I: 23-12-01 20:42:50 pmu: found skl 'Intel Skylake' having 84 events, 11 counters (8 general, 3 fixed)
I: 23-12-01 20:42:50 pmu: found intel_msr 'Intel MSR' having 6 events, 6 counters (0 general, 6 fixed)
I: 23-12-01 20:42:50 sensor: configuration is valid, starting monitoring...
I: 23-12-01 20:42:50 perf<all>: monitoring actor started
I: 23-12-01 20:42:50 perf<mongo_destination>: monitoring actor started
I: 23-12-01 20:42:50 perf<influx_dest>: monitoring actor started
I: 23-12-01 20:42:50 perf<dazzling_jones>: monitoring actor started