powerapi-ng / hwpc-sensor

Hardware Performance Counters monitoring agent for containers.
BSD 3-Clause "New" or "Revised" License

Cannot resolve name for containers running in kubernetes #2

Open PierreRustOrange opened 4 years ago

PierreRustOrange commented 4 years ago

When running the sensor (from the official powerapi/hwpc-sensor:latest image) in a Kubernetes cluster, the sensor fails to resolve the names of the containers and does not monitor any container whose name it cannot resolve.

Log:

I: 20-06-19 13:40:37 build: version unknown (rev: 7a63055ff2fdfbfdf776d42188a87de19882dd88) (Jul 18 2019 - 11:36:29)                                                                         
 I: 20-06-19 13:40:37 uname: Linux 5.3.0-59-generic #53~18.04.1-Ubuntu SMP Thu Jun 4 14:58:26 UTC 2020 x86_64                                                                                 
 I: 20-06-19 13:40:37 pmu: found ix86arch 'Intel X86 architectural PMU' having 7 events, 7 counters (4 general, 3 fixed)                                                                      
 I: 20-06-19 13:40:37 pmu: found perf 'perf_events generic PMU' having 191 events, 0 counters (0 general, 0 fixed)                                                                            
 I: 20-06-19 13:40:37 pmu: found rapl 'Intel RAPL' having 4 events, 3 counters (0 general, 3 fixed)                                                                                           
 I: 20-06-19 13:40:37 pmu: found perf_raw 'perf_events raw PMU' having 1 events, 0 counters (0 general, 0 fixed)                                                                              
 I: 20-06-19 13:40:37 pmu: found skl 'Intel Skylake' having 83 events, 11 counters (8 general, 3 fixed)                                                                                       
 I: 20-06-19 13:40:37 pmu: found msr 'Intel MSR' having 6 events, 5 counters (0 general, 5 fixed)                                                                                             
 I: 20-06-19 13:40:37 sensor: configuration is valid, starting monitoring...                                                                                                                  
 I: 20-06-19 13:40:37 perf<all>: monitoring actor started                                                                                                                                     
 E: 20-06-19 13:40:37 perf: failed to resolve name of target for cgroup '/sys/fs/cgroup/perf_event/kubepods/besteffort/pod3fe59be9-5416-4c4f-bb68-664e8fe666cd/807b14d95ee42d4281bc32a3bf5461 
 E: 20-06-19 13:40:37 perf: failed to resolve name of target for cgroup '/sys/fs/cgroup/perf_event/kubepods/besteffort/pod729286ef-0a0b-4564-b7b5-a0c447169564/6976fcfb5cebac0470aa120715cb79 
 E: 20-06-19 13:40:37 perf: failed to resolve name of target for cgroup '/sys/fs/cgroup/perf_event/kubepods/besteffort/pod75d8e723-8850-41af-8486-d88a94c6d301/ec761985f956f7b0364168f7650358 
 E: 20-06-19 13:40:37 perf: failed to resolve name of target for cgroup '/sys/fs/cgroup/perf_event/kubepods/besteffort/pod3fe59be9-5416-4c4f-bb68-664e8fe666cd/6b9c0ffb5214ea9f49c995f0fa8e05 
 E: 20-06-19 13:40:37 perf: failed to resolve name of target for cgroup '/sys/fs/cgroup/perf_event/kubepods/besteffort/pod75d8e723-8850-41af-8486-d88a94c6d301/488eba3ce355088674feebb79623d9 
 E: 20-06-19 13:40:37 perf: failed to resolve name of target for cgroup '/sys/fs/cgroup/perf_event/kubepods/besteffort/podb3685602-8406-41a0-95ce-8774c7c59103/76a4f691e461064fae31da537c81b7 
 E: 20-06-19 13:40:37 perf: failed to resolve name of target for cgroup '/sys/fs/cgroup/perf_event/kubepods/besteffort/podcb5c21b3-b556-4021-bac3-19ffd875751c/914b77cca24bd2682333b9dbc602ed 
 E: 20-06-19 13:40:37 perf: failed to resolve name of target for cgroup '/sys/fs/cgroup/perf_event/kubepods/burstable/pode14b0994-f4da-4535-bebc-807f8969d1c2/28ce3bd812a83d6e6e21d924d8afc63 
 E: 20-06-19 13:40:37 perf: failed to resolve name of target for cgroup '/sys/fs/cgroup/perf_event/kubepods/besteffort/pod0e537378-aec8-4b28-8626-df2215de47de/1667851be5b15fb68da5bda264a343 
 E: 20-06-19 13:40:37 perf: failed to resolve name of target for cgroup '/sys/fs/cgroup/perf_event/kubepods/poda2fa6ea4-fddc-4654-bb2a-256606c09bd1/3591eba18ecee2f97b4839d4b975481774db0b2da 

This issue happens on a k3s-based cluster, with either Docker or containerd as the container engine.

PierreRustOrange commented 4 years ago

Actually, it is not necessarily a problem if the sensor cannot resolve names, as long as it keeps monitoring the containers and reports the metrics with the container ID. When running on Kubernetes we need other metadata anyway (pod name, namespace, etc.), which we can get from the K8s API using the container ID.

I'll submit a pull request to add a new CLI option that disables name resolution entirely.
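
To illustrate how the container ID could be recovered from the cgroup paths shown in the log above, here is a minimal Python sketch. It is not part of the sensor; the function name `parse_kubepods_cgroup`, the `K8sTarget` type, and the regular expression are assumptions based only on the path layout visible in the log (`kubepods/[besteffort|burstable/]pod<uid>/<container id>`). Labelling pods that sit directly under `kubepods` as "guaranteed" is also an assumption about the cgroup layout.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout, inferred from the log above:
#   .../kubepods/[besteffort|burstable/]pod<pod uid>/<container id>
CGROUP_RE = re.compile(
    r"/kubepods/(?:(?P<qos>besteffort|burstable)/)?"
    r"pod(?P<pod_uid>[0-9a-f-]{36})/"
    r"(?P<container_id>[0-9a-f]+)$"
)


class K8sTarget(NamedTuple):
    qos: str           # QoS class taken from the path ("guaranteed" is assumed when absent)
    pod_uid: str       # pod UID as it appears after the "pod" prefix
    container_id: str  # raw container identifier (last path component)


def parse_kubepods_cgroup(path: str) -> Optional[K8sTarget]:
    """Return the pod UID and container ID encoded in a kubepods cgroup path,
    or None if the path does not follow the assumed layout."""
    match = CGROUP_RE.search(path)
    if match is None:
        return None
    return K8sTarget(
        qos=match.group("qos") or "guaranteed",
        pod_uid=match.group("pod_uid"),
        container_id=match.group("container_id"),
    )


if __name__ == "__main__":
    # The container ID below is a made-up placeholder; the IDs in the log are truncated.
    example = (
        "/sys/fs/cgroup/perf_event/kubepods/besteffort/"
        "pod3fe59be9-5416-4c4f-bb68-664e8fe666cd/" + "ab" * 32
    )
    print(parse_kubepods_cgroup(example))
```

The resulting container ID can then be compared with the `containerID` field of each pod's container statuses in the Kubernetes API, which carries a runtime prefix such as `docker://` or `containerd://` before the same identifier.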

rouvoy commented 4 years ago

As long as it refers to a control group that can be monitored, we can report the measurement using its raw identifier.
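
A minimal Python sketch of that fallback, assuming a hypothetical resolver callback that returns `None` when it cannot find a name. The sensor itself is not written in Python, and `target_label`, `resolve_name`, and `resolve_names` are illustrative names, not the sensor's API or CLI options.

```python
import posixpath
from typing import Callable, Optional


def target_label(
    cgroup_path: str,
    resolve_name: Callable[[str], Optional[str]],
    resolve_names: bool = True,
) -> str:
    """Pick the label used to report a monitored cgroup.

    If resolution is enabled and succeeds, the resolved name is used.
    Otherwise the raw identifier (last component of the cgroup path)
    is reported, so the target is still monitored.
    """
    if resolve_names:
        name = resolve_name(cgroup_path)
        if name:
            return name
    # Fall back to the raw identifier instead of dropping the target.
    return posixpath.basename(cgroup_path.rstrip("/"))
```

With a resolver that always fails, `target_label("/sys/fs/cgroup/perf_event/kubepods/pod-x/abc123", lambda p: None)` returns `"abc123"`, so the measurement is still reported under the raw identifier.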

PierreRustOrange commented 4 years ago

I totally agree. I've just submitted PR #5 for that purpose. With this change, it will also be possible to use the sensor image from Docker Hub on k8s.