powerapi-ng / hwpc-sensor

Hardware Performance Counters monitoring agent for containers.
BSD 3-Clause "New" or "Revised" License

When running the sensor inside a container, it does not produce reports on containers #12

Closed PierreRustOrange closed 2 years ago

PierreRustOrange commented 2 years ago

When the sensor runs inside a container, it only generates reports for the "all" and "" targets; no reports are generated for any of the containers. Running the sensor with the same command outside a container (installed from the .deb package https://github.com/powerapi-ng/hwpc-sensor/releases/tag/v1.1.0 ) works fine.

This has been tested with

PierreRustOrange commented 2 years ago

Some ideas about the origin of this issue:

When running inside a container, the file /sys/fs/cgroup/perf_event/tasks does not list the PIDs of the other containers running on the same system (as it does when read outside the container):

docker run -ti --name testshell --rm --privileged  \
          -v /sys:/sys  \
          debian:buster \
          bash 

root@0af4cff01ada:/# cat  /sys/fs/cgroup/perf_event/tasks | wc -l  
0

When checking the same file outside a container, it's actually far from empty:

cat  /sys/fs/cgroup/perf_event/tasks | wc -l
3120

It seems that something changed recently that restricts which tasks are visible from inside a container. This could come from the kernel or from Docker; we should check with older versions of both.
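The check above can be wrapped in a small helper so it is easy to run both on the host and inside the container and compare the results. This is only a sketch: the function names `count_visible_tasks` and `report_task_visibility` are made up for illustration, and the default path assumes the cgroup v1 perf_event hierarchy shown in this issue.

```shell
# Count the task IDs listed in a cgroup "tasks" file.
# The default path is the perf_event hierarchy used in this issue (assumption: cgroup v1).
count_visible_tasks() {
    tasks_file="${1:-/sys/fs/cgroup/perf_event/tasks}"
    if [ ! -r "$tasks_file" ]; then
        echo "cannot read $tasks_file" >&2
        return 1
    fi
    # grep -c prints the number of non-empty lines; it exits with 1
    # when that number is 0, hence the "|| true".
    grep -c . "$tasks_file" || true
}

# Print a short diagnostic based on the number of visible tasks.
report_task_visibility() {
    count="$(count_visible_tasks "$@")" || return 1
    if [ "$count" -eq 0 ]; then
        echo "no tasks visible: the sensor will probably not see other containers"
    else
        echo "$count tasks visible"
    fi
}
```

Calling `report_task_visibility` with no argument reads the default path; passing another file as the first argument checks that file instead.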

PierreRustOrange commented 2 years ago

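After investigation, making the container use the host's PID namespace solves this issue: --pid host. With this option the container shares the host's PID namespace instead of getting a private one, so the tasks of the other containers become visible to the sensor.

For example, the following command works: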

docker run --privileged --rm --name sensorhwpc --network="host" --pid host \
   -v /sys:/sys  \
   -v /var/lib/docker/containers:/var/lib/docker/containers:ro     \
   powerapi/hwpc-sensor \
     -n sensor \
     -f 2000 \
     -r socket -U 127.0.0.1 -P  12000 \
     -s "rapl" -o -e "RAPL_ENERGY_PKG" \
     -s "msr"     -e "TSC" -e "APERF" -e "MPERF" \
     -c "core"    -e "CPU_CLK_THREAD_UNHALTED:REF_P" \
                  -e "CPU_CLK_THREAD_UNHALTED:THREAD_P" \
                  -e "LLC_MISSES" \
                  -e "INSTRUCTIONS_RETIRED"
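
To confirm that a running sensor container was really started with the host PID namespace, its configuration can be inspected with `docker inspect`. The sketch below is illustrative only: `pid_mode_is_host` and `check_container_pid_mode` are hypothetical helper names, and the container name is the `sensorhwpc` used in the command above.

```shell
# Return success when the given Docker PidMode value means "host PID namespace".
pid_mode_is_host() {
    [ "$1" = "host" ]
}

# Hypothetical helper: read the PID mode of a container and report it.
check_container_pid_mode() {
    name="$1"
    # HostConfig.PidMode is "host" when the container was started with --pid host.
    mode="$(docker inspect --format '{{.HostConfig.PidMode}}' "$name")" || return 1
    if pid_mode_is_host "$mode"; then
        echo "$name shares the host PID namespace"
    else
        echo "$name does not share the host PID namespace (PidMode='$mode')"
        return 1
    fi
}
```

For the container started above, this would be called as `check_container_pid_mode sensorhwpc`.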