Missing eBPF readings on RT kernels

rootfs commented 11 months ago

What happened?

I am running Kepler on an OCP 4.12 setup from @novacain1 that runs a real-time kernel.

For some reason, both the eBPF CPU time and the perf counter metrics are all zero:

$ kubectl exec -ti -n openshift-kepler-operator daemonset/kepler-exporter-ds -c kepler-exporter -- bash -c "curl localhost:9103/metrics|grep kepler_container_bpf_cpu " |sort -k 2  -g   |tail -5 
kepler_container_bpf_cpu_time_us_total{container_id="f69936729ef1e9b92ff0b4546aa16abfb82dcdfeaf788d63671c4302633884da",container_name="route-controller-manager",container_namespace="openshift-route-controller-manager",pod_name="route-controller-manager-5ccb884c69-kjwrn"} 0
kepler_container_bpf_cpu_time_us_total{container_id="fb411c2422673d47b3beeed553a7f07cf66303ae281acb61196a1008dbe40b0f",container_name="kube-scheduler-operator-container",container_namespace="openshift-kube-scheduler-operator",pod_name="openshift-kube-scheduler-operator-595c64c4f5-kgcnv"} 0
kepler_container_bpf_cpu_time_us_total{container_id="fe814e428015a8735698276794402585ed624bc8f9e60643cfa4654bc4906bad",container_name="whereabouts-cni-bincopy",container_namespace="openshift-multus",pod_name="multus-additional-cni-plugins-5dwcs"} 0
kepler_container_bpf_cpu_time_us_total{container_id="system_processes",container_name="system_processes",container_namespace="system",pod_name="system_processes"} 0

$ kubectl exec -ti -n openshift-kepler-operator daemonset/kepler-exporter-ds -c kepler-exporter -- bash -c "curl localhost:9103/metrics|grep kepler_container_cpu " |sort -k 2  -g   |tail -5 
kepler_container_cpu_instructions_total{command="webhook",container_id="625826d08d6a8c21c7dc482c55333ff5f47ada16d2f9c0ec3430770d68d37aee",container_name="multus-admission-controller",container_namespace="openshift-multus",pod_name="multus-admission-controller-5b449c6757-s2kf8"} 0
kepler_container_cpu_instructions_total{command="webhook",container_id="c8e40caf576af35dac4d026d291985e51312fca9bdf44f7467229ae9517d8414",container_name="webhook-server",container_namespace="openshift-sriov-network-operator",pod_name="network-resources-injector-8k4t5"} 0
kepler_container_cpu_instructions_total{command="webhook",container_id="ee43b7ce2cfb429a3c6751fcb721008318490aa65534650e69d09909bb711dc5",container_name="webhook-server",container_namespace="openshift-sriov-network-operator",pod_name="operator-webhook-lpjsf"} 0
kepler_container_cpu_instructions_total{command="work",container_id="961694e1e04e6b2b316f45d9c51149f5e6741cf47654188bc5d06d25a0a148c6",container_name="klusterlet-manifestwork-agent",container_namespace="open-cluster-management-agent",pod_name="klusterlet-work-agent-6d88cf58b7-d796k"} 0

But perf stat does show some results:

# perf stat -e cycles,cache-misses sleep 1

 Performance counter stats for 'sleep 1':

         1,856,335      cycles                                                      
            14,056      cache-misses                                                

       1.002105863 seconds time elapsed

       0.000995000 seconds user
       0.000995000 seconds sys

As a workaround, I set CORE_USAGE_METRIC to kubelet_cpu_usage in the Kepler configmap, i.e. by adding:

CORE_USAGE_METRIC: kubelet_cpu_usage
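
For anyone who wants to apply the same workaround from the command line, here is a minimal sketch. The helper name `core_usage_patch` is hypothetical, and the configmap name `kepler-cfm` is taken from the issue template further down, so it is an assumption for this cluster; an operator-managed install may use a different name or revert manual changes.

```shell
# Hypothetical helper: build a JSON merge patch that sets CORE_USAGE_METRIC.
core_usage_patch() {
  printf '{"data":{"CORE_USAGE_METRIC":"%s"}}' "$1"
}

# Example invocation (not run here; configmap name and namespace are assumptions):
# kubectl patch configmap kepler-cfm -n openshift-kepler-operator \
#   --type merge -p "$(core_usage_patch kubelet_cpu_usage)"
```

The exporter pods may need a restart before they pick up the new value.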

What did you expect to happen?

BPF stats should be nonzero.
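
A quick way to check this without reading the whole metrics dump is to count the samples with a nonzero value. This is only a sketch: `count_nonzero` is a hypothetical helper, and it assumes each sample line ends with its value (no trailing timestamp).

```shell
# Hypothetical helper: count samples whose value is nonzero.
# $1 is a metric name prefix; stdin is Prometheus text-format output.
count_nonzero() {
  awk -v m="$1" '
    # Keep lines that start with the prefix and whose last field is not 0.
    index($0, m) == 1 && $NF + 0 != 0 { n++ }
    END { print n + 0 }
  '
}

# Example invocation (not run here):
# kubectl exec -n openshift-kepler-operator daemonset/kepler-exporter-ds \
#   -c kepler-exporter -- curl -s localhost:9103/metrics \
#   | count_nonzero kepler_container_bpf_cpu
```

A result of 0 matches the symptom described above.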

How can we reproduce it (as minimally and precisely as possible)?

Discovered on a bare-metal OCP 4.12 setup that runs a real-time kernel.

Anything else we need to know?

No response

Kepler image tag

latest and release-0.5.5

Kubernetes version

```console
OCP 4.12
```

Cloud provider or bare metal

baremetal

OS version

```console
# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here

# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture
# paste output here
```

Install tools

Kepler deployment config

For on kubernetes:

```console
$ KEPLER_NAMESPACE=kepler

# provide kepler configmap
$ kubectl get configmap kepler-cfm -n ${KEPLER_NAMESPACE}
# paste output here

# provide kepler deployment description
$ kubectl describe deployment kepler-exporter -n ${KEPLER_NAMESPACE}
```

For standalone:

# put your Kepler command argument here

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

sunya-ch commented 11 months ago

Is this a problem with the BPF probe? Could you also share the beginning of the log? Also, try grepping for active processes in the log (this needs log level 3).

I also suspect an OCP security requirement.

rootfs commented 11 months ago

Thanks @sunya-ch, here is the log: https://pastebin.com/GGuxjutk

rootfs commented 11 months ago

perf stat can also read CPU counters from running processes:

# perf stat -e cycles,cache-misses,instructions -p 47302
^C
 Performance counter stats for process id '47302':

     1,697,891,077      cycles                                                      
        10,431,678      cache-misses                                                
     1,555,348,029      instructions              #    0.92  insn per cycle         

       2.122812892 seconds time elapsed

rootfs commented 11 months ago

Running perf inside the Kepler pod didn't work:

[root@kepler-exporter-ngh4b /]# perf stat -e cycles,instructions,cache-misses -p 47302 -a
PID/TID switch overriding SYSTEM
WARNING: Ignored open failure for pid 47302
WARNING: Ignored open failure for pid 47661
WARNING: Ignored open failure for pid 47662
WARNING: Ignored open failure for pid 47667
WARNING: Ignored open failure for pid 47760
WARNING: Ignored open failure for pid 47761
Error:
The sys_perf_event_open() syscall returned with 3 (No such process) for event (cycles).
/bin/dmesg | grep -i perf may provide additional information.

rootfs commented 11 months ago

After @novacain1 turned off the RT kernel, the BPF metrics came back. So this is specific to RT kernels.
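
To tell quickly whether a node is on an RT kernel, something like the sketch below may help. The function names are hypothetical. It assumes that RT-patched kernels expose /sys/kernel/realtime with the value 1, and that the build string from `uname -v` mentions PREEMPT_RT (or "PREEMPT RT" on older builds).

```shell
# Hypothetical helper: return 0 if a `uname -v` string advertises an RT build.
is_rt_version() {
  case "$1" in
    *PREEMPT_RT*|*"PREEMPT RT"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: return 0 if the running kernel looks like an RT kernel.
is_rt_kernel() {
  # Assumption: RT-patched kernels expose this file with the value 1.
  if [ -r /sys/kernel/realtime ] && [ "$(cat /sys/kernel/realtime)" = "1" ]; then
    return 0
  fi
  # Fall back to the build string of the running kernel.
  is_rt_version "$(uname -v)"
}

# Example invocation (run on the node, e.g. after `chroot /host`):
# is_rt_kernel && echo "RT kernel" || echo "non-RT kernel"
```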

novacain1 commented 11 months ago

@rootfs, one thing you could try on OpenShift is a debug shell (with cluster-admin, which you have with the kubeconfig) that specifies a different image where it is relatively easy to install packages:

oc debug node/hostname.openshift.lab --image=quay.io/fedora/fedora:38

Temporary namespace openshift-debug-qv5wq is created for debugging node...
Starting pod/hostnameopenshiftlab-debug ...
To use host binaries, run `chroot /host`
Pod IP: 192.168.38.140
If you don't see a command prompt, try pressing enter.

sh-5.2# dnf install perf bpftool bpftrace

perf launches here, for me at least.

marceloamaral commented 11 months ago

@rootfs, can you execute any eBPF program on the host when the RT kernel is enabled?

I saw that you can run perf on the host, but it doesn't work within the Kepler container. It's possible that the RT kernel requires additional configuration to be exposed within the Kepler containers.
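
One way to answer the first question is a trivial bpftrace program. This is only a sketch: `ebpf_smoke_test` is a hypothetical helper, and it assumes bpftrace is installed and that the shell has enough privileges to load BPF programs.

```shell
# Hypothetical helper: run a trivial eBPF program to see whether the kernel accepts it.
ebpf_smoke_test() {
  if ! command -v bpftrace >/dev/null 2>&1; then
    echo "bpftrace not installed" >&2
    return 2
  fi
  # BEGIN fires once when the program starts; exit() ends the run.
  bpftrace -e 'BEGIN { printf("eBPF OK\n"); exit(); }'
}

# Example invocation (run as root on the node or in a privileged debug pod):
# ebpf_smoke_test
```

If this prints "eBPF OK" on the host but fails inside the Kepler container, that would point at the container setup rather than the kernel.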

marceloamaral commented 11 months ago

As @rootfs pointed out before (https://lwn.net/Articles/802884/), eBPF on RT kernels seems to be enabled for kernels >= 5.3.
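
To check a node against that 5.3 threshold, a small version comparison is enough. This is a sketch: `kernel_at_least` is a hypothetical helper, and it assumes the release string starts with `major.minor`, as `uname -r` output normally does. Distribution kernels carry backports, so the upstream version number is only a rough guide.

```shell
# Hypothetical helper: return 0 if kernel release $2 is at least major.minor $1.
kernel_at_least() {
  want_major=${1%%.*}
  want_minor=${1#*.}
  have=${2%%-*}            # drop the distro suffix, e.g. "-372.el8.x86_64"
  have_major=${have%%.*}
  rest=${have#*.}
  have_minor=${rest%%.*}
  [ "$have_major" -gt "$want_major" ] ||
    { [ "$have_major" -eq "$want_major" ] && [ "$have_minor" -ge "$want_minor" ]; }
}

# Example invocation (run on the node):
# kernel_at_least 5.3 "$(uname -r)" && echo "kernel >= 5.3" || echo "kernel < 5.3"
```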

rootfs commented 11 months ago

Dup of #973, closing this one.

sustainable-computing-io / kepler