Open rszmigiel opened 8 months ago
AFAIK CPU isolation removes a set of CPUs from the kernel's scheduling algorithm. Kepler attaches a probe to the kernel's sched_switch
tracepoint to calculate how much CPU time / how many CPU cycles a process is using, and attributes power usage based on the process's CPU time/cycles.
So if a process is running on a CPU that sits outside the scheduler, the probe may not fire for that CPU; Kepler then doesn't know the process's CPU time/cycles, can't assign any power usage to it, and may not generate metrics for it.
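If it helps to verify that, one quick check (a sketch only, assuming perf is available on the node; CPU 2 stands in for one of the isolated cores) is to count sched_switch events on that core for a few seconds:
# Count scheduler context-switch events on CPU 2 for 10 seconds (needs root)
perf stat -e sched:sched_switch -C 2 -- sleep 10
If the count stays near zero while the pinned workload is busy on that core, the probe would indeed have nothing to attribute.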
Cc: @rootfs @marceloamaral
In such a case, could we obtain power usage metrics in alternative ways, even if they're not as detailed as with eBPF? For instance, to work around the issue mentioned in this case I used the output of the ipmitool sdr
command. It provides summarised power usage across all CPUs and memory installed in the system - still better than nothing ;-)
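For reference, a coarse fallback along those lines (assuming local IPMI access and ipmitool installed; readings are node-wide, not per-process):
# List sensor readings and keep the power-related ones
ipmitool sdr | grep -iE 'pwr|power|watt'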
@rszmigiel would you please use the kepler 0.7.2 container image?
cc @vprashar2929 @sthaha
I've used kepler-operator-bundle:0.10.0 and it works!
Thank you!
great news! thanks for the update @rszmigiel
@rootfs I am really curious to know why it worked with libbpf but not with bcc. That's the only difference between the two Kepler versions; the approach to calculating the CPU cycles is the same in both.
It seems it's still happening with the latest available version (left side of the graph), compared to 0.7.2 (reinstalled, on the right side).
Reopening issue to continue investigation.
I tried to reproduce this scenario. On a machine with 20 cores, I isolated 2 cores and executed stress-ng on those isolated cores. Kepler is able to get energy usage for these processes.
Since the cores are isolated, any task started without CPU pinning will not be allocated to the isolated cores; in this case CPUs 2 and 12 will not be loaded.
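For reference, the reproduction above boils down to something like this (a sketch, assuming CPUs 2 and 12 were the ones isolated, e.g. via isolcpus=2,12 on the kernel command line):
# Pin two stress-ng CPU workers to the isolated cores for 5 minutes
taskset -c 2,12 stress-ng --cpu 2 --timeout 300s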
Cc: @iconeb PTAL
I confirm we have a performance profile with reserved and isolated CPUs
# oc get performanceprofile upf-performance-profile -o json | jq -r .spec.cpu
{
"isolated": "2-31,34-63,66-95,98-127",
"reserved": "0-1,32-33,64-65,96-97"
}
They are correctly applied at the worker node's boot
# oc debug node/compute-0.0x4954.openshift.one -- cat /proc/cmdline
[...] intel_iommu=on iommu=pt systemd.unified_cgroup_hierarchy=0 systemd.legacy_systemd_cgroup_controller=1 skew_tick=1 tsc=reliable rcupdate.rcu_normal_after_boot=1 nohz=on rcu_nocbs=2-31,34-63,66-95,98-127 tuned.non_isolcpus=00000003,00000003,00000003,00000003 systemd.cpu_affinity=0,1,32,64,33,65,96,97 intel_iommu=on iommu=pt isolcpus=managed_irq,2-31,34-63,66-95,98-127 nohz_full=2-31,34-63,66-95,98-127 nosoftlockup nmi_watchdog=0 mce=off rcutree.kthread_prio=11 default_hugepagesz=1G hugepagesz=1G hugepages=200 idle=poll rcu_nocb_poll tsc=perfect selinux=0 enforcing=0 noswap clock=pit audit=0 processor.max_cstate=1 intel_idle.max_cstate=0 rcupdate.rcu_normal_after_boot=0 softlockup_panic=0 console=ttyS0,115200n8 pcie_aspm=off pci=noaer firmware_class.path=/var/lib/firmware intel_pstate=disable
Pod is running with requests and limits
$ oc get pod upf1 -o json | jq .spec.containers[0].resources
{
"limits": {
"cpu": "18",
"hugepages-1Gi": "40Gi",
"memory": "30Gi",
"openshift.io/ens785_rn": "3"
},
"requests": {
"cpu": "18",
"hugepages-1Gi": "40Gi",
"memory": "30Gi",
"openshift.io/ens785_rn": "3"
}
}
And on the worker node taskset affinity is assigned as expected
taskset -pc 656722
pid 656722's current affinity list: 3-7,22-25,67-71,86-89
The strange thing is that the previous graph was created running the same pod(s) in the same environment, just changing Kepler's version in the meantime.
I will try another round of tests to provide (if possible) further evidence.
I have tested Kepler on RHEL that started with isolated CPUs. The isolated CPUs were assigned to a VM. Kepler can capture the VM and report metrics. We have added this configuration in our CI.
What happened?
I'm running a PoC with OpenShift 4.13 and Kepler 0.9.2 installed with the Kepler (Community) Operator. One of the use cases is to visualise the energy consumption of DPDK-enabled containers. These containers use isolated CPU cores on a Single Node OpenShift installation.
I got CPUs isolated with the following PerformanceProfile:
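The profile itself is not reproduced here; for illustration, such a profile looks roughly like the following (name, CPU ranges and node selector are placeholders, not the actual values from this cluster):
cat <<'EOF' | oc apply -f -
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: example-performance-profile   # placeholder name
spec:
  cpu:
    isolated: "2-31"                  # placeholder: cores reserved for pinned workloads
    reserved: "0-1"                   # placeholder: cores left for OS housekeeping
  nodeSelector:
    node-role.kubernetes.io/worker: ""
EOF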
I also got workload partitioning configured (https://docs.openshift.com/container-platform/4.13/scalability_and_performance/enabling-workload-partitioning.html).
When I run a Pod that's configured to use isolated CPU cores, for instance:
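The Pod spec is not reproduced here; for illustration, a Guaranteed-QoS pod that gets pinned to exclusive (isolated) cores looks roughly like this (name, image and sizes are placeholders):
cat <<'EOF' | oc apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: pinned-pod                           # placeholder
spec:
  containers:
  - name: worker
    image: quay.io/example/dpdk-app:latest   # placeholder image
    resources:
      # integer cpu request == limit gives Guaranteed QoS, so the static
      # CPU manager pins the container to exclusive cores
      requests:
        cpu: "4"
        memory: 2Gi
      limits:
        cpu: "4"
        memory: 2Gi
EOF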
and then run a sample workload to put some load on these cores, for instance:
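For illustration, assuming a stress tool such as stress-ng is available in the container image (the pod name matches the placeholder above):
# Keep 4 CPU workers busy on the pinned cores for 10 minutes
oc exec pinned-pod -- stress-ng --cpu 4 --timeout 600s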
I can observe that the assigned CPU cores show high usage in the top output:
but Kepler's power usage diagrams don't reflect that - they're very flat:
However, if I run the same Pod on shared (non-isolated) CPU cores by removing the whole resources.requests and resources.limits sections, the Kepler graphs look much more reasonable:
even though the workload is running on only a small portion of the non-isolated CPU cores:
Therefore I conclude that Kepler does not show proper power usage when isolated CPU cores are being used.
What did you expect to happen?
I'd like to see energy usage for both isolated and non-isolated CPU cores. This is very important for all high-throughput, low-latency workloads.
How can we reproduce it (as minimally and precisely as possible)?
Get OpenShift 4.13 with the Kepler Community Operator installed, configure a node with isolated CPU cores and workload partitioning. Run two pods, one using isolated CPU cores and one using shared CPU cores. Observe that energy usage metrics are collected only for the pod on shared (non-isolated) CPU cores.
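A rough way to compare what Kepler reports for the two pods (a sketch based on upstream defaults: an exporter daemonset named kepler-exporter serving metrics on port 9102; the namespace and pod names are placeholders):
# Forward the Kepler exporter's metrics port, then grep the per-container energy counter
oc -n <kepler-namespace> port-forward daemonset/kepler-exporter 9102:9102 &
curl -s http://localhost:9102/metrics | grep kepler_container_joules_total | grep -E 'pinned-pod|shared-pod'
In the behaviour described above, the counter for the shared-core pod keeps growing while the one for the pinned pod stays flat.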
Anything else we need to know?
No response
Kepler image tag
Kubernetes version
Cloud provider or bare metal
OS version
Install tools
Kepler deployment config
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, ...) and versions (if applicable)