powerapi-ng / hwpc-sensor

Hardware Performance Counters monitoring agent for containers.
BSD 3-Clause "New" or "Revised" License

CGroup v2 issue #29

Closed. MaxiStefan closed this issue 1 year ago.

MaxiStefan commented 1 year ago

Hi,

My distribution uses cgroup v2, and I tried to follow the instructions from this issue:

Additionally, I tried to follow this to manage cgroups manually with cgroupfs. However, I get errors when editing the cgroup.subtree_control file to add the perf_event controller.

For reference, the only controllers I have active are: cpuset cpu io memory hugetlb pids rdma misc (cat /sys/fs/cgroup/cgroup.controllers)

Any recommendations for making this work on Ubuntu 22.04 with the 5.15.0-58-generic kernel? I am simply trying to use SmartWatts and the sensor to measure container-level energy consumption on my machine, with no Kubernetes cluster or anything else on top.
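The controller list quoted above can be checked from a script. A minimal sketch, assuming a POSIX shell; `controller_listed` is a hypothetical helper, not part of hwpc-sensor. Note that, according to the kernel's cgroup v2 documentation, perf_event is enabled implicitly on the v2 hierarchy, so it does not need to be (and, as far as we know, cannot be) added through cgroup.subtree_control and is not expected to show up in this list. That would be consistent with the errors described above.

```shell
#!/bin/sh
# controller_listed NAME [FILE]
# Hypothetical helper: succeeds (exit status 0) if NAME appears as a
# whitespace-separated word in FILE, which defaults to the root
# cgroup.controllers file of a cgroup v2 hierarchy.
controller_listed() {
    ctl_name="$1"
    ctl_file="${2:-/sys/fs/cgroup/cgroup.controllers}"
    # Exit status 2 means the file could not be read at all (for example
    # on a cgroup v1 hierarchy, where it does not exist).
    [ -r "$ctl_file" ] || return 2
    # Put one controller per line, then look for an exact line match.
    tr -s ' ' '\n' < "$ctl_file" | grep -qx -- "$ctl_name"
}
```

For example, `controller_listed perf_event || echo "perf_event not listed"` prints the message on the systems described in this thread.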

qperez commented 1 year ago

Hi,

I have the same issue as @MaxiStefan on my system. I can't get perf_event to appear in /sys/fs/cgroup, despite applying the same workarounds described in the issue mentioned above. My cgroup.controllers lists the following active controllers: cpuset cpu io memory hugetlb pids rdma misc.

My platform is Ubuntu 22.04 LTS (Linux 5.15.0-58-generic), and in /boot/config-5.15.0-58-generic the option CONFIG_CGROUP_PERF is enabled (CONFIG_CGROUP_PERF=y).
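The kernel configuration check mentioned above can be scripted. A minimal sketch; `kconfig_value` is a hypothetical helper, and it assumes the usual Kconfig file format, where an enabled option appears as `CONFIG_X=y` (or `=m` for a module) and a disabled one as `# CONFIG_X is not set`.

```shell
#!/bin/sh
# kconfig_value OPTION [FILE]
# Hypothetical helper: prints the value of a Kconfig option (y, m, a string
# or a number), prints "n" if the option is explicitly disabled, and returns 1
# if the option does not appear in FILE at all.
kconfig_value() {
    kc_opt="$1"
    kc_file="${2:-/boot/config-$(uname -r)}"
    if grep -q "^$kc_opt=" "$kc_file" 2>/dev/null; then
        # Keep everything after the first '=' of the last matching line.
        grep "^$kc_opt=" "$kc_file" | tail -n 1 | cut -d= -f2-
    elif grep -q "^# $kc_opt is not set" "$kc_file" 2>/dev/null; then
        echo n
    else
        return 1
    fi
}
```

On the system described above, `kconfig_value CONFIG_CGROUP_PERF` would print `y`.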

gfieni commented 1 year ago

Hello, in both cases the kernel parameter systemd.unified_cgroup_hierarchy=0 does not seem to be taking effect. You shouldn't have a cgroup.controllers file on a cgroup v1 hierarchy, because there the supported controllers are directories in /sys/fs/cgroup, such as /sys/fs/cgroup/perf_event.
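The distinction described above (a cgroup.controllers file at the root of a unified v2 hierarchy, versus one directory per controller on v1) can be turned into a quick check. A minimal sketch; `cgroup_layout` is a hypothetical helper, and the three labels it prints are our own.

```shell
#!/bin/sh
# cgroup_layout [ROOT]
# Hypothetical helper: inspects ROOT (default /sys/fs/cgroup) and prints
#   v2             if ROOT is a unified hierarchy (cgroup.controllers exists),
#   v1-perf_event  if ROOT holds v1 controller directories including perf_event,
#   unknown        otherwise.
cgroup_layout() {
    cg_root="${1:-/sys/fs/cgroup}"
    if [ -f "$cg_root/cgroup.controllers" ]; then
        echo v2
    elif [ -d "$cg_root/perf_event" ]; then
        echo v1-perf_event
    else
        echo unknown
    fi
}
```

With GNU coreutils, `stat -fc %T /sys/fs/cgroup` gives a similar answer: it prints `cgroup2fs` on a unified v2 hierarchy and `tmpfs` when the v1 (or hybrid) layout is mounted.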

Maybe you haven't regenerated the GRUB configuration (sudo update-grub)? Can you please post the contents of the /proc/cmdline file?

qperez commented 1 year ago

Thank you @gfieni for your quick reply. I had simply forgotten to regenerate my GRUB configuration... Basic mistake, sorry :sweat_smile: Now I have perf_event in /sys/fs/cgroup and I can create dedicated cgroups for measurements. Thank you a lot! :thumbsup:

MaxiStefan commented 1 year ago

Hi,

Thanks for getting back to us. I did regenerate my GRUB config (before making this post), with no success.

cat /proc/cmdline returned the following:

BOOT_IMAGE=/boot/vmlinuz-5.15.0-58-generic root=UUID=f3a6250d-896d-440a-9d53-18f61dffbebd ro quiet splash

MaxiStefan commented 1 year ago

In the meantime, on another machine I installed Ubuntu 18 (kernel 4.15). Although this mentions that Linux kernel 4.15 should include arch/x86/events/intel/rapl.c, running perf list | grep power/ gives empty output.

So on the older kernel with cgroup v1 I can't seem to access RAPL events, while on my new Ubuntu version with the 5.15 kernel I can't enable the perf_event controller but I do have access to RAPL events.
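`perf list` discovers PMU events through sysfs, so the RAPL events can also be checked without perf. A minimal sketch; `list_rapl_events` is a hypothetical helper, and it assumes the RAPL PMU is exposed as `power` under /sys/bus/event_source/devices, which is where the `power/...` events shown by perf list come from. The directory should be missing when the RAPL perf driver is not loaded.

```shell
#!/bin/sh
# list_rapl_events [DIR]
# Hypothetical helper: prints one RAPL event name per line (e.g. energy-pkg)
# found in DIR, skipping the companion *.unit and *.scale files.
# Returns 1 if DIR does not exist, i.e. no RAPL PMU is registered.
list_rapl_events() {
    ev_dir="${1:-/sys/bus/event_source/devices/power/events}"
    [ -d "$ev_dir" ] || return 1
    for ev_path in "$ev_dir"/*; do
        [ -e "$ev_path" ] || continue   # empty directory: the glob did not match
        ev_name=$(basename "$ev_path")
        case "$ev_name" in
            *.unit|*.scale) continue ;;
        esac
        echo "$ev_name"
    done
}
```

For example, `list_rapl_events || echo "no RAPL PMU registered"` distinguishes a missing driver from a perf installation problem.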

gfieni commented 1 year ago

> Hi,
>
> Thanks for getting back to us. I did regenerate my GRUB config (before making this post), with no success.
>
> cat /proc/cmdline returned the following:
>
> BOOT_IMAGE=/boot/vmlinuz-5.15.0-58-generic root=UUID=f3a6250d-896d-440a-9d53-18f61dffbebd ro quiet splash

Hello, as the contents of your /proc/cmdline show, the systemd.unified_cgroup_hierarchy=0 kernel parameter is not taking effect in your configuration. Did you put the parameter in the /etc/default/grub file and regenerate the GRUB configuration with sudo update-grub?
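Checking /proc/cmdline for the parameter can be done with a small helper. A minimal sketch; `cmdline_value` is a hypothetical helper that prints the value of the last `KEY=VALUE` token for a given key, or nothing if the key is absent.

```shell
#!/bin/sh
# cmdline_value KEY [FILE]
# Hypothetical helper: prints the value of the last KEY=VALUE token in FILE
# (default /proc/cmdline), or prints nothing if KEY is not present.
cmdline_value() {
    cl_key="$1"
    cl_file="${2:-/proc/cmdline}"
    # One token per line; index() does a literal prefix match, so the dots
    # in keys such as systemd.unified_cgroup_hierarchy are not regex wildcards.
    tr -s ' ' '\n' < "$cl_file" |
        awk -v k="$cl_key=" 'index($0, k) == 1 { v = substr($0, length(k) + 1) }
                             END { if (v != "") print v }'
}
```

Given the /proc/cmdline quoted above, `cmdline_value systemd.unified_cgroup_hierarchy` prints nothing, confirming that the parameter never reached the kernel.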

> In the meantime, on another machine I installed Ubuntu 18 (kernel 4.15). Although this mentions that Linux kernel 4.15 should include arch/x86/events/intel/rapl.c, running perf list | grep power/ gives empty output.
>
> So on the older kernel with cgroup v1 I can't seem to access RAPL events, while on my new Ubuntu version with the 5.15 kernel I can't enable the perf_event controller but I do have access to RAPL events.

On Ubuntu kernels, the RAPL module is shipped in the linux-modules-extra package for your kernel version. You can install it with the following commands:

apt install linux-modules-extra-$(uname -r)
update-initramfs -c -k $(uname -r)

MaxiStefan commented 1 year ago

Hi, I have run the commands for the linux-modules-extra package. Are any further steps needed (e.g. a reboot)? This is the output I get: [screenshot]

Thanks for the quick reply. This is the content of my GRUB config:

#If you change this file, run 'update-grub' afterwards to update
GRUB_DEFAULT=0
GRUB_TIMEOUT_STYLE=hidden
GRUB_TIMEOUT=10
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
GRUB_CMDLINE_LINUX="systemd.unified_cgroup_hierarchy=0"
GRUB_CMDLINE_LINUX=""

Thanks in advance!

gfieni commented 1 year ago

> Hi, I have run the commands for the linux-modules-extra package. Are any further steps needed (e.g. a reboot)? This is the output I get: [screenshot]

You can check whether the module is correctly installed with the modinfo rapl command. Rebooting the machine is always a safe way to be sure.

> Thanks for the quick reply. This is the content of my GRUB config:
>
> #If you change this file, run 'update-grub' afterwards to update
> GRUB_DEFAULT=0
> GRUB_TIMEOUT_STYLE=hidden
> GRUB_TIMEOUT=10
> GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
> GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
> GRUB_CMDLINE_LINUX="systemd.unified_cgroup_hierarchy=0"
> GRUB_CMDLINE_LINUX=""
>
> Thanks in advance!

Is this the content of /etc/default/grub? Maybe try putting the parameter at the end of GRUB_CMDLINE_LINUX_DEFAULT and regenerating your GRUB config? Both parameters of that variable (quiet splash) are present in your /proc/cmdline.
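One likely explanation for the behaviour above: /etc/default/grub is sourced as a shell script by grub-mkconfig, so in the file posted earlier the second assignment, GRUB_CMDLINE_LINUX="", silently overrides the first one that sets systemd.unified_cgroup_hierarchy=0. A minimal sketch, assuming a POSIX shell; `effective_grub_cmdline` is a hypothetical helper that sources such a file in a subshell and prints the arguments the default boot entry would get (GRUB_CMDLINE_LINUX followed by GRUB_CMDLINE_LINUX_DEFAULT). Sourcing executes the file, including the lsb_release call in GRUB_DISTRIBUTOR, so only point it at trusted files.

```shell
#!/bin/sh
# effective_grub_cmdline [FILE]
# Hypothetical helper: sources FILE (default /etc/default/grub) in a subshell,
# the way grub-mkconfig does, and prints GRUB_CMDLINE_LINUX followed by
# GRUB_CMDLINE_LINUX_DEFAULT as a single space-separated line.
effective_grub_cmdline() {
    gr_file="${1:-/etc/default/grub}"
    (
        GRUB_CMDLINE_LINUX=""
        GRUB_CMDLINE_LINUX_DEFAULT=""
        # Later assignments in the file override earlier ones, as in any
        # shell script.
        . "$gr_file"
        set -f   # disable globbing before the unquoted expansion below
        set -- $GRUB_CMDLINE_LINUX $GRUB_CMDLINE_LINUX_DEFAULT
        echo "$*"
    )
}
```

Run against the file posted above, it prints only `quiet splash`. Once the parameter is moved (for example to the end of GRUB_CMDLINE_LINUX_DEFAULT) and the duplicate line is removed, it should appear in the output, and `sudo update-grub` followed by a reboot should make it show up in /proc/cmdline.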

MaxiStefan commented 1 year ago

So I changed the parameter in the GRUB file, and cgroup v1 now seems to be active: I managed to create a new perf_event:cgroup_name cgroup. I had been cautious about changing anything in GRUB beyond what the documentation I followed said, which was to add GRUB_CMDLINE_LINUX="systemd.unified_cgroup_hierarchy=0".
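For readers who prefer plain cgroupfs over the libcgroup tools: the perf_event:cgroup_name notation above (presumably as passed to cgcreate -g) corresponds, on a v1 hierarchy, to a directory under /sys/fs/cgroup/perf_event, and processes are attached by writing their PID to that directory's cgroup.procs file. A minimal sketch, assuming a v1 hierarchy mounted at the default location and root privileges; `create_perf_cgroup` is a hypothetical helper, and the root directory is a parameter so it can be tried on a scratch directory first.

```shell
#!/bin/sh
# create_perf_cgroup NAME PID [ROOT]
# Hypothetical helper: creates the cgroup NAME under ROOT (default: the v1
# perf_event hierarchy) and moves process PID into it.
create_perf_cgroup() {
    pc_name="$1"
    pc_pid="$2"
    pc_root="${3:-/sys/fs/cgroup/perf_event}"
    # On cgroupfs, mkdir is what creates a new cgroup; the kernel then fills
    # the directory with control files such as cgroup.procs and tasks.
    mkdir -p "$pc_root/$pc_name" || return 1
    # Writing a PID to cgroup.procs moves that process (all its threads)
    # into the cgroup.
    echo "$pc_pid" > "$pc_root/$pc_name/cgroup.procs"
}
```

As root, `create_perf_cgroup mycontainer <pid>` (with a hypothetical cgroup name and the PID of the process to monitor) would create /sys/fs/cgroup/perf_event/mycontainer and attach that process.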

However, in case I run into errors in the future: I have also rebooted my Ubuntu machine with kernel 4.15 after running sudo update-initramfs -c -k $(uname -r), and modinfo rapl gives modinfo: ERROR: Module rapl not found. If you have any ideas and some extra time to help me figure out why, I would appreciate it immensely. It would be great if I could make this machine work too... Otherwise I will simply continue with the other machine and check whether I get readings from my individual containers.

gfieni commented 1 year ago

Are you sure the linux-modules-extra-$(uname -r) package is installed (dpkg -l | grep linux-modules-extra)? modinfo shouldn't return an error if the module is available.
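modinfo resolves a module name against the tree under /lib/modules/$(uname -r), so another way to check is to search that tree for the module file directly. A minimal sketch; `find_rapl_module` is a hypothetical helper, and the candidate file names are an assumption worth verifying: `rapl.ko` matches the name used by `modinfo rapl`, while on older kernels such as 4.15 the driver built from arch/x86/events/intel/rapl.c may be packaged as `intel-rapl-perf.ko` (module name `intel_rapl_perf`), in which case `modinfo rapl` would fail even with the package installed. The patterns end in `*` so that compressed modules are matched too.

```shell
#!/bin/sh
# find_rapl_module [MODULES_DIR]
# Hypothetical helper: prints every RAPL perf module file found under
# MODULES_DIR (default /lib/modules/$(uname -r)) and returns 1 if none exists.
find_rapl_module() {
    mod_dir="${1:-/lib/modules/$(uname -r)}"
    [ -d "$mod_dir" ] || return 1
    # Candidate names are assumptions: "rapl" on recent kernels,
    # "intel-rapl-perf" on some older ones.
    found=$(find "$mod_dir" -type f \( -name 'rapl.ko*' -o -name 'intel-rapl-perf.ko*' \))
    [ -n "$found" ] || return 1
    echo "$found"
}
```

If it reports an intel-rapl-perf.ko file, `modinfo intel_rapl_perf` (and `sudo modprobe intel_rapl_perf`) would be the commands to try instead of the rapl name.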

MaxiStefan commented 1 year ago

This is what it returns:

ii linux-modules-extra-4.15.0-202-generic 4.15.0-202.213 amd64 Linux kernel extra modules for version 4.15.0 on 64 bit x86 SMP