lxc / lxcfs

FUSE filesystem for LXC
https://linuxcontainers.org/lxcfs

Regression in CPU utilization virtualization #538

Open webdock-io opened 2 years ago

webdock-io commented 2 years ago

LXD v 5.0.0 Ubuntu Jammy 5.15.0-27-generic

Create an Ubuntu container and stress all CPUs on the host system, e.g. stress -c 72. Then enter the container and run htop: the system reports CPU utilization at 100% across all CPUs. Load average reporting is OK at ~0.
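For reference, a minimal reproduction sketch along those lines (the image, container name and core count are illustrative; htop may need installing inside the container first):

# on the host
lxc launch ubuntu:22.04 repro-c1
stress -c $(nproc) &
# inside the container
lxc exec repro-c1 -- htop      # per-CPU bars sit at ~100% instead of ~0%
lxc exec repro-c1 -- uptime    # load average is still shown correctly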

This seems to be a regression, as on Focal (5.13.0-35-generic) systems running v4.19, CPU utilization is correctly reported as ~0% across all threads inside the container.

webdock-io commented 2 years ago

OK this seems to be something connected to the HWE kernel that ships with Jammy or some change in Jammy in general, although please see questions below.

I did a pretty hardcore nuke of the 5.0 install:

snap remove --purge lxd
apt purge snapd && apt install snapd
reboot

Then, after the reboot, I ran snap install lxd --channel=4.0/stable and I still see this issue present. I did another reboot after installing 4.0.9 for good measure and the issue is still there.

However, I can't find a way to check which version of lxcfs is actually running, so I can't be sure lxcfs was really downgraded as well - how do I check this?
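For reference, one way to check (assuming the snap-packaged LXD, so the /snap/lxd path below is an assumption) is to find the running lxcfs process and ask that same binary for its version:

ps -o pid=,args= -C lxcfs                 # shows the full path of the lxcfs binary that is running
/snap/lxd/current/bin/lxcfs --version     # assumed snap path; substitute the path printed above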

Also, despite the hardcore nuke above, I still saw some old data hanging around - for example, lxc remote list showed my old remotes still present! How is that even possible? Isn't that supposed to be stored in the database, which almost certainly should have been nuked during the uninstall?

@stgraber Got any input on all this? Thanks :) I would like my testing to be valid and actually revert to an older version of lxcfs in order to determine for sure whether this is an lxcfs regression or something has changed in newer kernels/Jammy which causes this.

If I don't get any response here, I guess my next test is to start over with my system and install Focal instead of Jammy, and see if things work with LXD v5.0, which would then help determine whether this is some change in Jammy that lxcfs is not handling, or a regression as I initially thought.

webdock-io commented 2 years ago

All right some more testing later and the results are in:

On Ubuntu Focal 20.04 with LXD 5.0, both with the stock kernel and after HWE kernel installation (5.13.0-40-generic), the issue is not present.

So this is definitely some change in Jammy and/or kernels newer than 5.13.0-40

I have a bunch of systems deployed in a datacenter which I am unable to go live with due to this bug, as customers will surely complain, so my only choice here is to do a full reinstall over KVM to Focal - which is exceedingly slow and painful - unless I can get some hints on how to track this down and/or fix it :)

I'll give it a day or two to see if anybody here wakes up and gives me some pointers before diving into that particular madness. Thanks!

tomponline commented 2 years ago

Have you tried booting with unified_cgroup_hierarchy=0 as a kernel boot argument to see if it's cgroupv2 related?

webdock-io commented 2 years ago

@tomponline Thank you for chiming in and for providing this hint.

I tried adding this to /etc/default/grub and running update-grub, where:

GRUB_CMDLINE_LINUX_DEFAULT="swapaccount=1 init_on_alloc=0 systemd.unified_cgroup_hierarchy=0"

After performing this change and a reboot, the problem has gone away!

Note for the curious: swapaccount is an LXD-related setting we need for our setup, and init_on_alloc is a ZFS-related optimization.
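A quick way to confirm which hierarchy is active after the reboot (a standard filesystem check, nothing LXD-specific):

stat -fc %T /sys/fs/cgroup/    # prints cgroup2fs on a pure cgroupv2 host, tmpfs on a v1/hybrid setup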

Followup questions:

Although this solves my immediate problem, is having done this an issue for me moving forward? cgroupv2 seems like a good thing and is what you support going forward, I'm guessing...

But I guess I could always switch back to using cgroupv2 after lxcfs is fixed, or is that naive of me?

tomponline commented 2 years ago

I suspect this is a problem with LXCFS in pure cgroupv2 environments which needs fixing. Yes, you can switch back to cgroupv2 later once it's fixed.

webdock-io commented 2 years ago

Great thank you for the update - we have already downgraded all affected systems. I'll be back to test this once a fix has been implemented :)

varuzam commented 1 year ago

I have the same issue on Debian 11 with lxcfs 5.0.1. Switching to cgroupv1 fixed it. Looking forward to proper cgroupv2 support.

stgraber commented 1 year ago

@brauner interested in looking into this one?

salanki commented 1 year ago

Getting this fixed would be very nice

lflare commented 1 year ago

Chiming in that this is still experienced in v5.0.3.

mihalicyn commented 1 year ago

That's not an LXCFS bug; the problem is that the cpuset cgroup controller is currently not enabled by default, but it's required to properly "virtualize" CPU stats inside the LXC container: https://github.com/lxc/lxcfs/blob/cd2e3ac5c5ae4fde5b58380ac2f56e55c78e41cc/src/proc_cpuview.c#L1136

I'll put this on my ToDo list to look at how to properly enable this controller with LXD.

-- Upd.

The cgroup-v1 cpuacct and cpu controllers were replaced by the single cpu controller in cgroup-v2. Unfortunately, the cgroup-v2 cpu controller doesn't provide us with an analog of the cpuacct.usage_all file. There is the cpu.stat file, but it gives us only aggregated stat times (a sum across all CPUs).
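To make the difference concrete, this is roughly what the two files look like (paths and values are illustrative; the v1 file exposes per-CPU user/system time in nanoseconds, while the v2 file only exposes totals):

# cgroup-v1: per-CPU breakdown
cat /sys/fs/cgroup/cpuacct/cpuacct.usage_all
# cpu user sys
# 0 123456789 23456789
# 1 122334455 22334455
# ...

# cgroup-v2: aggregated only
cat /sys/fs/cgroup/cpu.stat
# usage_usec 123456
# user_usec 100000
# system_usec 23456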

So, that's a kernel limitation. Nothing can be done here from the LXCFS side.

cc @stgraber

webdock-io commented 2 months ago

Ping +2 years later

Now that cgroupv2 is an inevitable fact and will fully replace v1 (upcoming systemd releases will apparently even refuse to boot under cgroupv1), I feel this issue needs to be revisited and has become more pressing.

Would it be a matter of putting in a request with whoever maintains cgroupv2 to resurrect cpuacct.usage_all, and then you'd be able to implement this in lxcfs?

We really depend on this functionality to provide accurate CPU utilization metrics to container customers and are motivated to get this moving in the right direction. If you could point us to where we can raise this issue, or even provide financial motivation to get this metric implemented in cgroupv2, that would be much appreciated.

stgraber commented 2 months ago

@mihalicyn ^