opencontainers / runc

CLI tool for spawning and running containers according to the OCI specification
https://www.opencontainers.org/
Apache License 2.0
11.58k stars 2.06k forks source link

runc update: cpu period and cpu burst being reset to defaults after an unrelated update #4225

Open lifubang opened 3 months ago

lifubang commented 3 months ago
root@acmcoder:/home/acmcoder/ubuntutest# runc --systemd-cgroup run -d test
root@acmcoder:/home/acmcoder/ubuntutest# find /sys/fs/cgroup/ -name "*test*"
/sys/fs/cgroup/system.slice/runc-test.scope
root@acmcoder:/home/acmcoder/ubuntutest# cat /sys/fs/cgroup/system.slice/runc-test.scope/cpu.max
max 100000
root@acmcoder:/home/acmcoder/ubuntutest# runc --systemd-cgroup update --cpu-period 900000 test
root@acmcoder:/home/acmcoder/ubuntutest# cat /sys/fs/cgroup/system.slice/runc-test.scope/cpu.max
max 900000
root@acmcoder:/home/acmcoder/ubuntutest# runc --systemd-cgroup update --memory 500M test
root@acmcoder:/home/acmcoder/ubuntutest# cat /sys/fs/cgroup/system.slice/runc-test.scope/cpu.max
max 100000
root@acmcoder:/home/acmcoder/ubuntutest#
lifubang commented 3 months ago
root@acmcoder:/home/acmcoder/ubuntutest# uname -a
Linux acmcoder 5.15.0-100-generic #110~20.04.1-Ubuntu SMP Tue Feb 13 14:25:03 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
kolyshkin commented 3 months ago

AFAIK it does not make sense to set --cpu-period without --cpu-quota. And if you set cpu-quota, the bug is no longer reproducible.

To aid in debugging, I have used this patch

--- a/libcontainer/cgroups/systemd/v2.go
+++ b/libcontainer/cgroups/systemd/v2.go
@@ -278,6 +278,8 @@ func genV2ResourcesProperties(dirPath string, r *configs.Resources, cm *dbusConn
                properties = append(properties, unifiedProps...)
        }

+       logrus.Debugf("Properties: %+v", properties)
+
        return properties, nil
 }

Also, it makes sense to take a look at what systemd saves for the given unit:

$ tail /run/systemd/transient/runc-test.scope.d/*

==> /run/systemd/transient/runc-test.scope.d/50-CPUQuota.conf <==
# This is a drop-in unit file extension, created via "systemctl set-property"
# or an equivalent operation. Do not edit.
[Scope]
CPUQuota=100%

==> /run/systemd/transient/runc-test.scope.d/50-CPUQuotaPeriodSec.conf <==
# This is a drop-in unit file extension, created via "systemctl set-property"
# or an equivalent operation. Do not edit.
[Scope]
CPUQuotaPeriodSec=900ms
...
kolyshkin commented 3 months ago

Maybe the solution to this issue is to disallow setting --cpu-period without setting --cpu-quota as they are inter-dependent.

kolyshkin commented 3 months ago

Perhaps commit 32746fb33 is doing what it should do. Looking...

kolyshkin commented 3 months ago

The problem is, with the current code it's impossible to distinguish between "unset" and "set to 0", thus the following code:

https://github.com/opencontainers/runc/blob/18c313be729dd02b17934af41e32116a28b4b3bf/update.go#L294-L298

resets both quota and period.

Something like this commit from #4142 should fix this, although it needs to be reworked. Stay tuned.

kolyshkin commented 3 months ago

Still, even with the fixed code, there is no way to know if period or quota were ever been set before.

So, the best way to solve this is to require cpu-period and cpu-quota to be set together (and possibly break backward compatibility).

The way to solve this without breaking backward compatibility is to assume period (or quota) is not set if it's 0 in state.json.

I'll try to come up with a PR.

kolyshkin commented 3 months ago

Should be fixed in https://github.com/opencontainers/runc/pull/4227

kolyshkin commented 3 months ago

There's a similar bug for cpu-burst (it is reset to 0 after some unrelated runc update).