Open solardiz opened 1 year ago
Somehow repeating a lighter version of the above command:
for n in `seq 0 999`; do sysctl lkrg.trigger=1; done
after LKRG reload (rmmod/insmod) did not result in high load like that, with the number of running kernel threads staying at just a few for the minute that this command ran. However, the issue reported in #256 was also not seen after the module reload yet, so perhaps the high concurrency from the trigger is related to the integrity check routine producing log messages.
3 concurrent instances of the trigger loop, without ALERTs, gave this:
top - 19:36:37 up 1:44, 2 users, load average: 309.46, 100.91, 44.82
Tasks: 763 total, 9 running, 754 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.1 us, 30.6 sy, 0.0 ni, 69.2 id, 0.0 wa, 0.1 hi, 0.0 si, 0.0 st
and indeed the system feels very unresponsive - it takes tens of seconds to run a command.
That's normally not the case. Which kernel is it? It is likely that some patches were backported to the distro kernel where you test it (related to #256) or something new was added to the latest kernel. It looks like our integrity routine is waiting for some lock(s). Also, #256 looks like some JUMP_LABEL-like related features were backported which we don't handle right.
The kernel is 5.14.0-162.6.1.el9_1.0.1.x86_64. Yes, "#256 looks like some JUMP_LABEL-like related features were backported which we don't handle right", however that only helped expose that frequent triggering of the integrity checks can cause concurrency like this. As I said in my previous comment here, I was also able to trigger this issue (257) without any ALERTs (so separately from 256) - in fact, after a few minutes the system became so unresponsive over SSH that I had to reboot it via Equinix's console.
I see nothing in our code that would protect from reaching such concurrency, and indeed we take locks. I am not really worried about the reproducer with repeated lkrg.trigger=1, but I worry about the same issue probably also occurring if our "random events" just happen to be frequent enough on a certain system under certain reasonable usage. Just by using low probabilities we don't guarantee the frequency is low enough in absolute terms. So we probably need at least one of these changes: use a trylock primitive (and skip the whole thing if already locked) and/or don't queue the work if an instance is currently running (maybe that's not enough - we could still have more than one instance queued, and then they could potentially run simultaneously?).
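Not our actual code, just a minimal sketch of the trylock option to illustrate what I mean (the identifiers p_check_lock and p_integrity_check_sketch are made up for this example):

#include <linux/mutex.h>

static DEFINE_MUTEX(p_check_lock);

static void p_integrity_check_sketch(void)
{
        if (!mutex_trylock(&p_check_lock))
                return;         /* another instance is already running - skip */

        /* ... perform the expensive integrity verification here ... */

        mutex_unlock(&p_check_lock);
}

mutex_trylock() returns immediately when the lock is already held, so the extra invocations would simply be dropped instead of stacking up behind each other.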
I'm doing a similar test very often and I don't have the problem you describe. However, I do hit it when we have some unsupported modification of the kernel. Here is the test from Ubuntu LTS:
root@pi3-ubuntu:~/lkrg# dmesg|tail
[ 149.393714] lkrg: module verification failed: signature and/or required key missing - tainting kernel
[ 149.403454] LKRG: ALIVE: Loading LKRG
[ 149.467470] Freezing user space processes ... (elapsed 0.001 seconds) done.
[ 149.468749] OOM killer disabled.
[ 149.664343] LKRG: ISSUE: [kretprobe] register_kretprobe() for <ovl_dentry_is_whiteout> failed! [err=-22]
[ 149.687548] LKRG: ISSUE: [kretprobe] register_kretprobe() for ovl_dentry_is_whiteout failed and ISRA / CONSTPROP version not found!
[ 149.687549] LKRG: ISSUE: Can't hook 'ovl_dentry_is_whiteout'. This is expected when OverlayFS is not used.
[ 150.887098] LKRG: ALIVE: LKRG initialized successfully
[ 150.887099] OOM killer enabled.
[ 150.887100] Restarting tasks ... done.
root@pi3-ubuntu:~/lkrg# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.5 LTS
Release: 18.04
Codename: bionic
root@pi3-ubuntu:~/lkrg# uname -a
Linux pi3-ubuntu 4.15.0-194-generic #205-Ubuntu SMP Fri Sep 16 19:49:27 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
with a much more aggressive approach, running the following for a minute:
while true; do sysctl lkrg.trigger=1; done
Results:
root@pi3-ubuntu:~/lkrg# ps aux |grep kwork
root 4 0.0 0.0 0 0 ? I< 09:41 0:00 [kworker/0:0H]
root 6 0.0 0.0 0 0 ? I< 09:41 0:00 [kworker/0:1H]
root 33 0.0 0.0 0 0 ? I 09:41 0:00 [kworker/0:1]
root 36 0.0 0.0 0 0 ? I< 09:41 0:00 [kworker/u257:0]
root 133 0.0 0.0 0 0 ? I 09:41 0:00 [kworker/0:2]
root 281 0.5 0.0 0 0 ? I 09:41 0:02 [kworker/u256:15]
root 282 0.7 0.0 0 0 ? I 09:41 0:03 [kworker/u256:16]
root 283 1.0 0.0 0 0 ? I 09:41 0:04 [kworker/u256:17]
root 284 0.9 0.0 0 0 ? I 09:41 0:03 [kworker/u256:18]
root 285 1.1 0.0 0 0 ? I 09:41 0:04 [kworker/u256:19]
root 286 1.1 0.0 0 0 ? I 09:41 0:05 [kworker/u256:20]
root 287 0.8 0.0 0 0 ? I 09:41 0:03 [kworker/u256:21]
root 288 0.4 0.0 0 0 ? I 09:41 0:01 [kworker/u256:22]
root 289 1.0 0.0 0 0 ? I 09:41 0:04 [kworker/u256:23]
root 290 1.2 0.0 0 0 ? I 09:41 0:05 [kworker/u256:24]
root 291 0.9 0.0 0 0 ? I 09:41 0:04 [kworker/u256:25]
root 292 1.1 0.0 0 0 ? I 09:41 0:04 [kworker/u256:26]
root 293 1.1 0.0 0 0 ? I 09:41 0:04 [kworker/u256:27]
root 294 0.2 0.0 0 0 ? I 09:41 0:01 [kworker/u256:28]
root 295 0.6 0.0 0 0 ? I 09:41 0:02 [kworker/u256:29]
root 296 0.6 0.0 0 0 ? I 09:41 0:02 [kworker/u256:30]
root 299 1.2 0.0 0 0 ? I 09:41 0:05 [kworker/u256:31]
root 555 0.0 0.0 0 0 ? I< 09:41 0:00 [kworker/u257:1]
root 46266 0.0 0.0 13140 1160 pts/0 S+ 09:48 0:00 grep --color=auto kwork
The system runs fine and is very responsive.
> by running for a minute
In my testing, one minute was not enough to fully trigger the problem - a few minutes were needed.
Anyway, from what you show the number of kthreads appeared to be limited to the number of vCPUs (do you have 16 maybe?) - however, in my testing with the newer kernel there appeared to be no such limit. So you might want to retest with a newer kernel, too.
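To illustrate why the kworker count matters (this is not our actual code, and all names here are made up): if each event allocates and queues its own work item on the unbound workqueue, nothing caps how many kworkers run the check concurrently. Reusing a single static work item instead lets the workqueue coalesce triggers, since queue_work() refuses to re-queue an item that is still pending and the same item is not executed concurrently with itself:

#include <linux/slab.h>
#include <linux/workqueue.h>

/* Per-event allocation: every trigger adds one more independent work item. */
static void p_check_fn(struct work_struct *work)
{
        /* ... expensive integrity check ... */
        kfree(work);                    /* this item was allocated per event */
}

static void p_on_trigger_per_event(void)
{
        struct work_struct *w = kmalloc(sizeof(*w), GFP_ATOMIC);

        if (w) {
                INIT_WORK(w, p_check_fn);
                queue_work(system_unbound_wq, w);
        }
}

/* Single reusable item: triggers arriving during a check are coalesced. */
static void p_check_once_fn(struct work_struct *work)
{
        /* ... expensive integrity check ... */
}

static DECLARE_WORK(p_single_check, p_check_once_fn);

static void p_on_trigger_single(void)
{
        queue_work(system_unbound_wq, &p_single_check);
}

Even with the single item, a trigger arriving right after a check has started still queues one follow-up run, so the trylock idea above may be wanted on top of this.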
> In my testing, one minute was not enough to fully trigger the problem - a few minutes were needed.
I left it running for more than 10 minutes on the following system:
The OS runs fine, is responsive, and I don't have that issue.
Regarding my previous tests:
> Do you have 16 maybe?
No, it was a 1 vCPU VM.
On the same system as in #256, running as root one instance of:
made the system extremely unresponsive, with many concurrent kernel threads, as seen in top output below. As I recall from past discussions with @Adam-pi3, this was supposed to result in just one instance of our integrity checking routine running at a time, so eating up just one logical CPU, unlike what we see here.