Open solardiz opened 1 year ago
Somehow repeating a lighter version of the above command:
for n in `seq 0 999`; do sysctl lkrg.trigger=1; done
after LKRG reload (rmmod/insmod) did not result in high load like that, with the number of running kernel threads staying at just a few for the minute that this command ran. However, the issue reported in #256 was also not seen after the module reload yet, so perhaps the high concurrency from the trigger is related to the integrity check routine producing log messages.
3 concurrent instances of the trigger loop, without ALERTs, gave this:
top - 19:36:37 up 1:44, 2 users, load average: 309.46, 100.91, 44.82
Tasks: 763 total, 9 running, 754 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.1 us, 30.6 sy, 0.0 ni, 69.2 id, 0.0 wa, 0.1 hi, 0.0 si, 0.0 st
and indeed the system feels very unresponsive - it takes tens of seconds to run a command.
That's normally not the case. Which kernel is it? It is likely that some patches were backported to the distro kernel where you test it (related to #256) or something new was added to the latest kernel. It looks like our integrity routine is waiting for some lock(s). Also, #256 looks like some JUMP_LABEL-like related features were backported which we don't handle right.
The kernel is 5.14.0-162.6.1.el9_1.0.1.x86_64. Yes, "#256 looks like some JUMP_LABEL-like related features were backported which we don't handle right", however that only helped expose that frequent triggering of the integrity checks can cause concurrency like this. As I said in my previous comment here, I was also able to trigger this issue (257) without any ALERTs (so separately from 256) - in fact, after a few minutes the system became so unresponsive over SSH that I had to reboot it via Equinix's console.
I see nothing in our code that would protect from reaching such concurrency, and indeed we take locks. I am not really worried about the reproducer with repeated lkrg.trigger=1, but I worry about the same issue probably also occurring if our "random events" just happen to be frequent enough on a certain system under certain reasonable usage. Just by using low probabilities we don't guarantee the frequency is low enough in absolute terms. So we probably need at least one of these changes: use a trylock primitive (and skip the whole thing if already locked) and/or don't queue the work if an instance is currently running (maybe that's not enough - we could still have more than one instance queued, and then they could potentially run simultaneously?).
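Not our actual code, just a minimal sketch of the trylock option to illustrate what I mean (the identifiers p_check_lock and p_integrity_check_sketch are made up for this example):

#include <linux/mutex.h>

static DEFINE_MUTEX(p_check_lock);

static void p_integrity_check_sketch(void)
{
        if (!mutex_trylock(&p_check_lock))
                return;         /* another instance is already running - skip */

        /* ... perform the expensive integrity verification here ... */

        mutex_unlock(&p_check_lock);
}

mutex_trylock() returns immediately when the lock is already held, so the extra invocations would simply be dropped instead of stacking up behind each other.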
I'm doing a similar test very often and I don't have the problem you describe. However, I do hit it when we have some unsupported modification of the kernel. Here is the test from Ubuntu LTS:
root@pi3-ubuntu:~/lkrg# dmesg|tail
[ 149.393714] lkrg: module verification failed: signature and/or required key missing - tainting kernel
[ 149.403454] LKRG: ALIVE: Loading LKRG
[ 149.467470] Freezing user space processes ... (elapsed 0.001 seconds) done.
[ 149.468749] OOM killer disabled.
[ 149.664343] LKRG: ISSUE: [kretprobe] register_kretprobe() for <ovl_dentry_is_whiteout> failed! [err=-22]
[ 149.687548] LKRG: ISSUE: [kretprobe] register_kretprobe() for ovl_dentry_is_whiteout failed and ISRA / CONSTPROP version not found!
[ 149.687549] LKRG: ISSUE: Can't hook 'ovl_dentry_is_whiteout'. This is expected when OverlayFS is not used.
[ 150.887098] LKRG: ALIVE: LKRG initialized successfully
[ 150.887099] OOM killer enabled.
[ 150.887100] Restarting tasks ... done.
root@pi3-ubuntu:~/lkrg# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.5 LTS
Release: 18.04
Codename: bionic
root@pi3-ubuntu:~/lkrg# uname -a
Linux pi3-ubuntu 4.15.0-194-generic #205-Ubuntu SMP Fri Sep 16 19:49:27 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
with a much more aggressive approach, running the following for a minute:
while true; do sysctl lkrg.trigger=1; done
Results:
root@pi3-ubuntu:~/lkrg# ps aux |grep kwork
root 4 0.0 0.0 0 0 ? I< 09:41 0:00 [kworker/0:0H]
root 6 0.0 0.0 0 0 ? I< 09:41 0:00 [kworker/0:1H]
root 33 0.0 0.0 0 0 ? I 09:41 0:00 [kworker/0:1]
root 36 0.0 0.0 0 0 ? I< 09:41 0:00 [kworker/u257:0]
root 133 0.0 0.0 0 0 ? I 09:41 0:00 [kworker/0:2]
root 281 0.5 0.0 0 0 ? I 09:41 0:02 [kworker/u256:15]
root 282 0.7 0.0 0 0 ? I 09:41 0:03 [kworker/u256:16]
root 283 1.0 0.0 0 0 ? I 09:41 0:04 [kworker/u256:17]
root 284 0.9 0.0 0 0 ? I 09:41 0:03 [kworker/u256:18]
root 285 1.1 0.0 0 0 ? I 09:41 0:04 [kworker/u256:19]
root 286 1.1 0.0 0 0 ? I 09:41 0:05 [kworker/u256:20]
root 287 0.8 0.0 0 0 ? I 09:41 0:03 [kworker/u256:21]
root 288 0.4 0.0 0 0 ? I 09:41 0:01 [kworker/u256:22]
root 289 1.0 0.0 0 0 ? I 09:41 0:04 [kworker/u256:23]
root 290 1.2 0.0 0 0 ? I 09:41 0:05 [kworker/u256:24]
root 291 0.9 0.0 0 0 ? I 09:41 0:04 [kworker/u256:25]
root 292 1.1 0.0 0 0 ? I 09:41 0:04 [kworker/u256:26]
root 293 1.1 0.0 0 0 ? I 09:41 0:04 [kworker/u256:27]
root 294 0.2 0.0 0 0 ? I 09:41 0:01 [kworker/u256:28]
root 295 0.6 0.0 0 0 ? I 09:41 0:02 [kworker/u256:29]
root 296 0.6 0.0 0 0 ? I 09:41 0:02 [kworker/u256:30]
root 299 1.2 0.0 0 0 ? I 09:41 0:05 [kworker/u256:31]
root 555 0.0 0.0 0 0 ? I< 09:41 0:00 [kworker/u257:1]
root 46266 0.0 0.0 13140 1160 pts/0 S+ 09:48 0:00 grep --color=auto kwork
The system runs fine and is very responsive.
> by running for a minute
In my testing, one minute was not enough to fully trigger the problem - a few minutes were needed.
Anyway, from what you show the number of kthreads appeared to be limited to the number of vCPUs (do you have 16 maybe?) - however, in my testing with the newer kernel there appeared to be no such limit. So you might want to retest with a newer kernel, too.
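To illustrate why the kworker count matters (this is not our actual code, and all names here are made up): if each event allocates and queues its own work item on the unbound workqueue, nothing caps how many kworkers run the check concurrently. Reusing a single static work item instead lets the workqueue coalesce triggers, since queue_work() refuses to re-queue an item that is still pending and the same item is not executed concurrently with itself:

#include <linux/slab.h>
#include <linux/workqueue.h>

/* Per-event allocation: every trigger adds one more independent work item. */
static void p_check_fn(struct work_struct *work)
{
        /* ... expensive integrity check ... */
        kfree(work);                    /* this item was allocated per event */
}

static void p_on_trigger_per_event(void)
{
        struct work_struct *w = kmalloc(sizeof(*w), GFP_ATOMIC);

        if (w) {
                INIT_WORK(w, p_check_fn);
                queue_work(system_unbound_wq, w);
        }
}

/* Single reusable item: triggers arriving during a check are coalesced. */
static void p_check_once_fn(struct work_struct *work)
{
        /* ... expensive integrity check ... */
}

static DECLARE_WORK(p_single_check, p_check_once_fn);

static void p_on_trigger_single(void)
{
        queue_work(system_unbound_wq, &p_single_check);
}

Even with the single item, a trigger arriving right after a check has started still queues one follow-up run, so the trylock idea above may be wanted on top of this.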
> In my testing, one minute was not enough to fully trigger the problem - a few minutes were needed.
I left it running for more than 10 minutes on the following system:
The OS runs fine, is responsive, and I don't have that issue.
Regarding my previous tests:
> Do you have 16 maybe?
No, it was a 1 vCPU VM.
On the same system as in #256, running as root one instance of:
made the system extremely unresponsive, with many concurrent kernel threads, as seen in top output below. As I recall from past discussions with @Adam-pi3, this was supposed to result in just one instance of our integrity checking routine running at a time, so eating up just one logical CPU, unlike what we see here.