lkrg-org / lkrg

Linux Kernel Runtime Guard
https://lkrg.org

kernel.stext integrity check failed due to an error in handling the p_arch_jump_label_transform_entry address on ARM64 #269

Open root-hardenedvault opened 1 year ago

root-hardenedvault commented 1 year ago

Reproduction steps:

  1. Build LKRG & load the LKM
  2. Reboot and then the kernel will panic:

Hardware: Raspberry Pi 4; OS: Ubuntu 22.04

[  302.646797] VED: ALERT: DETECT: Kernel: _stext hash changed unexpectedly
[  302.660354] VED: ALERT: DETECT: Kernel: 1 checksums changed unexpectedly
[  302.667154] VED: ALERT: BLOCK: Kernel: 1 checksums changed unexpectedly
[  302.673863] Kernel panic - not syncing: Kernel: 1 checksums changed unexpectedly
[  302.681366] CPU: 2 PID: 484 Comm: kworker/u8:6 Tainted: G         C OE     5.15.0-1027-raspi #29-Ubuntu
[  302.690899] Hardware name: Raspberry Pi 4 Model B Rev 1.4 (DT)
[  302.696816] Workqueue: events_unbound p_check_integrity [ved]
[  302.702698] Call trace:
[  302.705173]  dump_backtrace+0x0/0x200
[  302.708890]  show_stack+0x20/0x30
[  302.712250]  dump_stack_lvl+0x8c/0xb8
[  302.715965]  dump_stack+0x18/0x34
[  302.719324]  panic+0x1e4/0x3e4
[  302.722420]  p_check_integrity+0x1370/0x18d4 [ved]
[  302.727316]  process_one_work+0x204/0x4e0
[  302.731385]  worker_thread+0x144/0x490
[  302.735187]  kthread+0x128/0x134
[  302.738457]  ret_from_fork+0x10/0x20
[  302.742086] SMP: stopping secondary CPUs
[  302.746067] Kernel Offset: 0x5a489ca00000 from 0xffff800008000000
[  302.752245] PHYS_OFFSET: 0xffffadd980000000
[  302.756482] CPU features: 0x800804f1,00000846
[  302.760899] Memory Limit: none
[  302.763996] ---[ end Kernel panic - not syncing: Kernel: 1 checksums changed unexpectedly ]---

We noticed that p_arch_jump_label_transform_entry parses the destination address to determine which segment contains it, and p_arch_jump_label_transform_ret then updates the hash of the corresponding segment based on that result. However, an error in this parsing may cause the kernel .stext hash, which does need to be updated, to be missed in p_arch_jump_label_transform_ret. As a workaround, we added a .stext hash update in two places, and it seems to work:

diff --git a/src/modules/database/JUMP_LABEL/p_arch_jump_label_transform/p_arch_jump_label_transform.c b/src/modules/database/JUMP_LABEL/p_arch_jump_label_transform/p_arch_jump_label_transform.c
index 12d0ac9..0e75432 100644
--- a/src/modules/database/JUMP_LABEL/p_arch_jump_label_transform/p_arch_jump_label_transform.c
+++ b/src/modules/database/JUMP_LABEL/p_arch_jump_label_transform/p_arch_jump_label_transform.c
@@ -121,10 +121,10 @@ notrace int p_arch_jump_label_transform_ret(struct kretprobe_instance *ri, struc
          break;

       case P_JUMP_LABEL_MODULE_TEXT:
-
+#if defined(CONFIG_ARM64)
+         p_db.kernel_stext.p_hash = p_lkrg_fast_hash((unsigned char *)p_db.kernel_stext.p_addr,
+                                                (unsigned int)p_db.kernel_stext.p_size);
-
+#endif
          for (p_tmp = 0; p_tmp < p_db.p_module_list_nr; p_tmp++) {
             if (p_db.p_module_list_array[p_tmp].p_mod == p_db.p_jump_label.p_mod) {
                /*
@@ -186,8 +186,10 @@ notrace int p_arch_jump_label_transform_ret(struct kretprobe_instance *ri, struc
           * FTRACE might generate dynamic trampoline which is not part of .text section.
           * This is not abnormal situation anymore.
           */
+#if defined(CONFIG_ARM64)
+         p_db.kernel_stext.p_hash = p_lkrg_fast_hash((unsigned char *)p_db.kernel_stext.p_addr,
+                                                (unsigned int)p_db.kernel_stext.p_size);
+#endif
          break;
    }
solardiz commented 1 year ago

We had (a different fork of) recent LKRG run on Ubuntu 22.04.1 with kernel 5.15.0-1019-aws #23-Ubuntu SMP Wed Aug 17 18:35:04 UTC 2022 aarch64 aarch64 in AWS instance type c6g.medium for 4+ months with no such issue showing up. However, that instance has only 1 vCPU, so perhaps the issue is a race condition showing up on multi-[v]CPU systems.

Adam-pi3 commented 1 year ago

"However, parsing errors may have caused the hash of kernel.stext that needs to be updated but missed in p_arch_jump_label_transform_ret"

What parsing error are you referring to? If something is read incorrectly, you are likely seeing a different memory layout than LKRG does, which can cause this type of problem.

Additionally, LKRG synchronizes with JUMP_LABEL using several locks, which means the integrity routine cannot miss the result of JUMP_LABEL's work. It looks like you may have hit a symptom rather than the root cause, and the patch is masking the real problem. Did you try running at a very verbose log level to see what JUMP_LABEL actually does?

solardiz commented 1 year ago

"the patch is masking the real problem."

Sure, which I assume is @root-hardenedvault's understanding too, which is why he calls this a "workaround" and hasn't sent us a PR with these changes right away. Ideally, we'd figure out the real problem and arrive at a proper fix.

root-hardenedvault commented 1 year ago

It appears that the issue is caused by a race condition. LKRG does not require any lock to be held when accessing p_db.p_jump_label.state. The panic consistently occurs while the core text hash is being updated in arch_jump_label_transform_ret. We have also observed that p_db.p_jump_label.state is set to 1 (P_JUMP_LABEL_CORE_TEXT) when the integrity_timer computes and compares the core text hash. So LKRG may be updating the core text hash at the same time as it checks whether that hash has changed, which would be a race. Is there a mechanism in LKRG that prevents this? However, this cannot explain why the above patch works, since the updates it adds are in other case branches and would not be executed in that state. Another scenario that triggers the panic (with similar kernel logs) is running nftables as a systemd service at boot time.

Adam-pi3 commented 1 year ago

The function arch_jump_label_transform is called with the JUMP_LABEL lock held. When LKRG intercepts the call, it is also running under the JUMP_LABEL lock, and we synchronize against it. The integrity verification routine won't run until it acquires this lock: https://github.com/lkrg-org/lkrg/blob/main/src/modules/database/p_database.h#L192

While LKRG holds this lock, the JUMP_LABEL engine won't modify the .text section. I don't think this is the correct root cause.

accelbread commented 9 months ago

I'm also seeing this issue, also on a Raspberry Pi 4. It occurs consistently, a few seconds after my system makes it to the login prompt.

solardiz commented 8 months ago

@root-hardenedvault @accelbread We think we've just fixed this issue with #294 here - can you please test and let us know? Thank you!

accelbread commented 8 months ago

I'll give it a test over the weekend, thanks!

accelbread commented 8 months ago

Unfortunately, this does not fix the issue for me :(

Adam-pi3 commented 8 months ago

@accelbread can you provide some details about the problem? What is the kernel version? How easy is it to reproduce? Can you recompile LKRG with P_LKRG_JUMP_LABEL_STEXT_DEBUG, set log_level=3, and share the logs?

BTW, I heavily tested Ubuntu 23.10 with kernel 6.5.0-1005-raspi and the issue does not occur there. If you have an opportunity to check the same OS/kernel, it would be helpful.

accelbread commented 8 months ago

I am on 6.1.57-hardened1 on NixOS. I have LKRG built into the kernel.

It is easy to reproduce. With default settings, the device restarts a few seconds after boot. If I boot with "lkrg.kint_validate=1", the device does not restart and runs fine.

I can recompile and retest later with debug enabled, and get back to you with logs. It seems the 6.5.9 kernel is now available too, so I will upgrade first.

I could also produce a minimal SD-card image that reproduces the issue, if you'd like.

citypw commented 1 month ago

Reproduction steps:

Hardware: Raspberry Pi 4; OS: Raspberry Pi OS; Kernel: 6.6.28+rpt-rpi-v8

[ 76.633929] LKRG: ALERT: DETECT: Kernel: _stext hash changed unexpectedly
[ 76.646008] LKRG: ALERT: DETECT: Kernel: Module hash changed unexpectedly, name ipv6
[ 76.653862] LKRG: ALERT: DETECT: Kernel: Module list hash changed unexpectedly
[ 76.661198] LKRG: ALERT: DETECT: Kernel: Module KOBJ list hash changed unexpectedly
[ 76.668959] LKRG: ALERT: DETECT: Kernel: Module KOBJ hash changed unexpectedly, name ipv6
[ 76.677271] LKRG: ALERT: DETECT: Kernel: 5 checksums changed unexpectedly
[ 76.684152] LKRG: ALERT: BLOCK: Kernel: 5 checksums changed unexpectedly
[ 76.690944] Kernel panic - not syncing: Kernel: 5 checksums changed unexpectedly
[ 76.698442] CPU: 2 PID: 38 Comm: kworker/u12:0 Tainted: G C O 6.6.28+rpt-rpi-v8 #1 Debian 1:6.6.28-1+rpt1
[ 76.709469] Hardware name: Raspberry Pi 4 Model B Rev 1.4 (DT)
[ 76.715380] Workqueue: events_unbound p_check_integrity [lkrg]
[ 76.721325] Call trace:
[ 76.723798]  dump_backtrace+0xa0/0x100
[ 76.727600]  show_stack+0x20/0x38
[ 76.730956]  dump_stack_lvl+0x48/0x60
[ 76.734667]  dump_stack+0x18/0x28
[ 76.738024]  panic+0x330/0x398
[ 76.741118]  p_check_integrity+0x1068/0x1900 [lkrg]
[ 76.746082]  process_one_work+0x148/0x3b8
[ 76.750145]  worker_thread+0x32c/0x450
[ 76.753942]  kthread+0x11c/0x128
[ 76.757213]  ret_from_fork+0x10/0x20
[ 76.760834] SMP: stopping secondary CPUs
[ 76.764813] Kernel Offset: 0x2c2c200000 from 0xffffffc080000000
[ 76.770811] PHYS_OFFSET: 0x0
[ 76.773724] CPU features: 0x0,80000201,3c020000,0000421b
[ 76.779105] Memory Limit: none
[ 76.782199] ---[ end Kernel panic - not syncing: Kernel: 5 checksums changed unexpectedly ]---