Open root-hardenedvault opened 1 year ago
We have had (a different fork of) a recent LKRG running on Ubuntu 22.04.1 with kernel 5.15.0-1019-aws #23-Ubuntu SMP Wed Aug 17 18:35:04 UTC 2022 aarch64 aarch64 on an AWS c6g.medium instance for 4+ months with no such issue showing up. However, that instance has only 1 vCPU, so perhaps the issue is a race condition that shows up only on multi-[v]CPU systems.
> However, parsing errors may have caused a needed update of the kernel.stext hash to be missed in p_arch_jump_label_transform_ret.
What parsing error are you referring to? If something is read incorrectly, you are likely seeing a different memory layout than LKRG does, which may result in this type of problem.
Additionally, LKRG synchronizes with JUMP_LABEL using various locks, which means it is impossible for the integrity routine not to see the result of JUMP_LABEL's work. It looks like you might have hit an issue that is not the root cause, and the patch is masking the real problem. Did you try running at a very verbose log level to see what JUMP_LABEL really does?
> the patch is masking the real problem.
Sure, which I assume is @root-hardenedvault's understanding too, which is why he calls this a "workaround" and doesn't send us a PR with these changes right away. Ideally, we'd figure out the real problem and arrive at a proper fix.
It appears that the issue is caused by a race condition. LKRG does not require any lock to be held when accessing p_db.p_jump_label.state. The panic consistently occurs while the core text hash is being updated in arch_jump_label_transform_ret. We have also observed that p_db.p_jump_label.state is set to 1 (P_JUMP_LABEL_CORE_TEXT) when the integrity_timer calculates and compares the core text hash. LKRG may be updating the core text hash at the same time as it checks whether the hash has changed, which could lead to the race condition. Is there a mechanism in LKRG to avoid this situation? However, this cannot explain why the above patch works, since those updates would not be executed. Another scenario that can trigger the panic (with similar kernel logs) is when nftables runs as a systemd service at boot time.
Function arch_jump_label_transform is called under the JUMP_LABEL lock. When LKRG intercepts the call, it is also running under the JUMP_LABEL lock, and we do synchronize against it. The integrity verification routine won't run before acquiring this lock:
https://github.com/lkrg-org/lkrg/blob/main/src/modules/database/p_database.h#L192
If LKRG has this lock acquired, the JUMP_LABEL engine won't modify the .text section. I don't think this is the correct root cause.
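To make the locking argument concrete, here is a minimal userspace sketch of that synchronization pattern. It is not LKRG or kernel code: the buffer `text`, the mutex `jl_lock`, and the functions `patcher`, `checker`, and `run_model` are hypothetical stand-ins for the .text section, the JUMP_LABEL lock, the jump label engine, and the integrity routine. Under the assumption that both sides really do take the same lock and that the stored hash is refreshed before the lock is released, the checker can never observe a patched text with a stale hash.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TEXT_SIZE 64
#define ROUNDS    20000

/* Hypothetical stand-ins, not LKRG symbols. */
static unsigned char text[TEXT_SIZE];   /* models the kernel .text section    */
static uint32_t stored_hash;            /* models the hash kept in a database */
static pthread_mutex_t jl_lock = PTHREAD_MUTEX_INITIALIZER; /* models the JUMP_LABEL lock */
static int mismatches;                  /* only touched while holding jl_lock */

/* FNV-1a, chosen only for brevity; the real hash function is irrelevant here. */
static uint32_t hash_text(void)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < TEXT_SIZE; i++) {
        h ^= text[i];
        h *= 16777619u;
    }
    return h;
}

/* Models the jump label engine plus the hook that refreshes the hash. */
static void *patcher(void *arg)
{
    (void)arg;
    for (int i = 0; i < ROUNDS; i++) {
        pthread_mutex_lock(&jl_lock);
        text[i % TEXT_SIZE] ^= 0xff;    /* "patch" one byte                  */
        stored_hash = hash_text();      /* refresh the hash before unlocking */
        pthread_mutex_unlock(&jl_lock);
    }
    return NULL;
}

/* Models the integrity routine: it hashes only while holding the lock. */
static void *checker(void *arg)
{
    (void)arg;
    for (int i = 0; i < ROUNDS; i++) {
        pthread_mutex_lock(&jl_lock);
        if (hash_text() != stored_hash)
            mismatches++;
        pthread_mutex_unlock(&jl_lock);
    }
    return NULL;
}

/* Runs both threads concurrently and returns the number of mismatches seen. */
int run_model(void)
{
    pthread_t a, b;

    memset(text, 0x90, sizeof(text));
    stored_hash = hash_text();
    mismatches = 0;

    assert(pthread_create(&a, NULL, patcher, NULL) == 0);
    assert(pthread_create(&b, NULL, checker, NULL) == 0);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return mismatches;
}
```

In this model `run_model()` reports zero mismatches. A panic like the one reported would therefore need either a writer that modifies the text without taking the lock, or a hash refresh that is skipped while the lock is held.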
I'm also seeing this issue, also on a Raspberry Pi 4. It occurs consistently, a few seconds after my system makes it to the login prompt.
@root-hardenedvault @accelbread We think we've just fixed this issue with #294 here - can you please test and let us know? Thank you!
I'll give it a test over the weekend, thanks!
Unfortunately, this does not fix the issue for me :(
@accelbread can you provide some details about the problem? What is the kernel version? How easy is it to reproduce? Can you recompile LKRG with P_LKRG_JUMP_LABEL_STEXT_DEBUG, enable log_level=3, and show the logs?
btw. I heavily tested Ubuntu 23.10 under the kernel 6.5.0-1005-raspi and the issue is not there. If you have an opportunity to check the same OS/kernel, it would be helpful.
I am on 6.1.57-hardened1 on NixOS. I have LKRG built into the kernel.
It is easy to reproduce. With default settings, the device restarts a few seconds after boot. If I boot with "lkrg.kint_validate=1", the device does not restart and runs fine.
I can recompile and retest later with debug and logs, and get back. Seems 6.5.9 kernel is available too now so will upgrade first.
I could also produce a minimal reproducing sd-card image if you'd like.
Reproduction steps:
Hardware: Raspberry Pi 4
OS: Raspberry Pi OS
Kernel: 6.6.28+rpt-rpi-v8
[ 76.633929] LKRG: ALERT: DETECT: Kernel: _stext hash changed unexpectedly
[ 76.646008] LKRG: ALERT: DETECT: Kernel: Module hash changed unexpectedly, name ipv6
[ 76.653862] LKRG: ALERT: DETECT: Kernel: Module list hash changed unexpectedly
[ 76.661198] LKRG: ALERT: DETECT: Kernel: Module KOBJ list hash changed unexpectedly
[ 76.668959] LKRG: ALERT: DETECT: Kernel: Module KOBJ hash changed unexpectedly, name ipv6
[ 76.677271] LKRG: ALERT: DETECT: Kernel: 5 checksums changed unexpectedly
[ 76.684152] LKRG: ALERT: BLOCK: Kernel: 5 checksums changed unexpectedly
[ 76.690944] Kernel panic - not syncing: Kernel: 5 checksums changed unexpectedly
[ 76.698442] CPU: 2 PID: 38 Comm: kworker/u12:0 Tainted: G C O 6.6.28+rpt-rpi-v8 #1 Debian 1:6.6.28-1+rpt1
[ 76.709469] Hardware name: Raspberry Pi 4 Model B Rev 1.4 (DT)
[ 76.715380] Workqueue: events_unbound p_check_integrity [lkrg]
[ 76.721325] Call trace:
[ 76.723798] dump_backtrace+0xa0/0x100
[ 76.727600] show_stack+0x20/0x38
[ 76.730956] dump_stack_lvl+0x48/0x60
[ 76.734667] dump_stack+0x18/0x28
[ 76.738024] panic+0x330/0x398
[ 76.741118] p_check_integrity+0x1068/0x1900 [lkrg]
[ 76.746082] process_one_work+0x148/0x3b8
[ 76.750145] worker_thread+0x32c/0x450
[ 76.753942] kthread+0x11c/0x128
[ 76.757213] ret_from_fork+0x10/0x20
[ 76.760834] SMP: stopping secondary CPUs
[ 76.764813] Kernel Offset: 0x2c2c200000 from 0xffffffc080000000
[ 76.770811] PHYS_OFFSET: 0x0
[ 76.773724] CPU features: 0x0,80000201,3c020000,0000421b
[ 76.779105] Memory Limit: none
[ 76.782199] ---[ end Kernel panic - not syncing: Kernel: 5 checksums changed unexpectedly ]---
Reproduction steps:
Hardware: Raspberry Pi 4
OS: Ubuntu 22.04
We noticed that in p_arch_jump_label_transform_entry, the segment containing the destination address is parsed, and the corresponding segment is then updated in p_arch_jump_label_transform_ret based on the parsing result. However, parsing errors may have caused a needed update of the kernel.stext hash to be missed in p_arch_jump_label_transform_ret. The workaround is to add an update operation in two places, and it seems to work!
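The hypothesis above can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not LKRG source: `struct model`, `hook_entry`, `hook_ret`, and `patch_and_check` are hypothetical names, the `misparse` flag simulates the suspected parsing error, and `always_rehash` simulates the workaround of refreshing the hash unconditionally. It shows how a missed classification at entry would leave a stale hash behind, and also why an unconditional refresh could hide that without explaining it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model types, not LKRG definitions. */
enum region { REGION_NONE, REGION_CORE_TEXT };

struct model {
    unsigned char core[32];   /* models the core kernel text              */
    uint32_t core_hash;       /* models the stored core text hash         */
    enum region pending;      /* set by the entry hook, used by ret hook  */
};

/* FNV-1a, used only as a small deterministic hash for the model. */
static uint32_t fnv1a(const unsigned char *p, size_t n)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Entry hook: classify the patch target. `misparse` simulates the
 * suspected error, where a core text address is not recognized. */
static void hook_entry(struct model *m, size_t offset, bool misparse)
{
    if (!misparse && offset < sizeof(m->core))
        m->pending = REGION_CORE_TEXT;
    else
        m->pending = REGION_NONE;
}

/* Models the jump label engine rewriting one byte of text. */
static void patch_byte(struct model *m, size_t offset)
{
    assert(offset < sizeof(m->core));
    m->core[offset] ^= 0xff;
}

/* Return hook: refresh the hash for the region recorded at entry.
 * `always_rehash` models the workaround of updating unconditionally. */
static void hook_ret(struct model *m, bool always_rehash)
{
    if (m->pending == REGION_CORE_TEXT || always_rehash)
        m->core_hash = fnv1a(m->core, sizeof(m->core));
    m->pending = REGION_NONE;
}

/* Runs entry hook, patch, ret hook, then an integrity check.
 * Returns true when the stored hash still matches the text. */
bool patch_and_check(bool misparse, bool always_rehash)
{
    struct model m = { .pending = REGION_NONE };

    m.core_hash = fnv1a(m.core, sizeof(m.core));
    hook_entry(&m, 4, misparse);
    patch_byte(&m, 4);
    hook_ret(&m, always_rehash);
    return fnv1a(m.core, sizeof(m.core)) == m.core_hash;
}
```

In this model, `patch_and_check(false, false)` passes the check, `patch_and_check(true, false)` fails it, and `patch_and_check(true, true)` passes again. The last case matches the behavior of the workaround but leaves the misclassification itself in place, so it would not by itself identify the root cause.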