lkrg-org / lkrg

Linux Kernel Runtime Guard
https://lkrg.org

"Corrupted 'off' flag!" on a kernel built with clang LTO #152

Closed Polluktus closed 2 years ago

Polluktus commented 2 years ago

Following @solardiz's advice, I created a new issue after the discussion in #135. It seems similar to #30 and #106. Kernel: custom 5.16.2. LKRG: 0.9.2.

My steps:

git clone --depth 1 https://github.com/lkrg-org/lkrg.git
cd lkrg
make CC=/usr/lib/llvm/13/bin/clang LD=/usr/bin/ld.lld -j8
doas /usr/src/linux/scripts/sign-file sha512 /usr/src/linux/certs/signing_key.pem /usr/src/linux/certs/signing_key.x509 output/p_lkrg.ko
doas insmod output/p_lkrg.ko kint_enforce=0 pint_enforce=0; sleep 1; doas rmmod p_lkrg  # unload right away so the dmesg buffer does not overflow

dmesg: https://pastebin.com/meTDESak

I know that troubleshooting a custom kernel may be hard and the problem almost impossible to reproduce, so if you decide to close this issue, I will understand.

solardiz commented 2 years ago

Let's keep everything archived on GitHub, rather than use pastebin, so here's a copy of dmesg taken from the pastebin above: dmesg.txt

Excerpts:

[    0.000000] Linux version 5.16.2-polluktus-lkrg-test (root@Warsaw) (clang version 13.0.0, LLD 13.0.0) #1 SMP PREEMPT Fri Jan 21 22:45:45 CET 2022
[...]
[  101.148106] [p_lkrg] Loading LKRG...
[  101.164499] Freezing user space processes ... (elapsed 0.001 seconds) done.
[  101.165709] OOM killer disabled.
[  104.888290] [p_lkrg] [kretprobe] register_kretprobe() for <ovl_create_or_link> failed! [err=-2]
[  104.888295] [p_lkrg] Can't hook 'ovl_create_or_link' function. This is expected if you are not using OverlayFS.
[  105.840787] [p_lkrg] LKRG initialized successfully!
[  105.840789] OOM killer enabled.
[  105.840790] Restarting tasks ... 
[  105.841107] [p_lkrg] <Exploit Detection> process[4236 | IndexedDB #6] has corrupted 'off' flag!
[...]
[  105.841162] [p_lkrg] <Exploit Detection> process[4236 | IndexedDB #6] has corrupted 'off' flag!
[  105.841207] done.
[  105.841376] [p_lkrg] <Exploit Detection> process[6694 | dwmblocks] has corrupted 'off' flag!
[  105.841384] [p_lkrg] <Exploit Detection> process[6694 | sh] has corrupted 'off' flag!
[...]
[  106.425733] [p_lkrg] <Exploit Detection> process[3897 | dbus-daemon] has corrupted 'off' flag!
[  106.427659] [p_lkrg] <Exploit Detection> process[6709 | playerctl] has corrupted 'off' flag!
[  106.429774] [p_lkrg] <Exploit Detection> process[6710 | awk] has corrupted 'off' flag!
[  106.429787] [p_lkrg] <Exploit Detection> process[6710 | awk] has corrupted 'off' flag!
[  106.429794] [p_lkrg] <Exploit Detection> process[6710 | awk] has corrupted 'off' flag!
[  106.861689] [p_lkrg] <Exploit Detection> Detected pointer swapping attack!process[6715 | doas] has different 'cred' pointer
[  106.861699] [p_lkrg] <Exploit Detection> Detected pointer swapping attack!process[6715 | doas] has different 'real_cred' pointer
[  106.861702] [p_lkrg] <Exploit Detection> process[6715 | doas] has different EUID! 1000 vs 0
[  106.861707] [p_lkrg] <Exploit Detection> process[6715 | doas] has different SUID! 1000 vs 0
[  106.861710] [p_lkrg] <Exploit Detection> process[6715 | doas] has different EUID! 1000 vs 0
[  106.861712] [p_lkrg] <Exploit Detection> process[6715 | doas] has different SUID! 1000 vs 0
[  106.861715] [p_lkrg] <Exploit Detection> process[6715 | doas] has different FSUID! 1000 vs 0
[  106.861729] [p_lkrg] <Exploit Detection> process[6715 | doas] has corrupted 'off' flag!

Quoting Adam's request in #106, which is also applicable here:

we have introduced a P_LKRG_TASK_OFF_DEBUG compilation option which helps to debug issues like that. Can you enable this option, recompile LKRG, re-run the tests that trigger the described problem, and share the logs? This option can be enabled in the src/modules/print_log/p_lkrg_log_level_shared.h file (un-comment line 31)

My guess, though, is this time it's an effect of LTO combined with our attempted hooking of kernel-internal functions. Perhaps LTO made it so that our hooks are not in all the right places. Perhaps building the kernel without LTO would help.
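
Before un-commenting anything, it may help to confirm where the option quoted above actually sits, since line numbers can drift between LKRG versions. The following is a minimal sketch: `find_task_off_debug` is a hypothetical helper name, and only the header path and the `P_LKRG_TASK_OFF_DEBUG` identifier come from the quote.

```shell
# Print "line-number:text" for every line that mentions the debug option.
# The default path is the one quoted above, relative to the LKRG source root;
# pass a different path as the first argument to override it.
# find_task_off_debug is a hypothetical helper name, not part of LKRG.
find_task_off_debug() {
    header="${1:-src/modules/print_log/p_lkrg_log_level_shared.h}"
    grep -n 'P_LKRG_TASK_OFF_DEBUG' "$header"
}
```

Run from the top of the cloned lkrg directory, `find_task_off_debug` should show the line to edit before rebuilding the module.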

solardiz commented 2 years ago

Also quoting @Polluktus in #135:

I've built the kernel locally on Gentoo with my own custom config, custom patches, some code yanked from 5.17, and clang LTO, O3, march=native.

solardiz commented 2 years ago

I labeled this "question" for now, although I feel this is also mid-way between "bug" and "portability". It isn't exactly a code bug that we hook kernel-internal functions, but it is indeed not ideal that we (have to) do that.

Polluktus commented 2 years ago

Yes, you were right: with CONFIG_LTO_NONE=y the problem is fixed.

[   87.798343] p_lkrg: loading out-of-tree module taints kernel.
[   87.865313] [p_lkrg] Loading LKRG...
[   87.880516] Freezing user space processes ... (elapsed 0.001 seconds) done.
[   87.881736] OOM killer disabled.
[   91.209411] [p_lkrg] [kretprobe] register_kretprobe() for <ovl_create_or_link> failed! [err=-2]
[   91.209415] [p_lkrg] Can't hook 'ovl_create_or_link' function. This is expected if you are not using OverlayFS.
[   92.059147] [p_lkrg] LKRG initialized successfully!
[   92.059149] OOM killer enabled.
[   92.059149] Restarting tasks ... done.

Later I will paste the output of LTO + P_LKRG_TASK_OFF_DEBUG.
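
For anyone hitting the same messages, a quick way to see which LTO mode a kernel config selects is sketched below. The helper names `lto_options` and `lto_disabled` are hypothetical. The option names are the mainline Kconfig symbols for Clang LTO, and the config path in the usage note is an assumption based on the /usr/src/linux tree used earlier in this thread.

```shell
# Print the LTO-related options that are enabled in a plain-text kernel config.
# lto_options and lto_disabled are hypothetical helper names.
lto_options() {
    grep -E '^CONFIG_LTO_(NONE|CLANG|CLANG_FULL|CLANG_THIN)=y' "$1"
}

# Exit status 0 only if the config explicitly selects CONFIG_LTO_NONE=y.
lto_disabled() {
    grep -q '^CONFIG_LTO_NONE=y' "$1"
}
```

For example, `lto_options /usr/src/linux/.config` lists the enabled LTO options, and `lto_disabled /usr/src/linux/.config && echo "LTO is off"` confirms the configuration that made the problem go away here.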

Polluktus commented 2 years ago

LTO + P_LKRG_TASK_OFF_DEBUG dmesg.txt

Adam-pi3 commented 2 years ago

Hm... It looks like some calls to the override_creds function have been inlined, so the hook never fires and we get false positives. I'm not sure we should spend more time on this, at least for now. It looks like LTO will be problematic...
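
As a rough illustration of why this is hard to spot from outside, the sketch below lists the out-of-line copies of a function that the kernel symbol table exposes. `symbol_variants` is a hypothetical helper, and it assumes the usual "address type name" layout of /proc/kallsyms. A match only shows that a hookable copy exists. It says nothing about call sites that the compiler inlined, which is the suspected cause here.

```shell
# List "type name" for a symbol and any dot-suffixed variants of it
# (for example compiler-generated "name.cold") from kallsyms-style input.
# symbol_variants is a hypothetical helper; the second argument defaults to
# /proc/kallsyms and can point at a saved copy instead.
symbol_variants() {
    name="$1"
    file="${2:-/proc/kallsyms}"
    awk -v n="$name" '$3 == n || index($3, n ".") == 1 { print $2, $3 }' "$file"
}
```

Running `symbol_variants override_creds` on the affected kernel would be expected to still print the function, even though some callers no longer go through it.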

Adam-pi3 commented 2 years ago

Closing this issue for now... @solardiz any objections?

0xC0ncord commented 2 years ago

I think in the future this should be revisited at least. While I do not use LTO at the moment, I will likely end up building my kernels with Clang's CFI as well once it becomes available for x86_64. It would be nice to be able to continue using LKRG once that happens.

On that note, I wonder if compiling LKRG as a builtin would work around the LTO issue?

solardiz commented 2 years ago

Closing this issue for now... @solardiz any objections?

No objections, I think it's a WONTFIX for now. I'll close.

I wonder if compiling LKRG as a builtin would work around the LTO issue?

Currently, no, because we use the same hooking mechanism even when LKRG is linked in. We could avoid that, but then we'd need to implement and maintain two kinds of hooking.