lkrg-org / lkrg

Linux Kernel Runtime Guard
https://lkrg.org

Sudo suddenly confusing LKRG #41

Closed sempervictus closed 3 years ago

sempervictus commented 3 years ago

Using b2d193b5ec7cb8aee9, on an Arch box, i'm seeing LKRG cry about sudo invocation spamming dmesg with:

Jan 16 03:04:04 svl-arch00 kernel: [p_lkrg] <Exploit Detection> Detected pointer swapping attack!process[678 | sudo] has different 'cred' pointer
Jan 16 03:04:04 svl-arch00 kernel: [p_lkrg] <Exploit Detection> Detected pointer swapping attack!process[678 | sudo] has different 'real_cred' pointer
Jan 16 03:04:04 svl-arch00 kernel: [p_lkrg] <Exploit Detection> process[678 | sudo] has different EUID! 1000 vs 0
Jan 16 03:04:04 svl-arch00 kernel: [p_lkrg] <Exploit Detection> process[678 | sudo] has different SUID! 1000 vs 0
Jan 16 03:04:04 svl-arch00 kernel: [p_lkrg] <Exploit Detection> process[678 | sudo] has different FSUID! 1000 vs 0
Jan 16 03:04:04 svl-arch00 kernel: [p_lkrg] <Exploit Detection> process[678 | sudo] has different EUID! 1000 vs 0
Jan 16 03:04:04 svl-arch00 kernel: [p_lkrg] <Exploit Detection> process[678 | sudo] has different SUID! 1000 vs 0
Jan 16 03:04:04 svl-arch00 kernel: [p_lkrg] <Exploit Detection> process[678 | sudo] has different FSUID! 1000 vs 0
Jan 16 03:04:04 svl-arch00 kernel: [p_lkrg] <Exploit Detection> Trying to kill process[sudo | 678]!

Seems like it should recognize sudo; otherwise cloud images, which always start with an unprivileged user and take a mad-hatter-like approach to root's authorized_keys file via cloud-init, won't let admins admin. On the bright side, "it sure stops privesc good" :-)

solardiz commented 3 years ago

@sempervictus This indicates that LKRG's update of its shadow copy of process credentials didn't work as intended. It's a bug or incompatibility with your kernel version/build. LKRG isn't supposed to "recognize sudo" specifically, but it's also not supposed to trigger on usage of sudo.
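
To make the "shadow copy" idea concrete, here is a toy model in Python. It is only a sketch of the general technique (keep a private copy of each process's credentials, refresh it from hooks on legitimate changes, and compare it against the live values). The class and field names are made up for illustration and are not LKRG's actual data structures, and the order of the two numbers in the message is an arbitrary choice of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Creds:
    euid: int
    suid: int
    fsuid: int


class ShadowTable:
    """Toy stand-in for a per-process shadow copy of credentials."""

    def __init__(self):
        self._shadow = {}

    def track(self, pid, creds):
        # Called when a process is first seen, and again from every hook
        # that observes a legitimate credential change.
        self._shadow[pid] = creds

    def verify(self, pid, live):
        """Return a list of mismatching fields (empty means consistent)."""
        shadow = self._shadow[pid]
        return [
            f"{name.upper()}: {getattr(shadow, name)} vs {getattr(live, name)}"
            for name in ("euid", "suid", "fsuid")
            if getattr(shadow, name) != getattr(live, name)
        ]


table = ShadowTable()
table.track(678, Creds(euid=1000, suid=1000, fsuid=1000))

# sudo is setuid-root, so exec legitimately changes the live credentials.
live_after_exec = Creds(euid=0, suid=0, fsuid=0)

# If the exec hook never fires, the shadow copy is stale and verification fails.
missed_hook = table.verify(678, live_after_exec)

# If the hook fires, the shadow copy is refreshed and verification passes.
table.track(678, live_after_exec)
hook_fired = table.verify(678, live_after_exec)
```

In this model a missed update produces the same kind of "1000 vs 0" mismatch for every UID field at once, which is why the symptom points at a hook that did not run rather than at sudo itself.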

To investigate this, I expect Adam will need to know what kernel version and build/config that was.

solardiz commented 3 years ago

@sempervictus It could also help if you check whether the issue is present with LKRG prior to 6cf9e241bb7f095a4e1bd0f3e9d7fd73203dc0fc and (separately) prior to c049fa569598b277737837a8411b1535a92eaa66. We might also need LKRG load messages from all of these tests - including any warnings about non-hooked functions and such. Thank you!

Adam-pi3 commented 3 years ago

I would also appreciate the information about the full kernel config and CPU architecture.

sempervictus commented 3 years ago

The architecture is x86_64, the kernel is 5.4.89 (with linux-hardened patches). Any specific Kconfig options of interest? I'll try a few manual builds and see how they go

sempervictus commented 3 years ago

So I pulled the repo and reset to 1f9809db4bed4e33fcc3db31270def6458654199, built it against the same 5.4.89 and didn't even get to test sudo - the system locked up hard with nothing in dmesg (KVM VM with ttyS0 on virtual console). The KVM host is grsec and didn't report anything either. I rebooted the VM, reset the tree to d3118e4, built it and insmod'ed the product. I did manage to ssh in as vagrant and run sudo ls before it also hung hard, so sudo does work on older commits, but the module proper-f's the system into a frozen mess, and whatever is causing it doesn't make it to any console output. Having the stack traces from the current version is revealing, though:

[  189.256303] [p_lkrg] <Exploit Detection> Detected pointer swapping attack!process[532 | sudo] has different 'cred' pointer
[  189.256420] [p_lkrg] <Exploit Detection> Detected pointer swapping attack!process[532 | sudo] has different 'real_cred' pointer
[  189.256521] [p_lkrg] <Exploit Detection> process[532 | sudo] has different EUID! 1000 vs 0
[  189.256595] [p_lkrg] <Exploit Detection> process[532 | sudo] has different SUID! 1000 vs 0
[  189.256669] [p_lkrg] <Exploit Detection> process[532 | sudo] has different FSUID! 1000 vs 0
[  189.256778] [p_lkrg] <Exploit Detection> process[532 | sudo] has different EUID! 1000 vs 0
[  189.256853] [p_lkrg] <Exploit Detection> process[532 | sudo] has different SUID! 1000 vs 0
[  189.256927] [p_lkrg] <Exploit Detection> process[532 | sudo] has different FSUID! 1000 vs 0
[  189.257001] [p_lkrg] <Exploit Detection> Trying to kill process[sudo | 532]!
[  189.257637] [p_lkrg] <Exploit Detection> Detected pointer swapping attack!process[532 | sudo] has different 'cred' pointer
[  189.257757] [p_lkrg] <Exploit Detection> Detected pointer swapping attack!process[532 | sudo] has different 'real_cred' pointer
[  189.257857] [p_lkrg] <Exploit Detection> process[532 | sudo] has different EUID! 1000 vs 0
[  189.257930] [p_lkrg] <Exploit Detection> process[532 | sudo] has different SUID! 1000 vs 0
[  189.258003] [p_lkrg] <Exploit Detection> process[532 | sudo] has different FSUID! 1000 vs 0
[  189.258086] [p_lkrg] <Exploit Detection> process[532 | sudo] has different EUID! 1000 vs 0
[  189.258160] [p_lkrg] <Exploit Detection> process[532 | sudo] has different SUID! 1000 vs 0
[  189.258232] [p_lkrg] <Exploit Detection> process[532 | sudo] has different FSUID! 1000 vs 0
[  189.258307] [p_lkrg] <Exploit Detection> Trying to kill process[sudo | 532]!
[  195.272338] [p_lkrg] <Exploit Detection> Detected ADDR_LIMIT segment corruption! process[552 | sysctl] has different segment address! [7ffffffff000 vs ffffffffffffffff]
[  195.272370] CPU: 3 PID: 552 Comm: sysctl Tainted: G                T 5.4.89 #1
[  195.272370] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.14.0-1 04/01/2014
[  195.272370] Call Trace:
[  195.272375]  dump_stack+0x64/0x80
[  195.272411]  p_verify_addr_limit.cold+0x1e/0x28 [p_lkrg]
[  195.272417]  p_generic_permission_entry+0x10a/0x200 [p_lkrg]
[  195.272419]  pre_handler_kretprobe+0xaa/0x1b0
[  195.272420]  opt_pre_handler+0x3b/0x60
[  195.272423]  optimized_callback+0x76/0x90
[  195.272424]  0xffffffffc07eb728
[  195.272426]  ? generic_permission+0x1/0x1b0
[  195.272427]  ? inode_permission.part.0+0x2f/0x180
[  195.272427]  ? link_path_walk+0x85/0x570
[  195.272428]  ? path_openat+0x91/0x1660
[  195.272429]  ? do_filp_open+0x9e/0x140
[  195.272431]  ? filp_open+0xee/0x1a0
[  195.272433]  ? hostid_read+0x50/0x120
[  195.272434]  ? zone_get_hostid+0x2a/0x50
[  195.272434]  ? proc_dohostid+0xf3/0x220
[  195.272436]  ? _cond_resched+0x11/0x40
[  195.272438]  ? __cgroup_bpf_run_filter_sysctl+0xd1/0x2d0
[  195.272440]  ? proc_sys_call_handler.isra.0+0x16c/0x1b0
[  195.272441]  ? vfs_read+0x9b/0x180
[  195.272441]  ? ksys_read+0x5e/0xe0
[  195.272443]  ? do_syscall_64+0x52/0x90
[  195.272444]  ? entry_SYSCALL_64_after_hwframe+0x4f/0xb5
[  195.272445]  ? entry_SYSCALL_64_after_hwframe+0x42/0xb5
[  195.272446] [p_lkrg] <Exploit Detection> Trying to kill process[sysctl | 552]!
[  195.272462] [p_lkrg] <Exploit Detection> Detected ADDR_LIMIT segment corruption! process[552 | sysctl] has different segment address! [7ffffffff000 vs ffffffffffffffff]
[  195.272487] CPU: 3 PID: 552 Comm: sysctl Tainted: G                T 5.4.89 #1
[  195.272488] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.14.0-1 04/01/2014
[  195.272488] Call Trace:
[  195.272489]  dump_stack+0x64/0x80
[  195.272493]  p_verify_addr_limit.cold+0x1e/0x28 [p_lkrg]
[  195.272496]  p_generic_permission_entry+0x10a/0x200 [p_lkrg]
[  195.272497]  pre_handler_kretprobe+0xaa/0x1b0
[  195.272498]  opt_pre_handler+0x3b/0x60
[  195.272499]  optimized_callback+0x76/0x90
[  195.272499]  0xffffffffc07eb728
[  195.272501]  ? generic_permission+0x1/0x1b0
[  195.272502]  ? inode_permission.part.0+0x2f/0x180
[  195.272502]  ? link_path_walk+0x85/0x570
[  195.272503]  ? path_openat+0x91/0x1660
[  195.272504]  ? do_filp_open+0x9e/0x140
[  195.272505]  ? filp_open+0xee/0x1a0
[  195.272506]  ? hostid_read+0x50/0x120
[  195.272507]  ? zone_get_hostid+0x2a/0x50
[  195.272508]  ? proc_dohostid+0xf3/0x220
[  195.272509]  ? _cond_resched+0x11/0x40
[  195.272509]  ? __cgroup_bpf_run_filter_sysctl+0xd1/0x2d0
[  195.272510]  ? proc_sys_call_handler.isra.0+0x16c/0x1b0
[  195.272511]  ? vfs_read+0x9b/0x180
[  195.272512]  ? ksys_read+0x5e/0xe0
[  195.272513]  ? do_syscall_64+0x52/0x90
[  195.272514]  ? entry_SYSCALL_64_after_hwframe+0x4f/0xb5
[  195.272515]  ? entry_SYSCALL_64_after_hwframe+0x42/0xb5
[  195.272515] [p_lkrg] <Exploit Detection> Trying to kill process[sysctl | 552]!

Specifically the part about

[  195.272462] [p_lkrg] <Exploit Detection> Detected ADDR_LIMIT segment corruption! process[552 | sysctl] has different segment address! [7ffffffff000 vs ffffffffffffffff]
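
As a side note, the two values in that message look like the usual x86_64 address limits. A quick check, assuming 4 KiB pages and 4-level paging (47-bit user address space); these constants are assumptions about this particular build, not something taken from the logs:

```python
PAGE_SIZE = 4096
USER_ADDRESS_BITS = 47  # assumed: x86_64 with 4-level paging

# Highest user-space address limit: top of the 47-bit range minus a guard page.
user_limit = (1 << USER_ADDRESS_BITS) - PAGE_SIZE

# "No limit" value: all 64 bits set, as used for kernel-mode accesses.
kernel_limit = (1 << 64) - 1

message = f"[{user_limit:x} vs {kernel_limit:x}]"
```

Under those assumptions `message` is `[7ffffffff000 vs ffffffffffffffff]`, i.e. the check saw a user-space limit on one side and an unrestricted limit on the other.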

This kernel has both the linux-hardened patchset with an improved ASLR setup and VMWare PhotonOS' "version" of RANDKSTACK:

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 2ba3d53ac5b1..a1a743dc06f4 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -66,6 +66,15 @@ END(native_usergs_sysret64)
        TRACE_IRQS_FLAGS EFLAGS(%rsp)
 .endm

+#ifdef CONFIG_PAX_RANDKSTACK
+.macro PAX_RAND_KSTACK
+       movq    %rsp, %rdi
+       call    pax_randomize_kstack
+       movq    %rsp, %rdi
+       movq    %rax, %rsp
+.endm
+#endif
+
 /*
  * When dynamic function tracer is enabled it will add a breakpoint
  * to all locations that it is about to modify, sync CPUs, update
@@ -170,9 +179,28 @@ GLOBAL(entry_SYSCALL_64_after_hwframe)
        TRACE_IRQS_OFF

        /* IRQs are off. */
+
+       /*
+        * do_syscall_64 expects syscall-nr (pt_regs->orig_ax) as the first
+        * argument (%rdi) and pointer to pt_regs as the second argument (%rsi).
+        */
+#ifdef CONFIG_PAX_RANDKSTACK
+       pushq   %rax
+       movq    %rsp, %rdi
+       call    pax_randomize_kstack
+       popq    %rdi
+       movq    %rsp, %rsi
+       movq    %rax, %rsp
+
+       pushq   %rsi
+#else
        movq    %rax, %rdi
        movq    %rsp, %rsi
+#endif
        call    do_syscall_64           /* returns with IRQs disabled */
+#ifdef CONFIG_PAX_RANDKSTACK
+       popq    %rsp
+#endif

        TRACE_IRQS_IRETQ                /* we're about to change IF */

@@ -340,8 +368,16 @@ ENTRY(ret_from_fork)

 2:
        UNWIND_HINT_REGS
+#ifdef CONFIG_PAX_RANDKSTACK
+       PAX_RAND_KSTACK
+       pushq   %rdi
+#else
        movq    %rsp, %rdi
+#endif
        call    syscall_return_slowpath /* returns with IRQs disabled */
+#ifdef CONFIG_PAX_RANDKSTACK
+       popq    %rsp
+#endif
        TRACE_IRQS_ON                   /* user mode is traced as IRQS on */
        jmp
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index da3cc3a10d63..f1469f184a0d 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -63,6 +63,21 @@

 #include "process.h"

+#ifdef CONFIG_PAX_RANDKSTACK
+unsigned long pax_randomize_kstack(struct pt_regs *regs)
+{
+       unsigned long time;
+       unsigned long sp1;
+
+       if (!randomize_va_space)
+               return (unsigned long)regs;
+
+       time = rdtsc() & 0xFUL;
+       sp1 = (unsigned long)regs - (time << 4);
+       return sp1;
+}
+#endif
+
 /* Prints also some state that isn't saved in the pt_regs */
 void __show_regs(struct pt_regs *regs, enum show_regs_mode mode)
 {

I'm going to pull the mangled randkstack out and see if that solves the problem. If this is a conflict with linux-hardened, however, that would be a real concern.
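
For reference, the arithmetic in the pax_randomize_kstack() hunk quoted above can be replayed in userspace. This sketch mirrors only the two lines that compute sp1; the base address is a made-up value for illustration.

```python
def randomize_kstack(regs_addr, tsc, randomize_va_space=True):
    """Mirror the arithmetic of pax_randomize_kstack() from the patch above."""
    if not randomize_va_space:
        return regs_addr
    time = tsc & 0xF                # keep the low 4 bits of the timestamp counter
    return regs_addr - (time << 4)  # move down by 0..240 bytes in 16-byte steps


base = 0xFFFFC90000123F58  # hypothetical pt_regs address, for illustration only

# Collect every distinct offset the function can produce.
offsets = sorted({base - randomize_kstack(base, tsc) for tsc in range(256)})
```

`offsets` contains 16 values (0, 16, ..., 240), so this variant shifts the stack pointer by at most 240 bytes below pt_regs.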

sempervictus commented 3 years ago

The build's almost wrapped; I'm guessing that's where the problem will be, since LKRG normally works just fine with linux-hardened. I took a look at how the addrlimit verification works, and it's pretty straightforward inside LKRG, so I need to dig into how the RANDKSTACK stuff actually impacts it. In the meantime, a dirty hack-around is probably something along the lines of:

+++ w/security/lkrg/modules/exploit_detection/p_exploit_detection.h
@@ -216,6 +216,7 @@ struct p_task_off_debug {
 };
 #endif

+#ifndef CONFIG_PAX_RANDKSTACK
 /* X86(-64)*/
 #if defined(CONFIG_X86) && LINUX_VERSION_CODE < KERNEL_VERSION(5,10,0)
  #define P_VERIFY_ADDR_LIMIT 1
@@ -223,6 +224,7 @@ struct p_task_off_debug {
 #elif defined(CONFIG_ARM) || defined(CONFIG_ARM64)
  #define P_VERIFY_ADDR_LIMIT 2
 #endif
+#endif

... if trying to use that code. Will report back when i have confirmation or negation of the hunch.

sempervictus commented 3 years ago

Well, sudo's still broken without RANDKSTACK, but not getting the ADDR_LIMIT error:

Jan 17 00:33:06 svl-arch00 kernel: [p_lkrg] <Exploit Detection> Detected pointer swapping attack!process[546 | sudo] has different 'cred' pointer
Jan 17 00:33:06 svl-arch00 kernel: [p_lkrg] <Exploit Detection> Detected pointer swapping attack!process[546 | sudo] has different 'real_cred' pointer
Jan 17 00:33:06 svl-arch00 kernel: [p_lkrg] <Exploit Detection> process[546 | sudo] has different EUID! 1000 vs 0
Jan 17 00:33:06 svl-arch00 kernel: [p_lkrg] <Exploit Detection> process[546 | sudo] has different SUID! 1000 vs 0
Jan 17 00:33:06 svl-arch00 kernel: [p_lkrg] <Exploit Detection> process[546 | sudo] has different FSUID! 1000 vs 0
Jan 17 00:33:06 svl-arch00 kernel: [p_lkrg] <Exploit Detection> process[546 | sudo] has different EUID! 1000 vs 0
Jan 17 00:33:06 svl-arch00 kernel: [p_lkrg] <Exploit Detection> process[546 | sudo] has different SUID! 1000 vs 0
Jan 17 00:33:06 svl-arch00 kernel: [p_lkrg] <Exploit Detection> process[546 | sudo] has different FSUID! 1000 vs 0
Jan 17 00:33:06 svl-arch00 kernel: [p_lkrg] <Exploit Detection> Trying to kill process[sudo | 546]!

sempervictus commented 3 years ago

Digging into p_cmp_creds and around it, I'm actually a bit confused as to why this wasn't a problem before - what was keeping sudo and similar from triggering on this before?

solardiz commented 3 years ago

@sempervictus Thank you for helping investigate this. What you're seeing is a symptom of LKRG not updating its shadow credentials where it should. There are many possible causes of that. p_cmp_creds is merely where the issue is seen, not where the incompatibility with your kernel is. Also, sudo isn't special - having worked around just sudo you'd later find that some other program also triggers the issue.

As to why this wasn't happening before, your other testing suggests that something changed not only in LKRG, but also in your kernel, since you say that reverting to an older LKRG now results in a system hang, and apparently this wasn't the case for you before. Correct?

In LKRG, our changes in what functions we hook for execve are relevant, and could well have broken LKRG's compatibility with grsecurity(-derived) kernel hardening changes. This will need to be re-tested and likely corrected in LKRG if so.

Adam-pi3 commented 3 years ago

Thanks for all the useful information. GRSec certainly modifies 'standard' Linux kernel behavior. Thanks for letting us know about the environment you are working in. Based on that, I have a couple of notes:

  1. Can you try loading the old LKRG in audit-only mode? E.g. you can set lkrg.profile_enforce=0 or lkrg.profile_enforce=1. I assume the hang you're seeing is caused by the invoked panic().
  2. What do you see when you execute:
    • cat /proc/kallsyms |grep execve
    • cat /proc/kallsyms |grep search_binary_handler
    • cat /proc/kallsyms |grep do_execveat_common

As @solardiz mentioned, the problem is missing 'process update' logic. We need to understand what is non-standard in this kernel, i.e. what changes the default behavior and how the kernel handles execution.

sempervictus commented 3 years ago

To be clear - this isn't a grsec kernel by any means; those are in a private tree, don't go into any public repo, etc etc. They also don't need LKRG :). The RANDKSTACK implementation looks to have been extracted from a much older version of their patches; the current one looks nothing like it.

To ease debugging, I've pulled that out of the equation entirely for now to help track down what's going on. The base kernel underpinning LKRG is 5.4.89.a from linux-hardened. That's the patchset I've had under every LKRG build (we don't even have a vanilla buildbot in our clouds - it's all either this or private stuff).

I'm going to try building the older modules and loading them with enforce=0 to see what happens while the build runs, will report findings shortly.

sempervictus commented 3 years ago

Eh, that was quick and dirty - using d3118e4 even with insmod output/p_lkrg.ko lkrg.profile_enforce=0 causes a silent hang ~5-10s after loading the module. Testing with an older 5.4.83 build now.

Just curious, but do you guys build/test with RANDSTRUCT? I've seen prior cases of upstream+their randstruct having some "issues" (IIRC the well-known case was something in NFS getting a VLA splattered into the middle of the struct, which shouldn't be a problem anymore, but other things might get wonky).

sempervictus commented 3 years ago

Ok, so finally got output before d3118e4 hung on 5.4.89-hardened:

[  116.358473][   T95] [p_lkrg] ALERT !!! MODULE KOBJ HASH IS DIFFERENT !!! - it is [0xdea676f1d4511177] and should be [0xd9818db8f971ffc3] !!!
[  116.360065][   T95] [p_lkrg] ALERT !!! SYSTEM HAS BEEN COMPROMISED - DETECTED DIFFERENT 1 CHECKSUMS !!!
[  116.361150][   T95] Kernel panic - not syncing: [p_lkrg] Kernel Integrity verification failed! Killing the kernel...
[  116.362418][   T95] Kernel Offset: 0x1000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[  116.363640][   T95] Rebooting in 120 seconds..

^^ that's built at runtime, then loaded, so the system's been up for quite a while.

sempervictus commented 3 years ago

For the current module (using that grungy sudo bypass to run for now), i get:

[root@svl-arch00 ~]# cat /proc/kallsyms |grep execve
0000000000000000 t audit_log_execve_info
0000000000000000 t __do_execve_file
0000000000000000 T __ia32_compat_sys_execve
0000000000000000 T __ia32_compat_sys_execveat
0000000000000000 T do_execve_file
0000000000000000 T do_execve
0000000000000000 T __ia32_sys_execve
0000000000000000 T __x64_sys_execve
0000000000000000 T do_execveat
0000000000000000 T __ia32_sys_execveat
0000000000000000 T __x64_sys_execveat
0000000000000000 d _eil_addr___ia32_compat_sys_execveat
0000000000000000 d _eil_addr___ia32_compat_sys_execve
0000000000000000 d _eil_addr___ia32_sys_execveat
0000000000000000 d _eil_addr___x64_sys_execveat
0000000000000000 d _eil_addr___ia32_sys_execve
0000000000000000 d _eil_addr___x64_sys_execve
[root@svl-arch00 ~]# cat /proc/kallsyms |grep search_binary_handler
0000000000000000 t search_binary_handler.part.0
0000000000000000 T search_binary_handler
0000000000000000 r __ksymtab_search_binary_handler
0000000000000000 r __kstrtab_search_binary_handler
0000000000000000 t p_search_binary_handler_ret.cold [p_lkrg]
0000000000000000 d p_search_binary_handler_kretprobe    [p_lkrg]
0000000000000000 t p_install_search_binary_handler_hook [p_lkrg]
0000000000000000 t p_uninstall_search_binary_handler_hook   [p_lkrg]
0000000000000000 t p_search_binary_handler_entry    [p_lkrg]
0000000000000000 t p_search_binary_handler_ret  [p_lkrg]
0000000000000000 b p_search_binary_handler_kretprobe_state  [p_lkrg]
[root@svl-arch00 ~]# cat /proc/kallsyms |grep do_execveat_common
[root@svl-arch00 ~]# 

sempervictus commented 3 years ago

Confirm that this is happening with Arch Linux' upstream linux-lts package - a 5.4.89 sans the linux-hardened patchset. Also confirm that with THAT kernel, LKRG @ master straight-up hangs the system the same way older builds hung our hardened-patched kernels (well, same symptoms, no clue why either is happening for lack of stack traces):

[image attachment]

sempervictus commented 3 years ago

5.4.90 did not help either

Adam-pi3 commented 3 years ago

My bad, try using kint_enforce=0 or kint_enforce=1

By the way, can you tell me exactly which environment and configuration you are using so I can set up a VM and run some tests? Alternatively, you could give me # access to one of the 'junk' VMs where you encounter the problem (faster and easier for me, but I understand if you can't :))

sempervictus commented 3 years ago

Will test with those options later tonight. Far as an env - just grab any arch image and install the lts kernel packages with devel tools.

0xC0ncord commented 3 years ago

I want to drop a line here and add that I have LKRG running on kernel 5.10.7 with (and only with) linux-hardened, so I do not think that linux-hardened is what may be causing problems here.

sempervictus commented 3 years ago

Yeah, agreed, this is likely something to do with 5.4 since i can reproduce it on the official linux-lts package which has none of our patches.

@Adam-pi3 - looks like the enforcement option isn't helping with the sudo bit either:

[root@svl-arch00 ~]# kint_enforce=0 pint_enforce=0 modprobe p_lkrg
[  260.958660][  T804] [p_lkrg] Loading LKRG...
[  261.219502][  T804] [p_lkrg] [kretprobe] register_kretprobe() for ovl_create_or_link failed and ISRA / CONSTPROP version not found!
[  261.409826][  T804] [p_lkrg] LKRG initialized successfully!
[root@svl-arch00 ~]# [  269.945851][  T812] [p_lkrg] <Exploit Detection> Detected pointer swapping attack!process[812 | sudo] has different 'cred' pointer
[  269.949652][  T812] [p_lkrg] <Exploit Detection> Detected pointer swapping attack!process[812 | sudo] has different 'real_cred' pointer
[  269.953230][  T812] [p_lkrg] <Exploit Detection> process[812 | sudo] has different EUID! 1000 vs 0
[  269.955117][  T812] [p_lkrg] <Exploit Detection> process[812 | sudo] has different SUID! 1000 vs 0
[  269.957055][  T812] [p_lkrg] <Exploit Detection> process[812 | sudo] has different FSUID! 1000 vs 0
[  269.958939][  T812] [p_lkrg] <Exploit Detection> process[812 | sudo] has different EUID! 1000 vs 0
[  269.960835][  T812] [p_lkrg] <Exploit Detection> process[812 | sudo] has different SUID! 1000 vs 0
[  269.962803][  T812] [p_lkrg] <Exploit Detection> process[812 | sudo] has different FSUID! 1000 vs 0
[  269.964925][  T812] [p_lkrg] <Exploit Detection> Trying to kill process[sudo | 812]!
[  269.967614][  T812] [p_lkrg] <Exploit Detection> Detected pointer swapping attack!process[812 | sudo] has different 'cred' pointer
[  269.972477][  T812] [p_lkrg] <Exploit Detection> Detected pointer swapping attack!process[812 | sudo] has different 'real_cred' pointer
[  269.977101][  T812] [p_lkrg] <Exploit Detection> process[812 | sudo] has different EUID! 1000 vs 0
[  269.979922][  T812] [p_lkrg] <Exploit Detection> process[812 | sudo] has different SUID! 1000 vs 0
[  269.982522][  T812] [p_lkrg] <Exploit Detection> process[812 | sudo] has different FSUID! 1000 vs 0
[  269.985165][  T812] [p_lkrg] <Exploit Detection> process[812 | sudo] has different EUID! 1000 vs 0
[  269.987461][  T812] [p_lkrg] <Exploit Detection> process[812 | sudo] has different SUID! 1000 vs 0
[  269.989598][  T812] [p_lkrg] <Exploit Detection> process[812 | sudo] has different FSUID! 1000 vs 0
[  269.992029][  T812] [p_lkrg] <Exploit Detection> Trying to kill process[sudo | 812]!

And building d3118e4 against the linux-lts package from Arch upstream, then loading it with kint_enforce=0 insmod output/p_lkrg.ko while also having lkrg.kint_enforce=0 on the kernel command line, still results in a hard hang of the VM ~10s after loading the module.

When I built the current (as of right now) master against linux-lts it also crashed ~10s into the run (which is weird, as the patched kernel doesn't do that), and also killed sudo attempts:

Jan 17 18:29:30 svl-arch00 kernel: [p_lkrg] LKRG initialized successfully!
Jan 17 18:29:30 svl-arch00 kernel: OOM killer enabled.
Jan 17 18:29:30 svl-arch00 kernel: Restarting tasks ... done.
Jan 17 18:29:40 svl-arch00 kernel: [p_lkrg] <Exploit Detection> Detected pointer swapping attack!process[2089 | sudo] has different 'cred' pointer
Jan 17 18:29:40 svl-arch00 kernel: [p_lkrg] <Exploit Detection> Detected pointer swapping attack!process[2089 | sudo] has different 'real_cred' pointer
Jan 17 18:29:40 svl-arch00 kernel: [p_lkrg] <Exploit Detection> process[2089 | sudo] has different EUID! 1000 vs 0
Jan 17 18:29:40 svl-arch00 kernel: [p_lkrg] <Exploit Detection> process[2089 | sudo] has different SUID! 1000 vs 0
Jan 17 18:29:40 svl-arch00 kernel: [p_lkrg] <Exploit Detection> process[2089 | sudo] has different FSUID! 1000 vs 0
Jan 17 18:29:40 svl-arch00 kernel: [p_lkrg] <Exploit Detection> process[2089 | sudo] has different EUID! 1000 vs 0
Jan 17 18:29:40 svl-arch00 kernel: [p_lkrg] <Exploit Detection> process[2089 | sudo] has different SUID! 1000 vs 0
Jan 17 18:29:40 svl-arch00 kernel: [p_lkrg] <Exploit Detection> process[2089 | sudo] has different FSUID! 1000 vs 0
Jan 17 18:29:40 svl-arch00 kernel: [p_lkrg] <Exploit Detection> Trying to kill process[sudo | 2089]!
Jan 17 18:29:40 svl-arch00 kernel: [p_lkrg] <Exploit Detection> Detected pointer swapping attack!process[2089 | sudo] has different 'cred' pointer
Jan 17 18:29:40 svl-arch00 kernel: [p_lkrg] <Exploit Detection> Detected pointer swapping attack!process[2089 | sudo] has different 'real_cred' pointer
Jan 17 18:29:40 svl-arch00 kernel: [p_lkrg] <Exploit Detection> process[2089 | sudo] has different EUID! 1000 vs 0
Jan 17 18:29:40 svl-arch00 kernel: [p_lkrg] <Exploit Detection> process[2089 | sudo] has different SUID! 1000 vs 0
Jan 17 18:29:40 svl-arch00 kernel: [p_lkrg] <Exploit Detection> process[2089 | sudo] has different FSUID! 1000 vs 0
Jan 17 18:29:40 svl-arch00 kernel: [p_lkrg] <Exploit Detection> process[2089 | sudo] has different EUID! 1000 vs 0
Jan 17 18:29:40 svl-arch00 kernel: [p_lkrg] <Exploit Detection> process[2089 | sudo] has different SUID! 1000 vs 0
Jan 17 18:29:40 svl-arch00 kernel: [p_lkrg] <Exploit Detection> process[2089 | sudo] has different FSUID! 1000 vs 0
Jan 17 18:29:40 svl-arch00 kernel: [p_lkrg] <Exploit Detection> Trying to kill process[sudo | 2089]!
Jan 17 18:29:45 svl-arch00 kernel: [p_lkrg] ALERT !!! _STEXT MEMORY BLOCK HASH IS DIFFERENT - it is [0x5499e25e039824f2] and should be [0x475c94803b707799] !!!
Jan 17 18:29:45 svl-arch00 kernel: [p_lkrg] ALERT !!! SYSTEM HAS BEEN COMPROMISED - DETECTED DIFFERENT 1 CHECKSUMS !!!

and this is still with kint_enforce=0 set during insmod and on command line and in the modprobe config options. Since we're seeing this behavior with _STEXT and KOBJ on official LTS kernels, i think there's some sort of more generic race occurring.

Adam-pi3 commented 3 years ago

I've installed an Arch VM with linux-lts and now have more visibility into what's going on:

  1. _STEXT violation - LKRG detects modification of 2 functions, ftrace_modify_all_code and ftrace_enable_sysctl. LKRG places hooks on these 2 functions, and each hook is a KPROBE by default. However, KPROBE optimization is also enabled by default (debug.kprobes-optimization=1), and it modifies the original KPROBE (0xcc) into FTRACE (jmp instruction) after LKRG builds its database. You can avoid this problem by setting debug.kprobes-optimization=0. This is very weird, since optimization should happen during KPROBE installation. Most likely there is some non-standard race in that kernel, which I need to investigate more.
  2. LKRG's hook on search_binary_handler is never executed(!), so LKRG does not update its shadow credentials and you see problems with SUID binaries. This commit (https://github.com/openwall/lkrg/commit/1299583b564eeac5f9eec0b74ca59224d8c2498e) replaces the execve syscall hooks with a hook on search_binary_handler. The exec syscall is expected to invoke search_binary_handler at some point, but in that specific LTS kernel it apparently does not. I need to investigate what's going on and what is happening at the binary level.
  3. The KOBJ problem that you see with the old LKRG commit is most likely related to this issue: https://github.com/openwall/lkrg/issues/38. In short, problematic kernel changes were backported to LTS kernels, and you need this specific LKRG commit (https://github.com/openwall/lkrg/commit/8814ebe8043d3ff4d7efbb00baacebe6a73bd8f4) to avoid that problem.
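
The _STEXT finding in item 1 can be illustrated with a toy model: if a baseline hash of kernel text is taken while a probe is still a breakpoint byte, and the probe is later rewritten into a jump, the hash no longer matches. This is only a sketch. SHA-256 stands in for whatever hash LKRG actually uses, and the instruction bytes and jump offset are made up for illustration.

```python
import hashlib


def text_hash(code: bytes) -> str:
    """Hash a region of 'kernel text' (SHA-256 is a stand-in, see above)."""
    return hashlib.sha256(code).hexdigest()


# Hypothetical first 8 bytes of a hooked function, for illustration only.
original = bytes([0x0F, 0x1F, 0x44, 0x00, 0x00, 0x55, 0x48, 0x89])

# Installing a regular kprobe replaces the first byte with int3 (0xcc).
probed = b"\xcc" + original[1:]

# The integrity baseline is computed with the int3 byte in place.
baseline = text_hash(probed)

# A later optimization pass rewrites the first 5 bytes into "jmp rel32"
# (opcode 0xe9 followed by a 32-bit displacement; 0x1234 is arbitrary).
optimized = b"\xe9" + (0x1234).to_bytes(4, "little") + original[5:]

unchanged_without_optimization = text_hash(probed) == baseline
mismatch_after_optimization = text_hash(optimized) != baseline
```

With optimization disabled the text stays as it was when the baseline was taken, which is consistent with the debug.kprobes-optimization=0 workaround mentioned above.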

@sempervictus I'm new to the Arch Linux distro. Is it possible to get the linux-lts kernel debug symbols and the linux-lts source packages, so I can look at the vmlinux binary and the corresponding kernel sources?

solardiz commented 3 years ago

> LKRG's hook on search_binary_handler is never executed(!)

Notice search_binary_handler.part.0 in one of @sempervictus' comments. Maybe that function got split in two by gcc? Would be tough to hook reliably across builds, then. Even worse is that this might happen for other functions in other builds.

[root@svl-arch00 ~]# kint_enforce=0 pint_enforce=0 modprobe p_lkrg

@sempervictus That's not how you pass module parameters. Instead, use e.g.:

modprobe p_lkrg kint_enforce=0 pint_enforce=0

Anyway, this isn't currently needed, as you've already provided sufficient information from LKRG logs. Now Adam just needs some help with Arch specific kernel packages, see his edited comment above.

Adam-pi3 commented 3 years ago

@solardiz I've verified that when we place hook at search_binary_handler.part.0 it solves the problem. Per https://unix.stackexchange.com/questions/223013/function-symbol-gets-part-suffix-after-compilation:

> Sometimes, GCC evaluates that some part of the control flow of a big function could easily be inlined, but that it would not be okay to inline the entire huge function. Therefore, it splits the function to put the big part in its own function, which receives as a name the original function name plus .part + .<some number>, and inlines the rest in other functions.

and later:

> You can use the option -fdisable-ipa-fnsplit to prevent the compiler from applying this optimization, or -fenable-ipa-fnsplit to enable it. By default, it's applied at optimization levels -O2 and -O3 and disabled otherwise.

I can see only a few ways we can handle this (none of them good):

  1. Support only specifically verified kernel versions
  2. Have a direct connection with each distro and verify / customize LKRG per requested kernel
  3. Only guarantee functionality when such crazy aggressive optimizations are not enabled
  4. Revert our execve* hook to the syscall version (it can't be optimized)
  5. Have a pre-installation verification script which parses kallsyms and generates a warning when such an optimization has been applied to a function which LKRG hooks
  6. Don't know... I'm open to any suggestions
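
Option 5 above could look roughly like the following Python sketch. It scans kallsyms-formatted text for compiler-generated clones (.part.N, .isra.N, .constprop.N, .cold) of functions that are meant to be hooked. The list of hooked functions here is a hypothetical example, not LKRG's real hook list, and the sketch strips only a single trailing suffix.

```python
import re

# A single trailing GCC clone suffix: .part.N, .isra.N, .constprop.N or .cold
_CLONE_SUFFIX = re.compile(r"\.(part|isra|constprop)\.\d+$|\.cold$")


def find_split_symbols(kallsyms_text, hooked):
    """Map each hooked function to the compiler-generated clones found for it."""
    found = {name: [] for name in hooked}
    for line in kallsyms_text.splitlines():
        fields = line.split()
        # Core kernel symbols have exactly 3 fields: address, type, name.
        # A 4th field such as "[p_lkrg]" marks a module symbol; skip those.
        if len(fields) != 3:
            continue
        symbol = fields[2]
        base = _CLONE_SUFFIX.sub("", symbol)
        if base != symbol and base in found:
            found[base].append(symbol)
    return {name: clones for name, clones in found.items() if clones}


# Sample input taken from the kallsyms output earlier in this thread.
sample = """\
0000000000000000 t search_binary_handler.part.0
0000000000000000 T search_binary_handler
0000000000000000 t p_search_binary_handler_ret.cold [p_lkrg]
0000000000000000 T do_execve
"""

hooked_functions = ["search_binary_handler", "do_execve"]  # hypothetical list
warnings = find_split_symbols(sample, hooked_functions)
```

On a live system the input would come from reading /proc/kallsyms instead of the sample string. A non-empty result means a hooked function has been split, so a probe on the plain symbol name may never fire for the split-off part.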

Regarding this:

> Even worse is that this might happen for other functions in other builds.

Correct. However, the majority of our hooks are on functions which can't be inlined. We should be safer than it sounds...

sempervictus commented 3 years ago

Far as "aggressive optimization" goes - the kernel we build here is -O3 but -O2 is pretty common nowadays. The sources are @ https://github.com/archlinux/svntogit-packages/tree/packages/linux-lts/trunk - pull that, use makepkg to build. Arch is very well documented, including its build process (part of the reason we dropped our fork of ubuntu a few years ago).

In terms of what to do next... is it possible to detect the split inlined function and walk the indirection? If not, then probably sticking to un-inlined functions would be safer. Would it help to add __attribute__ ((noinline)) to search_binary_handler in the in-tree script? If so, what else should i tag up that way?

sempervictus commented 3 years ago

Taking a stab at

diff --git i/fs/Makefile w/fs/Makefile
index 0f4c675caf07..c9536b35cf94 100644
--- i/fs/Makefile
+++ w/fs/Makefile
@@ -6,6 +6,7 @@
 # Rewritten to use lists instead of if-statements.
 # 

+CFLAGS_exec.o = -disable-ipa-fnsplit
 obj-y :=       open.o read_write.o file_table.o super.o \
                char_dev.o stat.o exec.o pipe.o namei.o fcntl.o \
                ioctl.o readdir.o select.o dcache.o inode.o

If ^^ works, we can include the diffs in the in-tree patch, but still problematic for the average consumer using a DKMS package out of the AUR or PPAs/RPM repos.

sempervictus commented 3 years ago

Well, was worth a shot. No dice on the CFLAGS try - sudo still not working correctly.

sempervictus commented 3 years ago

Trying it as straight-up noinline but apparently the CFLAGS approach should be more like

CFLAGS_exec.o := $(call cc-option,-fdisable-ipa-fnsplit)

sempervictus commented 3 years ago

the noinline trick works:

[root@svl-arch00 ~]# cat /proc/kallsyms |grep search_binary_handler
0000000000000000 T search_binary_handler
0000000000000000 r __ksymtab_search_binary_handler
0000000000000000 r __kstrtab_search_binary_handler
[root@svl-arch00 ~]# systemctl start lkrg.service 
[   20.111647][  T492] [p_lkrg] Loading LKRG...
[   20.352036][  T492] [p_lkrg] [kretprobe] register_kretprobe() for ovl_create_or_link failed and ISRA / CONSTPROP version not found!
[   20.628125][  T492] [p_lkrg] LKRG initialized successfully!
[root@svl-arch00 ~]# 

and i can sudo correctly. So i guess that's a start... Taking a stab at the CFLAGS approach above. Either way though, probably not practical when the intent is to ship as a module for already built kernels.

sempervictus commented 3 years ago

Looks like it's noinline or bust -

diff --git i/fs/Makefile w/fs/Makefile
index 0f4c675caf07..bf885387f474 100644
--- i/fs/Makefile
+++ w/fs/Makefile
@@ -5,7 +5,7 @@
 # 14 Sep 2000, Christoph Hellwig <hch@infradead.org>
 # Rewritten to use lists instead of if-statements.
 # 
-
+CFLAGS_exec.o := $(call cc-option,-fdisable-ipa-fnsplit)
 obj-y :=       open.o read_write.o file_table.o super.o \
                char_dev.o stat.o exec.o pipe.o namei.o fcntl.o \
                ioctl.o readdir.o select.o dcache.o inode.o \

still produces

[root@svl-arch00 ~]# systemctl start lkrg.service 
[   10.569314][  T592] [p_lkrg] Loading LKRG...
[   10.715596][  T592] [p_lkrg] [kretprobe] register_kretprobe() for ovl_create_or_link failed and ISRA / CONSTPROP version not found!
[   10.885887][  T592] [p_lkrg] LKRG initialized successfully!
[root@svl-arch00 ~]# cat /proc/kallsyms |grep search_binary_handler
0000000000000000 t search_binary_handler.part.0
0000000000000000 T search_binary_handler
0000000000000000 r __ksymtab_search_binary_handler
0000000000000000 r __kstrtab_search_binary_handler
0000000000000000 t p_search_binary_handler_ret.cold [p_lkrg]
0000000000000000 d p_search_binary_handler_kretprobe    [p_lkrg]
0000000000000000 t p_install_search_binary_handler_hook [p_lkrg]
0000000000000000 t p_uninstall_search_binary_handler_hook   [p_lkrg]
0000000000000000 t p_search_binary_handler_entry    [p_lkrg]
0000000000000000 t p_search_binary_handler_ret  [p_lkrg]
0000000000000000 b p_search_binary_handler_kretprobe_state  [p_lkrg]
[root@svl-arch00 ~]# [   66.799738][  T604] [p_lkrg] <Exploit Detection> Detected pointer swapping attack!process[604 | sudo] has different 'cred' pointer
[   66.804775][  T604] [p_lkrg] <Exploit Detection> Detected pointer swapping attack!process[604 | sudo] has different 'real_cred' pointer
[   66.806098][  T604] [p_lkrg] <Exploit Detection> process[604 | sudo] has different EUID! 1000 vs 0
[   66.806966][  T604] [p_lkrg] <Exploit Detection> process[604 | sudo] has different SUID! 1000 vs 0
[   66.807922][  T604] [p_lkrg] <Exploit Detection> process[604 | sudo] has different FSUID! 1000 vs 0
[   66.809105][  T604] [p_lkrg] <Exploit Detection> process[604 | sudo] has different EUID! 1000 vs 0
[   66.810048][  T604] [p_lkrg] <Exploit Detection> process[604 | sudo] has different SUID! 1000 vs 0
[   66.810954][  T604] [p_lkrg] <Exploit Detection> process[604 | sudo] has different FSUID! 1000 vs 0
[   66.811920][  T604] [p_lkrg] <Exploit Detection> Trying to kill process[sudo | 604]!
[   66.813115][  T604] [p_lkrg] <Exploit Detection> Detected pointer swapping attack!process[604 | sudo] has different 'cred' pointer
[   66.814941][  T604] [p_lkrg] <Exploit Detection> Detected pointer swapping attack!process[604 | sudo] has different 'real_cred' pointer
[   66.816526][  T604] [p_lkrg] <Exploit Detection> process[604 | sudo] has different EUID! 1000 vs 0
[   66.817940][  T604] [p_lkrg] <Exploit Detection> process[604 | sudo] has different SUID! 1000 vs 0
[   66.818974][  T604] [p_lkrg] <Exploit Detection> process[604 | sudo] has different FSUID! 1000 vs 0
[   66.820063][  T604] [p_lkrg] <Exploit Detection> process[604 | sudo] has different EUID! 1000 vs 0
[   66.821506][  T604] [p_lkrg] <Exploit Detection> process[604 | sudo] has different SUID! 1000 vs 0
[   66.822671][  T604] [p_lkrg] <Exploit Detection> process[604 | sudo] has different FSUID! 1000 vs 0
[   66.823974][  T604] [p_lkrg] <Exploit Detection> Trying to kill process[sudo | 604]!

either i'm doing something wrong in setting the CFLAGS or it just doesn't work. Either way, not ideal.

solardiz commented 3 years ago

Revert our execve* hook to the syscall version (it can't be optimized)

I think we have to do that for now. Back when we were discussing switching to search_binary_handler that was an exported symbol, but by the time we made the change it no longer was (and apparently that change is now getting backported to stable kernels). Then we introduced hooking of do_execveat_common when search_binary_handler isn't available, but this still has the risk of both being unavailable (in this issue, we see one partially inlined and the other apparently fully inlined; in #45 apparently both are inlined), in which case LKRG will (at best) fail to load.

So I suggest we cleanly revert all related commits (IIRC, 3 of them?) in reverse order. Technically, we could probably revert just one 1299583b564eeac5f9eec0b74ca59224d8c2498e, but that would leave unused files around, which I don't like - it's sufficient that we have those saved in git history, in case we need them later.

Have some pre-installation verification script which parses kallsyms and generates a warning when such optimization has happened to a function which LKRG hooks

We could do this checking from the LKRG kernel module itself, but this doesn't actually solve the problem.

sempervictus commented 3 years ago

Far as "solving the problem" goes, I've no clue how to talk to Linus about this stuff in a productive manner. He seems actively hostile to code not under his direct control and has damaged downstream consumers for years with his capricious GPL-only export BS and other changes which he then backports to supposedly stable branches (which subjectively looks like a petty attempt to assert one's "authority"). For us, this is less a problem than for most since we build in-house and our own kit runs on kernels patched by people who both care about security like pious monks and comprehend what's happening in the kernel far better than him (even from a sensory deprivation tank getting diffs by morse code in UTF16). We can revert his deletions of exports and move along happily so far as autoconf or whatever finds the symbol correctly (or we mangle it to know what we've done), but people using distro kernels cannot.

Adam-pi3 commented 3 years ago

So I suggest we cleanly revert all related commits (IIRC, 3 of them?) in reverse order. Technically, we could probably revert just one 1299583, but that would leave unused files around, which I don't like - it's sufficient that we have those saved in git history, in case we need them later.

@solardiz One more option might be to directly hook all binary handlers:

fs/binfmt_aout.c, line 39
fs/binfmt_elf.c, line 102
fs/binfmt_elf_fdpic.c, line 83
fs/binfmt_em86.c, line 94
fs/binfmt_flat.c, line 98
fs/binfmt_misc.c, line 808
fs/binfmt_script.c, line 142

However, I'm not sure if we should do it now or in v.next. What do you think?

@sempervictus thank you very much for all your tests!

solardiz commented 3 years ago

One more option might be to directly hook all binary handlers:

Although we mentioned it in e-mails before, I currently think this doesn't help: it's even more functions to hook yet not making the race window for exploits much smaller. Switching to hooking search_binary_handler at least simplified things, if it worked.

I took another look, and there are other opportunities. The binary handlers have calls back into shared exec functions, and some of those are exported. The most relevant one looks to be begin_new_exec. The commit_creds call is inside of it. Even better: there are calls to security_bprm_committing_creds and security_bprm_committed_creds around that call! So it looks like we can simply use these two security hooks, either via kprobes/ftrace or the way they were intended to be hooked, to do our pre-exec and post-exec magic. This would also greatly shorten the race window that exploits have to win to bypass LKRG. Looks like a win-win to me. If this is as good as it looks, then let's just do this now, no need to postpone it for v.next.

In older kernels there was install_exec_creds, also exported, which since became part of begin_new_exec. We could have hooked that one and have similarly short race window. That function also uses those two security hooks, so hooking them instead should work across the whole range of kernel versions we support.

However:

Besides the race window length there's another aspect: what if an exploit calls e.g. install_exec_creds or begin_new_exec directly like they tend to call commit_creds? If we were not worried about such possibilities, we'd have simply hooked commit_creds and be updating our shadow creds there, but since exploits use that function this would (sort of) let them bypass LKRG without further effort (actually, they might also bump into pCFI and SMEP). This is why we hook the lots of individual syscalls and functions instead, to update our shadow credentials only when higher-level logic dictates that credentials should be updated.
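
The idea of updating shadow credentials only when higher-level logic authorizes it can be illustrated with a toy model. This is a sketch and not LKRG's code: the class and method names are hypothetical, and only a single UID is tracked, whereas the real checks (per the logs in this issue) cover the cred pointers, EUID, SUID, FSUID and more.

```python
class ShadowCreds:
    """Toy model: a shadow copy of per-task UIDs, updated only in authorized windows."""

    def __init__(self):
        self.shadow = {}        # pid -> UID last accepted through an authorized update
        self.in_window = set()  # pids currently inside an authorized update

    def track(self, pid, uid):
        self.shadow[pid] = uid

    def begin_update(self, pid):
        # Models a "pre" hook on a high-level path (e.g. exec or setuid).
        self.in_window.add(pid)

    def end_update(self, pid, new_uid):
        # Models the matching "post" hook: accept what the kernel computed.
        self.in_window.discard(pid)
        self.shadow[pid] = new_uid

    def is_consistent(self, pid, live_uid):
        # Validation is skipped while an update is in flight;
        # that gap is the race window discussed in this thread.
        if pid in self.in_window:
            return True
        return self.shadow.get(pid) == live_uid


guard = ShadowCreds()

# Legitimate exec of a setuid binary: both hooks fire around the change.
guard.track(678, 1000)
guard.begin_update(678)
guard.end_update(678, 0)
print(guard.is_consistent(678, 0))  # True

# Credentials changed without any hook firing (missed hook or exploit).
guard.track(604, 1000)
print(guard.is_consistent(604, 0))  # False
```

In this model, a missed hook and an exploit look identical, which is why an inlined hook target produces the false positives reported here.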

So the question is whether what we'd hook is a lot more effort and less reliable for exploits to abuse or not.

begin_new_exec would probably have significant side effects if abused in exploits, although not necessarily to the extent where that would stop them. (Side-effects problematic for exploits would be more likely for abuse of even higher-level functions like search_binary_handler.)

install_exec_creds is such a trivial wrapper around commit_creds that it's probably about as easy and reliable to abuse by exploits as commit_creds is. That's bad. No matter whether we hook install_exec_creds itself or the two security hooks, exploits will get our shadow creds updated to what they need easily. But only exploits targeting older kernels (since on newer ones it's the larger begin_new_exec) and only exploits trying to bypass LKRG (since there's no reason for them not to call commit_creds directly otherwise).

Unfortunately, this is still not the entire story. The security_bprm_committing_creds call and further logic in begin_new_exec (roughly what was in install_exec_creds before) is near the end of begin_new_exec. So exploits can jump/call/return right into that place in begin_new_exec, bypassing most of the side-effects. They'd still need to have a suitable pointer in whatever register the me variable is in, or maybe in all otherwise unused registers in order to have a higher chance of success across varying kernel builds, but that's only a slight complication compared to using commit_creds directly.

So we have a security hardening tradeoff - a much longer race window vs. a much shorter race window but an easier-to-abuse function. It isn't clear to me which is preferable. Ideally, we'd find some other option, although the problem does seem rather fundamental - it isn't by pure chance that we have it here.

Not having a better option, we could choose based on simplicity and expected reliability across kernel versions and builds. If so, I think using the two security hooks wins over hooking the many versions of exec syscalls or the many binary handlers.

With that, maybe we should also (later, separately) revisit simply hooking commit_creds and updating our shadow creds there, which means we'd rely on pCFI and SMEP. Indeed, a rather weak defense, but it'd allow us to simplify LKRG greatly. And it might not be that much weaker than what we'd have implemented for exec here (either a comparably easy to abuse function, or a very lengthy race window like we have now). Alternatively, we could move in the opposite direction - duplicating the kernel's computation of new process credentials in LKRG instead of blindly accepting the kernel's (and hopefully not the exploit's) credentials updates in some authorized places like we do now. Arguably, LKRG is currently inconsistent in the level of protection it provides, incurring an unjustified cost in terms of its complexity. (I think I did bring this argument in private e-mail before.)

solardiz commented 3 years ago

If we hook security_bprm_committing_creds now, we could later harden this for v.next by checking that the credentials in bprm passed into this function (and available to us portably if we use it as a security hook the way it was meant to be used, not hook it via kprobes) still match what they were on security_bprm_creds_from_file (which we'd then hook on recent kernels) or security_bprm_set_creds (older kernels). Looks like a working approach to me now.
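
A toy model of that hardening idea, assuming we can snapshot the credentials at the earlier hook and compare them at commit time. The class, method and field names here are hypothetical and only mirror the hook names mentioned above. Nothing below is taken from LKRG.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Creds:
    uid: int
    euid: int
    suid: int
    fsuid: int


class ExecCredCheck:
    """Remember credentials computed early in exec, verify them at commit time."""

    def __init__(self):
        self._pending = {}  # pid -> Creds recorded at the earlier hook

    def on_creds_computed(self, pid, creds):
        # Models security_bprm_creds_from_file (or security_bprm_set_creds on older kernels).
        self._pending[pid] = creds

    def on_committing(self, pid, creds):
        # Models security_bprm_committing_creds: accept only what we saw earlier.
        expected = self._pending.pop(pid, None)
        return expected is not None and expected == creds


check = ExecCredCheck()
root = Creds(uid=1000, euid=0, suid=0, fsuid=0)

check.on_creds_computed(678, root)
print(check.on_committing(678, root))  # True: matches the earlier snapshot

# A direct jump into the commit path, with no earlier hook having fired.
print(check.on_committing(604, root))  # False
```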

Adam-pi3 commented 3 years ago

If we hook security_bprm_committing_creds now, we could later harden this for v.next by checking that the credentials in bprm passed into this function

We could do the same via the kprobe interface as well (extract the pointer from the passed arguments). I think it's the better approach because we would not have extra dependencies compared to what we have now.

About the hardening, another approach might be to verify that the hook was invoked when the last IP (instruction pointer) was pointing to the legit upper-level function (e.g. points to an address in the symbol range of the begin_new_exec function in newer kernels)
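
The symbol-range check can be sketched as a lookup over sorted symbol start addresses. This is only an illustration under stated assumptions: the helper names are hypothetical, the addresses in the sample are made up, and a symbol's range is approximated as "from its start up to the next symbol's start", so the last symbol has no upper bound here.

```python
import bisect


def build_symbol_table(kallsyms_text):
    """Return (sorted start addresses, matching names) for text symbols."""
    entries = []
    for line in kallsyms_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[1].lower() == "t":
            entries.append((int(fields[0], 16), fields[2]))
    entries.sort()
    return [addr for addr, _ in entries], [name for _, name in entries]


def symbol_for(addr, starts, names):
    """Name of the symbol whose range contains addr, or None."""
    i = bisect.bisect_right(starts, addr) - 1
    return names[i] if i >= 0 else None


def caller_is_expected(addr, starts, names, expected):
    return symbol_for(addr, starts, names) == expected


# Made-up addresses, for illustration only.
sample = """\
ffffffff81300000 T begin_new_exec
ffffffff81300800 T setup_new_exec
ffffffff81301000 T commit_creds
"""
starts, names = build_symbol_table(sample)
print(caller_is_expected(0xFFFFFFFF81300420, starts, names, "begin_new_exec"))  # True
print(caller_is_expected(0xFFFFFFFF81300900, starts, names, "begin_new_exec"))  # False
```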

solardiz commented 3 years ago

We could do the same via kprobe interface as well (extract the pointer from the passed arguments).

Yes, but we'd need per-arch code for that.

verify that the hook was invoked when the last IP (instruction pointer) was pointing to the legit upper-level function

Right, we could have special cases where our CFI man wouldn't be nearly as poor, although this would only deal with some kinds of pCFI bypasses.

Anyway, all of this is for future hardening. We need to test that LKRG works reliably across all of our supported kernels with these hooks first.

solardiz commented 3 years ago

I now see that we actually discussed (ab)using install_exec_creds in the reducing execve() p_off race window private e-mail thread in May 2020, but not the security hooks nor the newer kernels. Yet there were also some ideas in that thread beyond what's in my lengthy comment above. We'll need to revisit it for v.next.

sempervictus commented 3 years ago

So we have a security hardening tradeoff - a much longer race window vs. a much shorter race window but an easier-to-abuse function.

In that context, i'd say take the longer race window. A known target on a short window is worse IMO as viable windows can be variable based on adjacent conditions, and a solid jump target without real CFI is going to become a standard ROP gadget builder in every framework out there when LKRG gains more traction (Metasploit already looks for it).

Adam-pi3 commented 3 years ago

I've pushed a few commits which should address a few issues discussed here. The most important one is the exec* one (commit https://github.com/openwall/lkrg/commit/d3276d45e7631288d7d7f060c315e1228160c560). I've done a lot of various verification and it looks like a correct and stable change. I've run various test exploits and they are correctly detected. I've verified various kernels and distros and they seem fine. Additionally, I've run @jollheef's out-of-tree, and 116 kernels for Ubuntu 18.04 correctly compiled and loaded LKRG.

Additionally, I've addressed the kprobe-optimizer issue. On my test Arch Linux environment it works fine. However, I would appreciate it if @sempervictus and @0xC0ncord (and maybe other interested people) could verify it as well.

jvoisin commented 3 years ago

I'm able to trigger this issue when updating man pages, and when using snap:

[52819.536365] [p_lkrg] <Exploit Detection> Detected pointer swapping attack!process[75554 | snap-confine] has different 'cred' pointer
[52819.536368] [p_lkrg] <Exploit Detection> Detected pointer swapping attack!process[75554 | snap-confine] has different 'real_cred' pointer
[52819.536369] [p_lkrg] <Exploit Detection> process[75554 | snap-confine] has different EUID! 1000 vs 0
[52819.536370] [p_lkrg] <Exploit Detection> process[75554 | snap-confine] has different SUID! 1000 vs 0
[52819.536371] [p_lkrg] <Exploit Detection> process[75554 | snap-confine] has different FSUID! 1000 vs 0
[52819.536372] [p_lkrg] <Exploit Detection> process[75554 | snap-confine] has different EUID! 1000 vs 0
[52819.536372] [p_lkrg] <Exploit Detection> process[75554 | snap-confine] has different SUID! 1000 vs 0
[52819.536373] [p_lkrg] <Exploit Detection> process[75554 | snap-confine] has different FSUID! 1000 vs 0
[52819.536374] [p_lkrg] <Exploit Detection> Trying to kill process[snap-confine | 75554]!
[…]
[59521.609846] [p_lkrg] <Exploit Detection> Detected pointer swapping attack!process[89686 | mandb] has different 'cred' pointer
[59521.609850] [p_lkrg] <Exploit Detection> Detected pointer swapping attack!process[89686 | mandb] has different 'real_cred' pointer
[59521.609851] [p_lkrg] <Exploit Detection> process[89686 | mandb] has different SUID! 0 vs 6
[59521.609853] [p_lkrg] <Exploit Detection> process[89686 | mandb] has different SGID! 0 vs 12
[59521.609854] [p_lkrg] <Exploit Detection> process[89686 | mandb] has different SUID! 0 vs 6
[59521.609855] [p_lkrg] <Exploit Detection> process[89686 | mandb] has different SGID! 0 vs 12
[59521.609857] [p_lkrg] <Exploit Detection> Trying to kill process[mandb | 89686]!

sempervictus commented 3 years ago

Confirm that current master fixed the sudo issue on 5.4.90 (with hardened patches + randkstack-like-thing with -O3 and Graysky's GCC patches):

[vagrant@svl-arch00 ~]$ sudo dmesg -T|tail
[Tue Jan 19 11:33:08 2021] [p_lkrg] Loading LKRG...
[Tue Jan 19 11:33:08 2021] Freezing user space processes ... (elapsed 0.007 seconds) done.
[Tue Jan 19 11:33:08 2021] OOM killer disabled.
[Tue Jan 19 11:33:08 2021] [p_lkrg] [kretprobe] register_kretprobe() for <ovl_create_or_link> failed! [err=-22]
[Tue Jan 19 11:33:08 2021] [p_lkrg] Trying to find ISRA / CONSTPROP name for <ovl_create_or_link>
[Tue Jan 19 11:33:08 2021] [p_lkrg] [kretprobe] register_kretprobe() for ovl_create_or_link failed and ISRA / CONSTPROP version not found!
[Tue Jan 19 11:33:08 2021] [p_lkrg] Can't hook 'ovl_create_or_link' function. This is expected if you are not using OverlayFS.
[Tue Jan 19 11:33:08 2021] [p_lkrg] LKRG initialized successfully!
[Tue Jan 19 11:33:08 2021] OOM killer enabled.
[Tue Jan 19 11:33:08 2021] Restarting tasks ... done.
[vagrant@svl-arch00 ~]$

solardiz commented 3 years ago

@jvoisin Are you getting this with the latest LKRG as of today? On that exact Ubuntu kernel you mentioned in #45? If so, that's bad news - we were hoping we fixed that issue for your setup as well. Please confirm, and I guess Adam will then take a look at what happens with your specific kernel build (assuming it's a publicly available one).

jvoisin commented 3 years ago

On the latest lkrg ( 1a72c11 ), on a vanilla Ubuntu kernel ( Linux grimhilde 5.8.0-38-generic #43-Ubuntu SMP Tue Jan 12 12:42:13 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux ), yes.

solardiz commented 3 years ago

@jvoisin Thanks. Does sudo work on that system without triggering the issue, though?

jvoisin commented 3 years ago

Unfortunately not:

[78548.915003] [p_lkrg] <Exploit Detection> Detected pointer swapping attack!process[99958 | sudo] has different 'cred' pointer
[78548.915007] [p_lkrg] <Exploit Detection> Detected pointer swapping attack!process[99958 | sudo] has different 'real_cred' pointer
[78548.915009] [p_lkrg] <Exploit Detection> process[99958 | sudo] has different EUID! 1000 vs 0
[78548.915011] [p_lkrg] <Exploit Detection> process[99958 | sudo] has different SUID! 1000 vs 0
[78548.915013] [p_lkrg] <Exploit Detection> process[99958 | sudo] has different FSUID! 1000 vs 0
[78548.915014] [p_lkrg] <Exploit Detection> process[99958 | sudo] has different EUID! 1000 vs 0
[78548.915016] [p_lkrg] <Exploit Detection> process[99958 | sudo] has different SUID! 1000 vs 0
[78548.915017] [p_lkrg] <Exploit Detection> process[99958 | sudo] has different FSUID! 1000 vs 0
[78548.915019] [p_lkrg] <Exploit Detection> Trying to kill process[sudo | 99958]!
[78548.915033] [p_lkrg] <Exploit Detection> Detected pointer swapping attack!process[99958 | sudo] has different 'cred' pointer
[78548.915034] [p_lkrg] <Exploit Detection> Detected pointer swapping attack!process[99958 | sudo] has different 'real_cred' pointer
[78548.915036] [p_lkrg] <Exploit Detection> process[99958 | sudo] has different EUID! 1000 vs 0
[78548.915037] [p_lkrg] <Exploit Detection> process[99958 | sudo] has different SUID! 1000 vs 0
[78548.915039] [p_lkrg] <Exploit Detection> process[99958 | sudo] has different FSUID! 1000 vs 0
[78548.915040] [p_lkrg] <Exploit Detection> process[99958 | sudo] has different EUID! 1000 vs 0
[78548.915042] [p_lkrg] <Exploit Detection> process[99958 | sudo] has different SUID! 1000 vs 0
[78548.915043] [p_lkrg] <Exploit Detection> process[99958 | sudo] has different FSUID! 1000 vs 0
[78548.915044] [p_lkrg] <Exploit Detection> Trying to kill process[sudo | 99958]!

solardiz commented 3 years ago

Actually, that's better - should be easier for us to analyze another variation of the same issue rather than something differing more significantly. Thanks.

Adam-pi3 commented 3 years ago

This should be fixed by https://github.com/openwall/lkrg/commit/e43d2dd525f014388c1f8cc0eb8a23f2ef07f415. I made a mistake during the migration to the new hooks. The function security_bprm_committed_creds does not return any value (void), but our old logic needed to verify the return code to correctly handle the exec* case. It looks like in that specific kernel build the %rax register kept some trash value after returning from security_bprm_committed_creds. This was incorrectly taken into account and resulted in a FP. What is interesting is that none of the 116+ kernels which I've tested had that behavior (trash in %rax) :)
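
The failure mode can be mimicked with a toy model of a return handler that reads a "return value" register after a void function. All names are hypothetical, and the assumption that the old logic treated a non-zero value as a failed exec is mine, for illustration only. The actual fix is in the commit linked above.

```python
def call_void_function(regs, scratch_value):
    """Model a void function: it may use rax as scratch and never sets a result."""
    regs["rax"] = scratch_value


def old_ret_handler_exec_succeeded(regs):
    # Assumed old logic: interpret rax as a return code, where 0 means success.
    return regs["rax"] == 0


def fixed_ret_handler_exec_succeeded(regs):
    # The hooked function is void, so there is no return code to inspect.
    return True


regs = {"rax": 0}

call_void_function(regs, 0)          # build where rax happens to be left as zero
print(old_ret_handler_exec_succeeded(regs))    # True: bug stays hidden

call_void_function(regs, 0xDEAD)     # build where rax is left holding trash
print(old_ret_handler_exec_succeeded(regs))    # False: exec wrongly seen as failed
print(fixed_ret_handler_exec_succeeded(regs))  # True
```

This is also why testing on many kernels did not expose the bug: it only shows up on builds where the register happens to hold a non-zero leftover.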

jvoisin commented 3 years ago

I confirm that the issue is now gone :) Why wasn't the type mismatch caught by the compiler?

solardiz commented 3 years ago

I confirm that the issue is now gone :)

Great. I'll close the issue now.

Why wasn't the type mismatch caught by the compiler?

Because it wasn't exposed to the compiler.