rapid7 / metasploit-framework

Metasploit Framework
https://www.metasploit.com/

Add CVE-2022-1043 #17200

Closed jvoisin closed 1 year ago

jvoisin commented 1 year ago

@minipli-oss wrote a neat exploit for CVE-2022-1043, and it has the following advantages over your everyday Linux LPE:

The only drawback is that it only works on v5.12-rc3 to v5.14-rc7.

solardiz commented 1 year ago

Hi @jvoisin. What is your claim that this bypasses LKRG based on: your own testing, your own analysis, or some third-party reference? We (LKRG developers) haven't been able to get the exploit to work so far (for reasons unrelated to LKRG), and by our understanding it might very well be caught by LKRG if it (almost) worked. Thanks. CC: @Adam-pi3

minipli-oss commented 1 year ago

Yeah, the exploit is really just a PoC, as it exploits a race and thereby can hit a still free/poisoned object and trigger page faults.

You might want to give the new PoC for CVE-2022-22942 a shot. It exploits a similar bug (same-type, same-addr uaf), just using a different type (struct file instead of struct cred). It also doesn't depend on privileged helper processes. However, make sure to take a snapshot of the VM before, as the PoC will try to overwrite the suid binary it is targeting -- /bin/chfn by default. Also call sync prior to running the freshly compiled cve-2022-22942-dc.

One drawback: the PoC needs a VMware VM because it attacks the vmwgfx driver.

minipli-oss commented 1 year ago

And yes, just confirmed LKRG doesn't detect it: [screenshot from 2022-11-03 21-16-35]

solardiz commented 1 year ago

@minipli-oss Cool, thanks! I'm not surprised LKRG doesn't detect struct file reuse - I'd be surprised if it did. However, for struct cred it could be different, because LKRG maintains its own shadow credentials that it checks live ones against.
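To make the "shadow credentials" idea concrete, here is a minimal userspace sketch. It is a toy model only, not LKRG code: `toy_cred`, `toy_task`, `toy_snapshot` and `toy_creds_intact` are hypothetical names, and the real `struct cred` carries far more fields than the two compared here.

```c
#include <stdbool.h>

/* Toy stand-in for the few credential fields compared here; the kernel's
 * struct cred has many more. All names in this block are hypothetical. */
struct toy_cred {
    unsigned int uid;
    unsigned int gid;
};

/* Hypothetical per-task record: a pointer to the live credentials plus a
 * private copy taken at a moment when they were known to be good. */
struct toy_task {
    const struct toy_cred *live;
    struct toy_cred shadow;
};

/* Record the current live values as the trusted baseline. */
static void toy_snapshot(struct toy_task *t)
{
    t->shadow = *t->live;
}

/* Returns true when the live values still match the shadow copy. */
static bool toy_creds_intact(const struct toy_task *t)
{
    return t->live->uid == t->shadow.uid && t->live->gid == t->shadow.gid;
}
```

Note that a check like this compares values, not pointers: if `live` is swapped to a different object holding the same uid/gid, the toy check still passes.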

The way the CVE-2022-1043 exploit didn't work for us isn't as you describe - rather, it would just sit on the "waiting" step for ages - on my test, until the tiny VM ran out of free disk space for the logs.

[~] forking helper process...
[~] creating worker threads...
[~] ID wrapped after 65536 allocation attempts! (id = 1)
[~] ID wrapped again after 131071 allocation attempts! (id = 1)
[~] waiting for creds to get reallocated...

That was with Ubuntu's 5.13.0-44-generic #49~20.04.1-Ubuntu SMP Wed May 18 18:44:28 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux. I had to remove the CPU pinning since this VM had only one vCPU, but @Adam-pi3 says he observed similar behavior in a VM with 4 vCPUs.

minipli-oss commented 1 year ago

Well, the CPU pinning is needed, for reasons mentioned in the comment right in front of it:

    /* Switch CPUs to not trip the checks in __put_cred() about destroying our
     * own creds via the RCU worker.
     */

So removing the CPU pinning and running the PoC on a single CPU will likely lead to a kernel panic.

But the real reason why you can't trigger the bug is that your kernel isn't vulnerable. The bug was fixed in v5.13.13 with commit a57b2a703e44. You seem to be running some v5.13.19ish version.
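A rough sketch of the version comparison involved, assuming an upstream-style release string such as "5.13.12". The helper names are hypothetical. As this thread shows, a distro release like Ubuntu's 5.13.0-44-generic does not encode the upstream stable patch level, so a naive comparison like this would wrongly flag it as older than v5.13.13.

```c
#include <stdio.h>

/* Parse the leading "major.minor.patch" of a kernel release string.
 * Returns 0 on success, -1 if the string does not start with three numbers. */
static int parse_release(const char *release, int v[3])
{
    return sscanf(release, "%d.%d.%d", &v[0], &v[1], &v[2]) == 3 ? 0 : -1;
}

/* Returns 1 if release is older than major.minor.patch, 0 if it is not,
 * and -1 if the release string could not be parsed. */
static int release_older_than(const char *release, int major, int minor, int patch)
{
    int v[3];

    if (parse_release(release, v) != 0)
        return -1;
    if (v[0] != major)
        return v[0] < major;
    if (v[1] != minor)
        return v[1] < minor;
    return v[2] < patch;
}
```

For example, `release_older_than("5.13.12", 5, 13, 13)` returns 1, while the same call with "5.13.0-44-generic" also returns 1 even though that kernel already carries the fix. Distro kernels therefore need a check against the vendor's own changelog or security tracker instead.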

Adam-pi3 commented 1 year ago

@minipli-oss would you be able to share the exact kernel you tested the CVE-2022-1043 exploit (https://github.com/opensrcsec/same_type_object_reuse_exploits/blob/main/cve-2022-1043.c) against? As @solardiz mentioned, we had some trouble reproducing it. That said, I was able to trigger a NULL dereference under the following Ubuntu kernel: https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.14-rc6/

However, if you already have a working environment set up, would you be able to test the CVE-2022-1043 exploit (https://github.com/opensrcsec/same_type_object_reuse_exploits/blob/main/cve-2022-1043.c) against LKRG?

minipli-oss commented 1 year ago

As mentioned in the blog, I was testing the PoC for CVE-2022-1043 on a modified grsecurity kernel with the commit fixing the bug reverted. I didn't find a distro kernel I could use for demonstration purposes, so I used my own. You can probably do the same?

solardiz commented 1 year ago

@minipli-oss Yes, I fully expected to run into that issue mentioned in the comment when I removed the CPU pinning, but I did not. And I finally see what I got wrong - somehow I thought my kernel build was from May and the fix from August, but now I see it's May 2022 vs. August 2021. I wrongly thought the fix was from August 2022, not 2021 - I guess I didn't look at the year on the commit being too sure that a 2022 CVE means a 2022 fix. That was wrong of me (and maybe a 2021 CVE should have been allocated, but that's separate). Thanks.

solardiz commented 1 year ago

@minipli-oss BTW, you can possibly load LKRG on top of that "modified grsecurity kernel with the commit fixing the bug reverted." We did try supporting LKRG on top of grsecurity at some point, but we did not retest that for a long time - especially as we have no access to non-public grsecurity kernels.

minipli-oss commented 1 year ago

I spent some time today adapting the PoC for Ubuntu. User namespaces and AppArmor threw a wrench or two into the works, but I figured out how to work around these and just pushed the result: https://github.com/opensrcsec/same_type_object_reuse_exploits/commit/6c0deebe30bef9ee67877b742dbdd1e760599778. Feel free to reproduce!

I used Ubuntu 20.10 with that kernel: https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.13.12/amd64/

Getting LKRG to compile against this kernel was a challenge, too. I needed to rebuild various binaries, e.g. scripts/basic/fixdep or scripts/mod/modpost, and had to install an old libssl as described here. But, eventually, I was able to build and load the module.

Below is the result of running the adapted PoC with lkrg.ko loaded:

[screenshot from 2022-11-04 16-21-53]

Looks like the alternative probe method using prctl() gets hooked by LKRG and the checks in there kill the box -- it seems to be live-locked with one CPU consuming 100%. But the backtrace makes little sense to me, as I don't see a direct call to [__]put_cred() in p_validate_task_f(). I leave it to you to figure out what actually happens here.

I might be able to adapt the PoC to work without the prctl() based probe as well but I don't feel like spending more time on this ;)

Re: the CVE year number, yes, I was thinking the same, that it should be from 2021. But the bug was fixed rather silently at the time, so I'm happy it has a CVE at least!

solardiz commented 1 year ago

Thank you very much, @minipli-oss!

Regarding those difficulties getting LKRG to build, am I correct to assume they applied to building an out-of-tree module in general, and not LKRG specifically? LKRG just builds on typical systems like Ubuntu for us.

There is an explicit put_cred inside p_cmp_tasks, which is called from p_validate_task_f:

   p_current_cred = rcu_dereference(p_current->cred);
   /* Get reference to cred */
   get_cred(p_current_cred);
   p_current_real_cred = rcu_dereference(p_current->real_cred);
   /* Get reference to real_cred */
   get_cred(p_current_real_cred);

and then:

   /* Release reference to cred */
   put_cred(p_current_cred);
   /* Release reference to real_cred */
   put_cred(p_current_real_cred);

My guess is some race resulted in inconsistency here. Also, the "invalid opcode" error is weird.

bspengler-oss commented 1 year ago

"invalid opcode" is expected, it's part of the standard BUG() message which is implemented as a ud2 instruction.

solardiz commented 1 year ago

Thanks, @bspengler-oss! You're right indeed, and there's the 0f 0b right in there.
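As a small illustration of spotting that byte pair, the following sketch scans a buffer (for instance, bytes transcribed from an oops "Code:" line) for the two-byte ud2 encoding 0f 0b. The function name is hypothetical, and a plain byte scan is only a heuristic: the same two bytes can also occur inside another instruction's operands, so the byte that the oops marks as faulting remains the authoritative hint.

```c
#include <stddef.h>

/* Returns the offset of the first 0f 0b byte pair (the x86 ud2 encoding)
 * in buf, or -1 if the pair does not occur. */
static long find_ud2(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i + 1 < len; i++) {
        if (buf[i] == 0x0f && buf[i + 1] == 0x0b)
            return (long)i;
    }
    return -1;
}
```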

solardiz commented 1 year ago

What happens on put_cred there is probably that it thinks it's dropping the final reference to current->cred, and that is quite reasonably not allowed:

static inline void put_cred(const struct cred *_cred)
{
        struct cred *cred = (struct cred *) _cred;

        if (cred) {
                validate_creds(cred);
                if (atomic_dec_and_test(&(cred)->usage))
                        __put_cred(cred);
        }
}
/**
 * __put_cred - Destroy a set of credentials
 * @cred: The record to release
 *
 * Destroy a set of credentials on which no references remain.
 */
void __put_cred(struct cred *cred)
{
        kdebug("__put_cred(%p{%d,%d})", cred,
               atomic_read(&cred->usage),
               read_cred_subscribers(cred));

        BUG_ON(atomic_read(&cred->usage) != 0);
#ifdef CONFIG_DEBUG_CREDENTIALS
        BUG_ON(read_cred_subscribers(cred) != 0);
        cred->magic = CRED_MAGIC_DEAD;
        cred->put_addr = __builtin_return_address(0);
#endif
        BUG_ON(cred == current->cred);
        BUG_ON(cred == current->real_cred);

        if (cred->non_rcu)
                put_cred_rcu(&cred->rcu);
        else
                call_rcu(&cred->rcu, put_cred_rcu);
}

Why final? Perhaps because the vulnerability was triggered. It's probably similar to what @minipli-oss referred to above:

    /* Switch CPUs to not trip the checks in __put_cred() about destroying our
     * own creds via the RCU worker.
     */

except that with LKRG's extra code (and extra usage of put_cred) those checks are tripped even with the CPU pinning? Anyway, I guess even if they were not, LKRG would have detected wrong credentials right in that same function.

minipli-oss commented 1 year ago

Yeah, it's because current->cred is a dangling pointer and the get/put_cred() calls in LKRG make it trigger sanity checks in kernel/cred.c, which the PoC otherwise carefully tries to avoid (no setuid(), no fork(), ...).

The BUG_ON() actually triggers for the put_cred(p_current_real_cred) in p_cmp_tasks(), i.e. after all checks have passed, apparently, as no LKRG-related log message was generated. So I assume LKRG only catches it by chance, not by intent?
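The sequence described above can be modelled in userspace. This is a toy sketch under stated assumptions, not kernel or LKRG code: all `toy_*` names are hypothetical, the reference count is a plain int, and the model returns `TOY_BUG` where the kernel would halt via BUG_ON(). It assumes the dangling object's usage count has already dropped to zero and that cred and real_cred point to the same object.

```c
/* Toy credential object: only the reference count is modelled. */
struct toy_cred {
    int usage;
};

/* Toy task with the two credential pointers discussed in the thread. */
struct toy_task {
    struct toy_cred *cred;
    struct toy_cred *real_cred;
};

enum toy_put_result { TOY_STILL_REFERENCED, TOY_FREED, TOY_BUG };

/* Take a reference, as get_cred() does. */
static void toy_get_cred(struct toy_cred *c)
{
    c->usage++;
}

/* Mirrors the shape of put_cred()/__put_cred(): dropping the last reference
 * to credentials that the task still points at is reported as TOY_BUG. */
static enum toy_put_result toy_put_cred(const struct toy_task *cur,
                                        struct toy_cred *c)
{
    if (--c->usage != 0)
        return TOY_STILL_REFERENCED;
    if (c == cur->cred || c == cur->real_cred)
        return TOY_BUG;
    return TOY_FREED;
}
```

With a dangling object whose count is 0, two gets raise it to 2, the first put leaves 1, and the second put reaches 0 while the task still points at the object. In this model that is why the second put, the one for real_cred, is the one that reports the bug.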

If the dangling cred pointer already pointed to a reallocated cred object, I assume LKRG wouldn't detect it either, nor would the sanity checks in kernel/cred.c trigger. But that can probably only be confirmed by weaponizing the PoC even further, and that's something I strongly object to.

minipli-oss commented 1 year ago

re: build issues for LKRG, yes, that was strictly related to me trying to run Ubuntu 20.10 with a kernel built on a much more recent version, using a different version of libc, which made all the tools in the kernel-headers package fail to run and require recompilation. Once these issues had been resolved, LKRG built just fine.

solardiz commented 1 year ago

The BUG_ON() actually triggers for the put_cred(p_current_real_cred) in p_cmp_tasks(), i.e. after all checks have passed, apparently, as no LKRG-related log message was generated. So I assume LKRG only catches it by chance, not by intent?

That's a good point. I'd have expected some log messages to have been generated. That they were not can mean several things, including (1) that the race was triggered closer to the end of the function (then maybe it'd have been detected more explicitly on the next call), and/or (2) that we have a logic bug/shortcoming preventing detection of that condition, and/or (3) that the credentials are not yet changed (we don't alert on a dangling pointer alone nor on a wrong pointer alone, but only when the pointed-to uid/gid/namespace are not what's expected). Specifically:

   if (p_orig->p_ed_task.p_cred_ptr != p_current_cred) {
      if (p_cmp_creds(&p_orig->p_ed_task.p_cred, p_current_cred, p_current, 0x0)) {
         P_CMP_PTR(p_orig->p_ed_task.p_cred_ptr, p_current_cred, "cred")
      }
   }

   if (p_orig->p_ed_task.p_real_cred_ptr != p_current_real_cred) {
      if (p_cmp_creds(&p_orig->p_ed_task.p_real_cred, p_current_real_cred, p_current, 0x0)) {
         P_CMP_PTR(p_orig->p_ed_task.p_real_cred_ptr, p_current_real_cred, "real_cred")
      }
   }

This compares the pointers, but the p_cmp_creds check mutes the mismatch when the credentials themselves match.

   p_ret += p_cmp_creds(&p_orig->p_ed_task.p_cred, p_current_cred, p_current, 0x1);
   if (p_ret)
      p_ret += p_cmp_creds(&p_orig->p_ed_task.p_real_cred, p_current_real_cred, p_current, 0x1);

The p_cmp_creds call for real_cred is skipped when there isn't yet any discrepancy detected by that point - I don't know why we skip it, perhaps @Adam-pi3 does. We'll need to revisit this.
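The effect of that gating can be shown with a toy model. This is a sketch with hypothetical `toy_*` names that only mirrors the control flow quoted above, not LKRG's actual comparison logic, and it says nothing about whether the gating is intentional.

```c
/* Toy credentials: only a uid is compared here. */
struct toy_cred {
    unsigned int uid;
};

/* Returns 1 on mismatch and 0 on match, like a simple comparison helper. */
static int toy_cmp(const struct toy_cred *expected, const struct toy_cred *live)
{
    return expected->uid != live->uid;
}

/* Same shape as the quoted snippet: real_cred is only compared when the
 * cred comparison has already reported a mismatch. */
static int toy_validate_gated(const struct toy_cred *exp_cred,
                              const struct toy_cred *cred,
                              const struct toy_cred *exp_real,
                              const struct toy_cred *real)
{
    int ret = 0;

    ret += toy_cmp(exp_cred, cred);
    if (ret)
        ret += toy_cmp(exp_real, real);
    return ret;
}

/* Alternative for comparison: always check both. */
static int toy_validate_both(const struct toy_cred *exp_cred,
                             const struct toy_cred *cred,
                             const struct toy_cred *exp_real,
                             const struct toy_cred *real)
{
    return toy_cmp(exp_cred, cred) + toy_cmp(exp_real, real);
}
```

When cred still matches but real_cred does not, the gated variant returns 0 while the unconditional variant returns 1. That is the case in which a real_cred-only discrepancy would go unreported in this model.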

h00die commented 1 year ago

Created an ubuntu 22.04 VM, installed 5.13.12. Single CPU system running on ESXi

CVE-2022-1043

ubuntu@ubuntu2204:~$ ./cve-2022-1043
[~] forking helper process...
[~] creating worker threads...
[~] ID wrapped after 65536 allocation attempts! (id = 1)
[~] ID wrapped again after 131071 allocation attempts! (id = 1)
[!] do_trigger: failed to pin to CPU #1: Invalid argument
[-] not allowed to die, zombie time!
^C

failed. Any ideas why?

CVE-2022-22942

ubuntu@ubuntu2204:~$ gcc -O2 cve-2022-22942-dc.c -o cve-2022-22942-dc
cve-2022-22942-dc.c: In function ‘main’:
cve-2022-22942-dc.c:514:17: warning: ignoring return value of ‘setuid’ declared with attribute ‘warn_unused_result’ [-Wunused-result]
  514 |                 setuid(0);
      |                 ^~~~~~~~~
cve-2022-22942-dc.c:515:17: warning: ignoring return value of ‘setgid’ declared with attribute ‘warn_unused_result’ [-Wunused-result]
  515 |                 setgid(0);
      |                 ^~~~~~~~~
ubuntu@ubuntu2204:~$ ./cve-2022-22942-dc
[~] creating r/o mapping of /bin/chfn...
[~] creating r/o mapping of /proc/self/exe...
[~] spawning helper processes...
[~] vmwgfx setup using /dev/dri/card0...
[+] confirmed to be targeting the right driver
[~] predicted fence fd = 5
[~] triggering fence fd export...
[~] RCU GP passed and file object released -- by now or soon!
[~] opening some r/w files for /var/tmp/cake
[~] probing stale fd for a match...
[+] found match at fd 3
[~] creating r/w mapping...
[~] closing stale fd...
[~] RCU GP passed and file object released again -- hopefully!
[~] opening some r/o files for /bin/chfn
[*] trying to overwrite code in /bin/chfn
[~] mmap_worker: done
[~] stale_fd_worker: done
[$] success, spawning shell...
# id
uid=0(root) gid=0(root) groups=0(root),4(adm),24(cdrom),27(sudo),30(dip),46(plugdev),110(lxd),1000(ubuntu)

success. I'll look at adding this to framework.

solardiz commented 1 year ago

Created an ubuntu 22.04 VM, installed 5.13.12. Single CPU system running on ESXi

CVE-2022-1043

@h00die You need at least two vCPUs in the VM for this exploit to work as-is.
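A pre-flight check along these lines could catch the single-vCPU case before the "failed to pin to CPU #1" error shows up. This is a minimal sketch with hypothetical helper names, not part of the PoC. It assumes CPU IDs are contiguous from 0, which CPU hotplug or restrictive cpusets can violate.

```c
#include <unistd.h>

/* Number of CPUs currently online, or -1 if it cannot be determined. */
static long online_cpus(void)
{
    return sysconf(_SC_NPROCESSORS_ONLN);
}

/* Returns 1 if cpu is a plausible pin target given ncpus online CPUs,
 * 0 otherwise. Assumes CPU IDs run contiguously from 0 to ncpus - 1. */
static int can_pin_to(long ncpus, int cpu)
{
    return cpu >= 0 && ncpus > 0 && cpu < ncpus;
}
```

On a single-vCPU VM, `can_pin_to(online_cpus(), 1)` evaluates to 0, which matches the "Invalid argument" failure reported above.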

h00die commented 1 year ago

@minipli-oss any idea when https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2022-22942 may get filled in?

@minipli-oss CVE-2022-22942 is a vmwgfx bug, and I have version 2.18.1.0 loaded on my exploitable box. Is there a series of driver versions that is vulnerable, or is it tied to the kernel version? If it's tied to the kernel version (as it looks on https://ubuntu.com/security/CVE-2022-22942), do you have a summarized list of vulnerable kernels?

minipli-oss commented 1 year ago

@minipli-oss any idea when https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2022-22942 may get filled in?

I've no idea, sorry. But using individual vendor sites may provide some insight (as you already did).

I gave some background information in the related oss-security posting and mentioned kernel versions including the fix in a follow-up post.

@minipli-oss CVE-2022-22942 is a vmwgfx bug, and I have version 2.18.1.0 loaded on my exploitable box. Is there a series of driver versions that is vulnerable, or is it tied to the kernel version? If it's tied to the kernel version (as it looks on https://ubuntu.com/security/CVE-2022-22942), do you have a summarized list of vulnerable kernels?

The kernel versions affected are listed in the source:

 * This bug was fixed by commit a0f90c881570 ("drm/vmwgfx: Fix stale file
 * descriptors on failed usercopy"). It affected kernel versions v4.14-rc1 to
 * v5.17-rc1.

This combined with the information from the oss-security posting should give you a list of affected kernel versions.