google / zerocopy

https://discord.gg/MAvWH2R6zk
Apache License 2.0
1.56k stars 99 forks source link

`Ref<core::cell::Ref<[u8]>, _>::into_ref` is unsound #716

Closed joshlf closed 10 months ago

joshlf commented 10 months ago

For brevity, this issue will refer to core::cell::Ref as CoreRef and core::cell::RefMut as CoreRefMut.

Overview

Ref<B, T>'s into_ref, into_mut, into_slice, and into_mut_slice methods are unsound when B is CoreRef<[u8]> or CoreRefMut<[u8]>.

Progress

Proof of concept

The following (100% safe) code exhibits undefined behavior according to Miri (playground link):

use core::cell::{Ref, RefCell};

fn main() {
    let refcell = RefCell::new([0u8, 1, 2, 3]);
    let core_ref = refcell.borrow();
    let core_ref = Ref::map(core_ref, |bytes| &bytes[..]);

    // `zc_ref` now stores `core_ref` internally.
    let zc_ref = zerocopy::Ref::<_, u32>::new(core_ref).unwrap();

    // This causes `core_ref` to get dropped and synthesizes a Rust
    // reference to the memory `core_ref` was pointing at.
    let rust_ref = zc_ref.into_ref();

    // UB!!! This mutates `rust_ref`'s referent while it's alive.
    *refcell.borrow_mut() = [0, 0, 0, 0];

    println!("{}", rust_ref);
}

Miri output:

error: Undefined Behavior: trying to retag from <1984> for SharedReadOnly permission at alloc903[0x8], but that tag does not exist in the borrow stack for this location
    --> /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/fmt/mod.rs:2304:1
     |
2304 | fmt_refs! { Debug, Display, Octal, Binary, LowerHex, UpperHex, LowerExp, UpperExp }
     | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     | |
     | trying to retag from <1984> for SharedReadOnly permission at alloc903[0x8], but that tag does not exist in the borrow stack for this location
     | this error occurs as part of retag at alloc903[0x8..0xc]
     |
     = help: this indicates a potential bug in the program: it performed an invalid operation, but the Stacked Borrows rules it violated are still experimental
     = help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md for further information
help: <1984> was created by a SharedReadOnly retag at offsets [0x8..0xc]
    --> src/main.rs:13:20
     |
13   |     let rust_ref = zc_ref.into_ref();
     |                    ^^^^^^^^^^^^^^^^^
help: <1984> was later invalidated at offsets [0x8..0xc] by a Unique retag
    --> src/main.rs:16:5
     |
16   |     *refcell.borrow_mut() = [0, 0, 0, 0];
     |     ^^^^^^^^^^^^^^^^^^^^^
     = note: BACKTRACE (of the first span):
     = note: inside `<&u32 as std::fmt::Display>::fmt` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/fmt/mod.rs:2294:71: 2294:78
     = note: inside `core::fmt::rt::Argument::<'_>::fmt` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/fmt/rt.rs:142:9: 142:40
     = note: inside `std::fmt::write` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/fmt/mod.rs:1120:17: 1120:40
     = note: inside `<std::io::StdoutLock<'_> as std::io::Write>::write_fmt` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/io/mod.rs:1810:15: 1810:43
     = note: inside `<&std::io::Stdout as std::io::Write>::write_fmt` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/io/stdio.rs:727:9: 727:36
     = note: inside `<std::io::Stdout as std::io::Write>::write_fmt` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/io/stdio.rs:701:9: 701:33
     = note: inside `std::io::stdio::print_to::<std::io::Stdout>` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/io/stdio.rs:1020:21: 1020:47
     = note: inside `std::io::_print` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/io/stdio.rs:1097:5: 1097:37
note: inside `main`
    --> src/main.rs:18:5
     |
18   |     println!("{}", rust_ref);
     |     ^^^^^^^^^^^^^^^^^^^^^^^^
     = note: this error originates in the macro `fmt_refs` which comes from the expansion of the macro `println` (in Nightly builds, run with -Z macro-backtrace for more info)

The following playground examples demonstrate the same UB using the other types and methods:

Explanation

This explanation is in terms of CoreRef and the Ref::into_ref method, but the same explanation holds of CoreRefMut and Ref::into_mut, Ref::into_slice, and Ref::into_mut_slice.

The relevant bits of the code are:

https://github.com/google/zerocopy/blob/a8572dafd9a0a5f5f583ab4c16e62dab9b664b15/src/lib.rs#L4787-L4805

https://github.com/google/zerocopy/blob/a8572dafd9a0a5f5f583ab4c16e62dab9b664b15/src/lib.rs#L4924-L4947

https://github.com/google/zerocopy/blob/a8572dafd9a0a5f5f583ab4c16e62dab9b664b15/src/lib.rs#L5332-L5339

Quick explanation

We'll start with a concise explanation which will paper over the unsoundness, and hopefully help to demonstrate how a bug like this could have made its way in.

Ref::deref_helper is unsafe, and allows the caller to provide an arbitrary lifetime, 'a, which is not necessarily the same as the lifetime of its &self parameter. It returns a reference, &'a T, of that arbitrary lifetime. In exchange, it requires the caller to guarantee that "the lifetime 'a is valid for this reference."

Ref::into_ref is only available when the buffer type, B, is valid for 'a (ie, B: 'a). Thus, it can soundly call deref_helper internally - the B: 'a bound ensures that the returned reference won't outlive the buffer.

Bug

The bug comes from the fact that B: 'a is an insufficient guarantee. In particular, it guarantees that the buffer type satisfies 'a, but what that means differs by type. When I authored this code, I was probably implicitly thinking in terms of &'a [u8], where the 'a means that the underlying memory is considered immutably borrowed for the lifetime 'a - it can neither be mutated nor dropped during 'a.

Unfortunately, CoreRef<[u8]>: 'a only requires that the underlying RefCell lives for at least 'a. [1] The CoreRef itself may be dropped sooner than this and still satisfy CoreRef<[u8]>: 'a. Of course, if we think specifically in terms of RefCell, it makes sense that this has to be the behavior - RefCell's whole purpose is to allow mutation which is tracked dynamically at runtime rather than statically at compile time. By dropping the CoreRef, we're decrementing the RefCell's refcount and allowing the underlying memory to be mutated again.

Another perspective on why this is problematic is to trace the function call stack and figure out why this code compiles at all. We start with Ref::into_ref, which takes self by value. As a consequence, it drops self at the end of the method scope. We might reasonably ask how this is able to compile given that the returned reference outlives that scope.

As we already saw, Ref::deref_helper takes a &self of any lifetime and returns a &'a T of the caller-chosen lifetime 'a. Internally, it accomplishes this by calling <B as ByteSlice>::as_ptr. as_ptr is defined like so:

https://github.com/google/zerocopy/blob/a8572dafd9a0a5f5f583ab4c16e62dab9b664b15/src/lib.rs#L5332-L5339

This involves an implicit call to Deref::deref, so we can desugar like so:

fn as_ptr(&self) -> *const u8 {
    let bytes: &[u8] = self.deref();
    <[u8]>::as_ptr(bytes)
}

Note that Deref<Target = [u8]>::deref has the signature fn deref<'a>(&'a self) -> &'a [u8], and so the returned bytes has the same lifetime as the self argument to as_ptr. Bubbling up the call stack, that means that it has the same lifetime as self in Ref::into_ref, which is only valid until the end of into_ref's method scope.

[1] In particular, the RefCell<T> method borrow has the signature fn borrow<'a>(&'a self) -> CoreRef<'a, T> - the 'a serves to ensure that the returned CoreRef doesn't outlive the underlying RefCell.

History and affected versions

This commit from December, 2019 is as far back as our Git history goes, when zerocopy's code was migrated from elsewhere in Fuchsia's codebase (this was before zerocopy had been moved to GitHub, when it still lived in Fuchia's source tree).

From a manually search of docs.rs, it seems that the earliest affected version is 0.2.2. Ref::into_ref exists here in 0.2.2, but does not exist in 0.2.1.

Impact

It is difficult to estimate the usage of these methods in the wild, but in my personal experience, I've never seen anyone use the Ref<B, T> type with B = CoreRef<[u8]> or B = CoreRefMut<[u8]>. An affected user would need to use this setup and also call the into_ref, into_mut, into_slice, or into_mut_slice methods.

Mitigation plan

Thankfully, our Git history goes back far enough that we have source code for every affected minor version (ie, 0.2.X and onwards). Looking at the history up until the release of 0.3.0, we can see that the most recent commit we have from 0.2.X is 19ac2a7dc8821368e63578ee14bc36900aa6e6ce, and this is confirmed by the Cargo.toml.crates-io file. Thus, we should be able to backport a fix to every affected minor version. Depending on how the discussion in https://github.com/google/zerocopy/issues/679 concludes, we may also decide to yank some or all affected versions after publishing a fix to each affected minor version.

Fix

TL;DR: We will add a check that causes code which exercises this UB to fail compilation after monomorphization.

What fix we implement on existing versions is significantly more constrained. No matter what we do, any fix must technically be a breaking change, as code which could previously compile and exhibit UB at runtime should either no longer compile or should panic at runtime.

We considered a few options, and evaluated them across a few axes:

We considered these options:

Only the last option satisfies all our criteria, and so we will implement it. For completeness, here is each option evaluated against each criterion:

Option into_xxx compile when B is generic? CoreRef/CoreRefMut compile when not using into_xxx? Code which would otherwise be UB fails at compile time?
Extra trait bound for B
Remove ByteSlice impl from CoreRef/CoreRefMut
Panic at runtime
Fail compilation post-monomorphization
ijackson commented 10 months ago

How exciting. (I came here via #679, and I have not analysed the unsoundness in detail. Taking what is written here as true:)

I would recommend filing a RUSTSEC advisory and not yanking affected versions.

Possibly telling you things you already know

Also you should probably do an underlying process root cause analysis. The technical analysis above is a concrete explanation of the programming mistake. But programming mistakes are inevitable because we are human. When mistakes can lead to significant problems, it is usually best to have safeguards (procedural, or engineered) that try to catch or mitigate those errors.

Your goals presumably include avoid deploying vulnerable code; with the choice of Rust as a language, that refines to a sub-goal of not deploying unsound code. Deployment of this mistake represents a failure of your process for preventing unsoundness, and could be seen as a near-miss for deployment of a vulnerability. (I don't feel qualified from reading the above to say how near a miss it is.)

Or to put it another way, you might want to reconsider your approach towards assurance of memory-unsafe code including unsafe Rust.

Ways you might prevent deployment of mistakes might include machine verification or checking, elimination of unneeded unsafe APIs, adoption of defensive coding styles, semiformal coding practices or proofs, or (weakest) informally-structured manual human review or audit.

joshlf commented 10 months ago

All affected version trains (including 0.8.0-alpha) have now been fixed and all affected versions have been yanked. We still want to follow up with a more ergonomic and natural API (https://github.com/google/zerocopy/issues/758), but that can be considered follow-up work. Thus, I'm considering this done and closing this issue.