rust-lang / const-eval

home for proposals in and around compile-time function evaluation
Apache License 2.0
103 stars 17 forks source link

Pattern-matching related soundness concerns #42

Closed RalfJung closed 11 months ago

RalfJung commented 4 years ago

Recently it was pointed out to me that there is an entire category of const-related soundness concerns that I was unaware of, and that is not mentioned with a single word in this repository: pattern matching. I think for the most part, const soundness and "const FOO = expr is equivalent to inlining expr" suffice to ensure pattern-matching-soundness, but there is one exception: const-soundness as described and discussed so far stops at references, but when pattern-matching on consts, the content behind the reference can matter.

For example, the following is completely fine as far as "inlining-soundness" is concerned, but it is pattern-unsound (and let's ignore aliasing issues for now):

static mut BYTES: [u8; 16] = *b"1234567890abcdef";
const BYTES_REF: &[u8] = unsafe { &BYTES };

fn main() { unsafe {
  BYTES[0] = 2;
  assert!(matches!(b"1234567890abcdef", BYTES_REF));
} }

Inlining semantics are entirely preserved when a const just points to mutable data, but the same becomes a big problem with pattern matching.

Some things I think we should do:

RalfJung commented 4 years ago

Get an exhaustive list of the kind of consts involving references that we can pattern-match on. Is it just &str and &[u8] (and we could get away with special hacks targeting them) or is there more?

@oli-obk @eddyb would be nice if you could fill me in here. :)

oli-obk commented 4 years ago

We can pattern match on all kind of references. It's just that right now only &str and &[u8] (and maybe &[u8; N]) are actually used for exhaustiveness. This means that at runtime, we fall back to == checking, treating the whole constant as opaque input, but you can still pattern match on a constant of type &&&Foo for any Foo that you can normally pattern match on.

RalfJung commented 4 years ago

Okay, wow, that's a big one. I am so glad we caught this before landing https://github.com/rust-lang/rust/pull/71332. ^^

It's just that right now only &str and &[u8] (and maybe &[u8; N]) are actually used for exhaustiveness.

How can &str and &[u8] ever be exhaustive without a catch-all clause (_ pattern)?

RalfJung commented 4 years ago

Ah, this compiles:

#![feature(exclusive_range_pattern)]
#![feature(half_open_range_patterns)]
const PAT: &[u8] = &[0];

pub fn test(x: &[u8]) -> bool {
    match x {
        PAT => true,
        &[] => false,
        &[1..] => false,
        &[_, _, ..] => false
    }
}
comex commented 4 years ago

Hmm... As far as I can tell, the only way to mutate data within a constant without aliasing-UB is with interior mutability. But that requires UnsafeCell, and you can't pattern match on UnsafeCell from outside libcore. So maybe this is sound?

joshtriplett commented 4 years ago

I feel like I'm missing some aspect of const semantics, but this seems like something that we should be able to catch at compile time by noticing that we've taken a reference to the mutable value, so it shouldn't be directly accessible for mutation anymore because that reference has it borrowed.

rpjohnst commented 4 years ago

@joshtriplett The reference in the const does not exist "ambiently" the way the static does. It only exists at the points where the const is used- in one sense it is re-created each time it is evaluated. If consts were actually evaluated at runtime, this would be fine, but they are instead evaluated at compile time.

This makes the rules a bit more complex- their goal is to ensure that the result of evaluating the constant would be identical every time even if it happened at runtime. So references to mutable values need to be prevented in the definitions of consts themselves, rather than the more usual borrowck behavior of preventing mutations while those references are live.

RalfJung commented 4 years ago

https://github.com/rust-lang/rust/pull/71655 landed and documents the invariants a bit better that make sure this is sound. I think I finally understood why @oli-obk kept insisting on all these extra checks against consts pointing to statics.

So I think right now, everything is sound. However, it relies on a rather subtle global invariant that no pointer to a static (including pointers to "things inside statics") ever leaks to a non-static. I'd be more happy if we could check consts-dont-point-to-mutable more directly than via an invariant that encompasses all of tcx.alloc_map.

And the other problem is that we have a fundamental conflict here: consts pointing to interior mutable statics would, in principle, pose no problems for const-soundness and aliasing. However, that conflicts with using consts in patterns. It seems like we can have one or the other feature with consts, but not both -- and, worse, we already decided in favor of patterns, without (as far as I can tell) being aware that this closes the door on other const possibilities.

comex commented 4 years ago

Okay, after being confused for a while, I think I understand the situation better.

The example in the first post is not UB due to aliasing, assuming the const is inlined at the point of use, since there are no conflicting references to BYTES at the point BYTES_REF is used.

Also, contrary to my last comment, you can get the same effect with interior mutability:

use std::cell::UnsafeCell;
static BYTES: UnsafeCell<[u8; 16]> = UnsafeCell::new(*b"1234567890abcdef");
const BYTES_REF: &[u8] = unsafe { &*BYTES.get() };

fn main() { unsafe {
  (*BYTES.get())[0] = 2;
  assert!(matches!(b"1234567890abcdef" as &[u8], BYTES_REF));
} }

From a user's perspective, it seems like the simplest mental model is: the pointer value of BYTES_REF is a constant expression, but the result of dereferencing it is not a constant expression. Therefore, either you can't match against it, or such a match would never be treated as exhaustive. Similarly, you wouldn't be able to dereference it within const-eval. But you could still create it.

Apparently things get more complex with const generics because they currently treat all references that point to the same data as equivalent. I will post in that thread as well.

RalfJung commented 4 years ago

The example in the first post is not UB due to aliasing, assuming the const is inlined at the point of use, since there are no conflicting references to BYTES at the point BYTES_REF is used.

Well... I'd just say the aliasing model doesn't really comment on this example yet.

But as you said, the same effect can be achieved with interior mutabiltiy -- static mut just makes it a bit easier. That's why I said we can ignore the aliasing rules.

From a user's perspective, it seems like the simplest mental model is: the pointer value of BYTES_REF is a constant expression, but the result of dereferencing it is not a constant expression.

That's a good way to put it.

Therefore, either you can't match against it, or such a match would never be treated as exhaustive.

Probably we just shouldn't permit to match on it (assuming we ever allow such constants at all), everything else seems really surprising.

RalfJung commented 4 years ago

I opened https://github.com/rust-lang/rust/issues/74446 to better figure out our story around which types are allowed for consts in patterns. From what I understand, the soundness part of this is mostly about which of those take part in exhaustiveness checking.

@oli-obk is this still true?

It's just that right now only &str and &[u8] (and maybe &[u8; N]) are actually used for exhaustiveness.

Also, are we all in agreement that for no we should not expand the set of types that are considered for exhaustiveness, to avoid introducing further problems, until we figured out the whole match story a bit better?

Though given that we have some cases where exhaustiveness is based on const values, maybe it doesn't make much of a difference to add more of those cases...

oli-obk commented 4 years ago

@oli-obk is this still true?

My statement was meant for constants of reference type. All other constants have been doing exhaustive checking since quite some time.

RalfJung commented 4 years ago

Quoting you from another issue:

integers do exhaustiveness checks. so do structs without private fields and enums.

RalfJung commented 11 months ago

I think things are a lot better here now than they used to be, due to valtrees. But I don't know the exact rules according to which we currently determine exhaustiveness, so let's keep this issue open until we got those documented and sanity-checked.

RalfJung commented 11 months ago

All computing of information for the exhaustiveness check goes through valtrees, and all reads that are performed during valtree construction go through the const-eval machine configured such that reads from anything mutable are rejected. So at runtime we'll never see values that are different from what exhaustiveness checking saw. Also I think we don't even access that memory, we are making a copy for our valtree anyway.

So we can close this issue. :)