rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org

Const qualification #53819

Open oli-obk opened 6 years ago

oli-obk commented 6 years ago

cc @eddyb @RalfJung

Let's figure out what exactly we need, because I'm very confused about it.

So, currently const qualification is doing a few things at once:

  1. figure out all the feature gates, and forbid things that are not allowed in constants/statics/const fn at all or not without the feature gates
  2. figure out which things are promotable, by ruling out values containing Drop or UnsafeCell types (None is always ok, even if Some(value) would not be due to its type, and even though Option<T> would technically propagate said type)
  3. in order to guarantee this for promotion, we also check the bodies of const fn and const for creations of Drop and UnsafeCell values, even though this is ignored for the constant itself and only checked when the constant is used in a value that is checked for promotion
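Point 2 can be seen on stable Rust today: promotion is decided by the value, not by the worst case of the type. A minimal illustration (example mine, not from the thread):

```rust
fn main() {
    // Promoted: the value `None` contains no `Drop` or `UnsafeCell`
    // data, so the borrow can be lifted to a `'static` allocation even
    // though the type `Option<String>` could hold a `String`.
    let promoted: &'static Option<String> = &None;
    assert!(promoted.is_none());

    // Not promotable: this value actually contains a `String`, which
    // has a destructor, so the borrow stays confined to the local scope.
    let local = Some(String::from("hi"));
    let not_promoted: &Option<String> = &local;
    assert!(not_promoted.is_some());
}
```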

Why I want to stop looking at bodies, and instead just check the final value of constants:

Why we cannot just look at the final value right now:

Solution brainstorm:

  1. don't promote a call to a const fn if its return type may contain UnsafeCell or Drop. So Option::<String> is not promoted, even if the actual value is None. (Not a breaking change, since there are no stable const fn for which this could break any code.)
  2. always assume the worst about associated constants (already the case: https://play.rust-lang.org/?gist=36546b7a589178413e28ba09f1cd0201&version=stable&mode=debug&edition=2015 ), so we don't promote uses of associated constants unless monomorphized.
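A sketch of what proposal 1 would mean in practice (the const fn and its name are hypothetical, invented here for illustration):

```rust
// Hypothetical const fn. Under proposal 1 a call to it would never be
// promoted, because the return type `Option<String>` may contain a
// `Drop` value, regardless of the value actually returned.
const fn make_none() -> Option<String> {
    None
}

fn main() {
    // Fine: an ordinary call, evaluated at runtime.
    assert!(make_none().is_none());

    // Under proposal 1 the following would be rejected, since it asks
    // for promotion of a call whose return type may contain Drop data:
    // let x: &'static Option<String> = &make_none();
}
```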
oli-obk commented 6 years ago

Ok, just had a quick conversation with @eddyb on discord.

We also need to prevent

Basically anything other than checking the body of a constant won't be useful for polymorphic code. So we'd still not be able to remove the qualification as it exists today.

To make the qualification smarter, and generally able to understand conditional code, loops, and such, we need to implement the qualification in terms of dataflow.

RalfJung commented 6 years ago

We rule out UnsafeCell values because the same promoted will be used multiple times when this code is executed several times, and of course that's observable once mutation enters the picture.
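The observability argument can be made concrete: every execution of a function shares the single interned allocation behind a promoted (example mine):

```rust
// `&42` is promoted to a hidden static, so every call returns a
// reference to the same allocation.
fn shared() -> &'static i32 {
    &42
}

fn main() {
    // If the promoted value were an `UnsafeCell`, a write through the
    // reference from one call would be visible through the reference
    // from the other: exactly the observable sharing that promotion
    // must rule out.
    assert!(std::ptr::eq(shared(), shared()));
}
```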

We rule out Drop values because... it's not okay to not run the destructor, because the user didn't actually ask for a static?

RalfJung commented 6 years ago

To make the qualification smarter and generally be able to understand conditional code and loops and such we need to implement the qualification in terms of dataflow

This only applies to promotion, right? I'd not do dataflow. I'd just say we never promote once control flow is involved.

From what I understand, we still only look at the type when checking if a use of a const fn can be promoted, and the resulting value for const.

oli-obk commented 6 years ago

so const FOO: usize = if false { 42 } else { 45 }; won't be usable in let x: &'static usize = &FOO; ?

RalfJung commented 6 years ago

Oh are you saying that we have trouble with promotion even when we are just using a const? Dang...

For const, however, a value-based analysis seems reasonable, does it not? The value should be the only thing of a const that is actually observable.

How to determine drop/interior mutability of a value is another question. We could traverse the value guided by the type (exactly like the sanity check does) and look for it? This has to stop at unions, but we do not promote code that reads from unions so I think we are good.
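A toy model of such a type-guided traversal (types and names invented here for illustration; this is not the compiler's actual data structure):

```rust
// Simplified stand-ins for the type layout guiding the walk.
enum Ty {
    Scalar,
    UnsafeCell(Box<Ty>),
    Struct(Vec<Ty>),
    Union,
}

// Returns Some(true/false) when interior mutability can be decided, and
// None when the traversal must give up: at a union, the active field
// (and therefore the contents) is opaque to this kind of walk.
fn has_interior_mut(ty: &Ty) -> Option<bool> {
    match ty {
        Ty::Scalar => Some(false),
        Ty::UnsafeCell(_) => Some(true),
        Ty::Struct(fields) => fields
            .iter()
            .try_fold(false, |acc, f| Some(acc || has_interior_mut(f)?)),
        Ty::Union => None,
    }
}

fn main() {
    let pair = Ty::Struct(vec![Ty::Scalar, Ty::UnsafeCell(Box::new(Ty::Scalar))]);
    assert_eq!(has_interior_mut(&pair), Some(true));
    assert_eq!(has_interior_mut(&Ty::Scalar), Some(false));
    // The walk stops at unions, matching the caveat above.
    assert_eq!(has_interior_mut(&Ty::Union), None);
}
```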

oli-obk commented 6 years ago

If the constant is an associated constant we're out of luck wrt promotion checking the value. But I think we currently just pessimistically assume drop and not_freeze, so it could be fine.

RalfJung commented 6 years ago

True, the best we can do at that point is fall back to its type (insofar that is known).

But I see no reason, so far, to ever look at the body of a constant (as opposed to its computed value).

eddyb commented 6 years ago

More recently, I came to the conclusion that the code is spaghetti because it tried really hard to avoid recomputing things, so it ends up with one "qualif" that's modified in a linear pass.

If rewritten into several modular analyses, and one const checker (the only part that would actually emit errors), we can probably have a cleaner setup overall. And we could even mix polymorphic MIR analysis with post-eval value analysis.

E.g. (UnsafeCell::new(42), 42).1 is treated as if you end up with an UnsafeCell value

That shouldn't be true, I don't think. Only unions have problems like these, structs should be fine.

RalfJung commented 6 years ago

That shouldn't be true, I don't think. Only unions have problems like these, structs should be fine.

The MIR const qualifier does type-based reasoning on projections of pairs, but the old HIR one does not.

abonander commented 6 years ago

@oli-obk Unions are currently feature-gated in const fns but are stable in other const contexts, was this intended?

union SignedToUnsigned {
    signed: i32,
    unsigned: u32,
}

// allowed on stable
const UNSIGNED: u32 = unsafe { SignedToUnsigned { signed: 1i32 }.unsigned };

// requires `#![feature(const_fn_union)]`
const fn signed_to_unsigned(signed: i32) -> u32 {
    unsafe { SignedToUnsigned { signed }.unsigned }
}
RalfJung commented 6 years ago

@abonander yes. It is too late to gate them in const. And they also do less harm there because we can do sanity checks on const to ensure they match their given type, which is not possible for const fn.

abonander commented 6 years ago

@RalfJung can you elaborate on that last part? Specifically, anything that would preclude something like #54678 where both types in the union are the same size?

RalfJung commented 6 years ago

For const we have a "sanity check" that looks at the final computed value and its type, and makes sure the value is reasonable at the given type. So even if people do horrible things with unions, we can usually tell them that that might be a bad idea. However, for const fn, we'd have to do that check for every possible argument, which is clearly not possible.

Also see https://github.com/rust-lang/rust/issues/54709

oli-obk commented 6 years ago

So I tried out a few things and looked at the corresponding qualification code. We are maximally pessimistic about associated constants if their type could possibly allow Drop or UnsafeCell. This means that we assume the type allows both of them unless proven otherwise.

E.g.

trait Foo<T> {
    const X: T;
}

trait Bar<T, U: Foo<T>> {
    const F: u32 = (U::X, 42).1;
                   ^^^^^^^^^^ destructors in constants are not allowed
}

does not compile, since we assume that the tuple has nontrivial drop glue, because U::X might have such drop glue.

A value based analysis could not evaluate F, because that would still be generic. Thus we'd not be able to report an error where we are currently reporting one. This would mean that the moment someone monomorphizes this trait, the constant would get evaluated and we'd possibly get an error then (depending on whether the type "needs drop" or not). For example:

fn foo<T, U, V: Bar<T, U>>(t: T, u: U, v: V) { ... }

would error if the concrete type given for T needs drop.

I hope it makes sense now why we can only do a value-based analysis when there are no generics involved.

oli-obk commented 5 years ago

While monomorphization time errors are not great, we already have a bunch of them from const eval failures:

trait Foo<T> {
    const X: T;
}

trait Bar<U: Foo<usize>> {
    const F: u32 = [42][U::X];
}

struct Struct;

impl Foo<usize> for Struct {
    const X: usize = 99;
}

impl Bar<Struct> for Struct {}

fn main() {
    // comment out to not get any errors
    let x = Struct::F;
}

This isn't an argument for just opening the door to allowing all kinds of post-monomorphization errors. Instead I want us to think about this topic and find a comprehensible rule for where we do allow them and where we don't.

One thing that I also find problematic is that we have split the polymorphic error checking into two parts: const qualification and const propagation. These are actually duplicating each other's work to some degree.

This also overlaps somewhat with symbolic miri, because we could formulate all of these analyses as a single symbolic evaluation.

RalfJung commented 5 years ago

I think the approach outlined in https://github.com/rust-rfcs/const-eval/issues/17 would greatly help me understand const qualification: It is a static analysis approximating some dynamic property, and being more explicit about that property (specifying it precisely and maybe even implementing it) would help tremendously.

oli-obk commented 5 years ago

We can indeed rewrite everything as a dynamic check, but we'd lose all pre-monomorphization errors.

Const qualification is essentially a very limited form of symbolic execution (limited as in "doesn't know about control flow or loops") that solely operates on symbols for "drop", "constness" and "interior mutability" instead of operating on values.
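That limited symbolic domain can be pictured as a tiny set of flags joined across operands (illustrative code, not rustc's actual implementation):

```rust
// Each abstract "value" is summarized by qualification flags rather
// than its concrete contents.
#[derive(Clone, Copy, PartialEq, Debug)]
struct Qualif {
    needs_drop: bool,
    has_interior_mut: bool,
}

impl Qualif {
    const CLEAN: Qualif = Qualif { needs_drop: false, has_interior_mut: false };

    // Joining two operands: a composite value is qualified if any of
    // its parts is.
    fn join(self, other: Qualif) -> Qualif {
        Qualif {
            needs_drop: self.needs_drop || other.needs_drop,
            has_interior_mut: self.has_interior_mut || other.has_interior_mut,
        }
    }
}

fn main() {
    let string_like = Qualif { needs_drop: true, has_interior_mut: false };
    let cell_like = Qualif { needs_drop: false, has_interior_mut: true };
    // A tuple such as (String, Cell<u8>) picks up both flags.
    let tuple = string_like.join(cell_like);
    assert!(tuple.needs_drop && tuple.has_interior_mut);
    assert_eq!(Qualif::CLEAN.join(Qualif::CLEAN), Qualif::CLEAN);
}
```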

RalfJung commented 5 years ago

We can indeed rewrite everything as a dynamic check, but we'd lose all pre-monomorphization errors.

Nono, that's not what I meant.

I am saying we should have both a dynamic check and static const qualification. The dynamic check should be sound, and it should be documented at https://github.com/rust-rfcs/const-eval/ which checks are performed and why. The static check should then make sure that the dynamic check never triggers.

Basically, we already have in your (and @eddyb's) head a dynamic version of this check. It's the dynamic property that the static checks you are writing test for (as an approximation of course, because of the halting problem). I am saying let's get that check out of your heads and into the miri engine.

ecstatic-morse commented 5 years ago

I've been having some zulip conversations with @eddyb regarding dataflow-based const qualification.

A problem quickly arises when trying to fit const qualification into the existing dataflow framework: namely, how to handle a definition of one local that has a data dependency on another (e.g. _1 = _2). We'd like to simply copy the qualifications for _2 into _1, but this is not expressible as a "gen/kill" operation. Trying to extend the existing framework with an assignment operator also presents a few problems, like how to handle aggregates (e.g. _1 = [_2, _3]) and ordered data dependencies (e.g. _1 = _2; _2 = _3; _3 = _1) while still guaranteeing that dataflow analysis converges. I believe that the |= operator required to handle these cases is mutually exclusive with the ability to coalesce transfer functions for all statements in a basic block. This is one of the properties that makes the "gen/kill" framework efficient, so implementing a separate dataflow engine would be necessary. #35608 contains a suitably generic one.

However, I believe that const qualification can be done in the existing framework using a reaching definitions analysis (@eddyb is skeptical of this; I'm hoping they can expound a bit in this thread). Furthermore, I believe that this formulation is equivalent to the one that requires |= above. Basically, we change the universe of the dataflow analysis from the set of locals (and their qualifications) to the set of definitions of locals. This information can be processed into a use-def chain (I've implemented this in #62547).

To explain how qualification actually works with a use-def chain, I'll use the HasInteriorMut qualifier as an example. Every definition of a local can be classified as unconditionally qualified (e.g. _1 = Some(Cell::new(5))), unconditionally not qualified (e.g. _1 = None), or conditionally qualified dependent on some set of definitions (e.g. _1 = _2 is dependent on all definitions of _2 that reach that point in the program). Qualification can then be expressed as a DFS over this use-def chain that iterates over dependent definitions and terminates when it finds an unconditionally qualified one.

A cyclic data dependency (e.g. loop { x = x + 1 }) will present as a dependency on a definition that already exists on the DFS stack. We can safely ignore such dependencies because they are tautological: they express that "x = x + 1 is qualified" implies "x = x + 1 is qualified". This approach extends to cycles of arbitrary length.

I believe that the approach described at the start of the post (I'll call it "copy-qualifs") is semantically equivalent to the one based on the use-def chain. Whereas the use-def chain explicitly stores the set of all definitions that could reach each use of a local, "copy-qualifs" stores only the union of the qualification bits of all those reaching definitions. Basically, the use-def chain approach is two separate steps: compute the data dependencies for each definition of a local then propagate qualifications between them, while the "copy-qualifs" approach coalesces this into a single operation, avoiding the need to actually store the data dependencies. The downside of not saving the intermediate result is that "copy-qualifs" must run a separate dataflow for each qualification type, while the use-def chain only needs to run one.
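A compressed sketch of that DFS over the use-def chain (data structures invented here; the real version sits on top of rustc's dataflow results):

```rust
use std::collections::HashMap;

type DefId = usize;

// Classification of each definition of a local, as described above.
enum Def {
    Qualified,             // e.g. _1 = Some(Cell::new(5))
    NotQualified,          // e.g. _1 = None
    DependsOn(Vec<DefId>), // e.g. _1 = _2: all reaching defs of _2
}

fn is_qualified(def: DefId, defs: &HashMap<DefId, Def>, stack: &mut Vec<DefId>) -> bool {
    // A dependency already on the DFS stack is a cyclic, tautological
    // one ("qualified implies qualified") and can be ignored.
    if stack.contains(&def) {
        return false;
    }
    stack.push(def);
    let result = match &defs[&def] {
        Def::Qualified => true,
        Def::NotQualified => false,
        Def::DependsOn(deps) => deps.iter().any(|d| is_qualified(*d, defs, stack)),
    };
    stack.pop();
    result
}

fn main() {
    let mut defs = HashMap::new();
    defs.insert(0, Def::Qualified);             // _1 = Some(Cell::new(5))
    defs.insert(1, Def::DependsOn(vec![0, 2])); // _2 = _1 or _3
    defs.insert(2, Def::DependsOn(vec![1]));    // _3 = _2 (cycle with _2)
    // Qualification propagates through the chain, cycle notwithstanding.
    assert!(is_qualified(2, &defs, &mut Vec::new()));

    // A pure cycle with no unconditionally qualified definition stays clean.
    defs.insert(3, Def::DependsOn(vec![4]));
    defs.insert(4, Def::DependsOn(vec![3]));
    assert!(!is_qualified(3, &defs, &mut Vec::new()));
}
```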

I want to keep driving this forward, but I would like some more input from wg-const-eval people first. @oli-obk @RalfJung @eddyb

eddyb commented 5 years ago

A cyclic data dependency (e.g. loop { x = x + 1 }) will present as a dependency on a definition that already exists on the DFS stack. We can safely ignore such dependencies because they are tautological: they express that "x = x + 1 is qualified" implies "x = x + 1 is qualified". This approach extends to cycles of arbitrary length.

I would prefer a fixpoint/saturation mechanism that dynamically checks the monotonicity here, instead of assuming it and risking some kind of subtle bug.

I'm working on a little library for doing memoized DFS with cycle-handling, coincidentally, maybe we should use that instead of implicit assumptions?

oli-obk commented 3 years ago

We (well, @eddyb, but I implemented it) found a way that doesn't require a dataflow analysis: If a local variable in a const/static item's initializer has no StorageDead, then that means that it is a temporary in the final expression of the initializer and thus follows the "escaping scope" rule and gets interned. Thus we are allowed to reference it (otherwise borrowck would fail anyway). This way, even if the value has interior mutability: as long as the local that gets borrowed has a StorageDead, we know it won't end up in the final value of the constant, and thus borrowing that interior-mutable local is totally fine.

For the implementation see https://github.com/rust-lang/rust/pull/80418
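A small example of the rule in action (written by me, not taken from the PR; assumes a reasonably recent compiler):

```rust
use std::cell::Cell;

const C: i32 = {
    // `cell` has a StorageDead: it is dead before the constant's final
    // value exists, so it can never be interned as part of that value...
    let cell = Cell::new(40);
    // ...which is why this borrow of an interior-mutable local is
    // accepted: it cannot escape into the final constant.
    let _borrow = &cell;
    42
};

fn main() {
    assert_eq!(C, 42);
}
```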

RalfJung commented 3 years ago

If a local variable in a const/static item's initializer has no StorageDead, then that means that it is a temporary in the final expression of the initializer and thus follows the "escaping scope" rule and gets interned.

So far, the argument as I understood it was the other way around: if a local variable has a StorageDead, it is certainly not part of the final value, since it will be deallocated first. I find that argument much easier to believe than the one you phrased.