rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
99.03k stars 12.79k forks source link

Implied bounds / well-formedness of references treats contravariant lifetimes the same as covariant #106431

Open Manishearth opened 1 year ago

Manishearth commented 1 year ago

Potentially related to https://github.com/rust-lang/rust/issues/25860 . The issues seem related but the examples are pretty different so I figured I'd file this as separate.

I'm not yet sure if I can craft an actual self-contained soundness bug out of this, but it does cause a soundness bug in Yoke which relies on for<'a> working correctly: https://github.com/unicode-org/icu4x/issues/2926. I have multiple potential fixes for the yoke issue, but it's worth at least noting.

TLDR: Rust currently correctly implies a 'inner: 'outer bound when you have &'outer &'inner _. It is also incorrectly implying this for &'outer fn(&'inner _)

The reduced example is as follows:


// A function that forces the closure to work for ALL lifetimes
// ... or does it?  };-)
fn attach<C, F>(cart: Box<C>, transform: F) where F: for<'all> FnOnce(&'all C) -> &'all str {
    // could call f(&cart);, etc
}

type Contra<'a> = fn(&'a ());

fn contra_shorter<'contra, 's: 'contra>(param_contra: Contra<'contra>, param_s: &'s str) {
    let contravariant_cart = Box::new(param_contra);
    attach(contravariant_cart, |_| param_s);

}

(playpen)

This should not compile. F requires the returned string to be valid for ALL lifetimes 'all, whereas we're returning a string of lifetime 's

This code is behaving the same as this code:

fn co_shorter<'co, 's: 'co>(param_co: &'co u8, param_s: &'s str) {
    let covariant_cart = Box::new(param_co);
    attach(covariant_cart, |_| param_s);
}

where the concrete type of F is for<'all> FnOnce(&'all &'co u8) -> &'all str, and well-formedness implies that 'all must be shorter than 'co, so you end up getting a 'co: 'all bound, and 's: 'co ends up giving us 's: 'all, so even though there is a for<'all>, the 'all is actually artificially restricted to being "for all shorter than 'co" and returning data of lifetime 's is valid there.

(the 's: 'co is actually not necessary for co_shorter() to compile, but it's easier to explain that way)


Basically, for &'outer Foo<'inner>, the implied bound probably should be:

Basically the implied bound is safe when things are guaranteed to not be contravariant. It's actually fine to have the implied bound for e.g. the invariant case of &'outer Cell<'inner _>, because a cell is logically a tuple of a covariant getter and contravariant setter, and the setter implies no bounds but the getter can imply bounds.

This does give me some pause since typically we tend to believe that code that works on invariant types will also work on contravariant types, and here's a situation where it won't; so perhaps it's safer to also opt invariant lifetimes out of this. Idk.

Contravariant types could imply the bound 'outer: 'inner because function parameters can't live for less than the function does, but I'm not sure this implied bound is ever useful. Does lead to a nice symmetry though.

The Yoke bug (https://github.com/unicode-org/icu4x/issues/2926) only shows up for contravariance due to how Yoke is structured. It's not clear to me that there's a way to write an unsafe abstraction that relies on for<'a> being truly universal where an implied bound coming from an invariant type may cause unsoundness. But I could be wrong.


A further exploration can be found in this code (playpen):

Show code ```rust // A function that forces the closure to work for ALL lifetimes // ... or does it? };-) fn attach(cart: Box, transform: F) where F: for<'all> FnOnce(&'all C) -> &'all str { // could call f(&cart);, etc } type Contra<'a> = fn(&'a ()); fn main() { // create a Box<&'a _> with a longer lifetime than s ('a_longer; where `'a_longer: 's`) let local_longer = vec![1,2,3,4]; let covariant_cart_longer: Box<&'_ [u32]> = Box::new(&*local_longer); // let's say this has lifetime 's let s = String::from("hello"); // Create a Box>, contravariant over 'contra let contravariant_cart: Box> = Box::new((|_| ()) as Contra); // Create a Box<_> that is fully owned let owned = vec![1,2,3,4]; let owned_cart = Box::new(owned); // create a Box<&'a _> with a shorter lifetime than s ('a_shorter, where `'s: 'a_shorter`) let local_shorter = vec![1,2,3,4]; let covariant_cart_shorter: Box<&'_ [u32]> = Box::new(&*local_shorter); // does not compile, &s cannot produce &'all str for ALL 'all // // attach(owned_cart, |_| &s); // This does compile, because in the bounds for F, `&'all C` is `&'all &'a_longer [u32]`, which forces // 'all to be shorter than 'a_shorter due to well-formedness / implied bounds (`'a_shorter: 'all`), reducing the for<'all> in scope // which works just fine for &s since 's is longer than 'a_shorter (`'s: 'a_shorter`) implying that `'s: 'all` attach(covariant_cart_shorter, |_| &s); // Seems like it shouldn't work, because the reason from before doesn't seem apply: even if 'all is forced to be shorter than 'a_longer, // there is still a period of time where 'a_longer is valid but 's is not, so the closure shouldn't compile. // // However, rustc is smart! It knows that it can make it work by squeezing 'a_longer *until* it is shorter than 's (`'s: 'a_longer`), and then WF rules force // us back into 'all being short (`'a_longer: 'all`), which brings us back to the previous situation; `'s: 'all` attach(covariant_cart_longer, |_| &s); // works, but shouldn't. 'contra is being selected to be *something*, and that somehow impacts WFness enough to // restrict the range of the for<'all> // // Without knowing WF rules for functions for sure, the clear answer is that *if* there are WF rules for functions, they would apply outside-in, not inside-out, // i.e. &'all fn(&'contra) would restrict 'contra to live at least as long as 'all, not vice versa. // This is unhelpful for us since we still have compiling code where &'s str is somehow a valid `for<'all> &'all str`; we MUST restrict 'all somehow // // To poke at this further without the problem of rustc picking some lifetime for 'contra, let's give it a named lifetime in two methods below: attach(contravariant_cart, |_| &s); } // First let's just give it a named lifetime fn contra_longer<'a>(param_contra: Contra<'a>) { let contravariant_cart = Box::new(param_contra); let s = String::from("hello"); // doesn't work, &s cannot produce &'all str for ALL 'all // huzzah! // attach(contravariant_cart, |_| &s); } // Now let's give it *and* s named lifetimes, and say that 's lives at least as long as 'contra fn contra_shorter<'contra, 's: 'contra>(param_contra: Contra<'contra>, param_s: &'s str) { let contravariant_cart = Box::new(param_contra); // ... does work, but shouldn't // this is the same behavior as that for covariant lifetimes, clearly // the WF rules are assuming covariance somehow; // // the previous function fails to compile because 's lives // less than 'contra, so we have a case similar to that of `covariant_cart_shorter` but this time the // covariance cannot come to the rescue attach(contravariant_cart, |_| param_s); } ```

The line attach(contravariant_cart, |_| &s); should not compile, nor should the function contra_shorter(). contra_longer() happens to correctly fail to compile because the same variance trick that saves covariant_cart_shorter does not apply here.

cc @pnkfelix @lcnr

Thanks to @workingjubilee for rubberducking this with me.

Manishearth commented 1 year ago

From talking to @BoxyUwU; there's an alternate way to look at this:

This all makes sense and is consistent. The tricky thing is the first line: implied bounds come from well-formedness, and there's not an actual WF reason for contravariant lifetimes to behave that way: &'outer fn(&'inner _) does not need any particular bounds to be valid.

In fact, the syntax C: 'outer when lifetimes contained in C are contravariant is itself kinda tricky, because it both means "all lifetimes contained in C outlive 'outer" and it means "all borrows contained in C outlive 'outer", but these mean different things when you're talking about contravariant lifetimes. In general in Rust it means the former, but the ambiguity can be confusing.

I think the real issue here is:

We should probably better document implied bounds, and consider relaxing how they work around contravariant lifetimes.

(as far as the Yoke issue, I do have a fix I can apply; though I'd rather not have to, I know it's going to take a while for there to be motion on this in any direction)

Manishearth commented 1 year ago

FWIW I'm happy to sit down and document WF/implied bounds if someone can tell me where it should go and help me with specifics, like, which things are WFness and which bounds are implied.

aliemjay commented 1 year ago
  • There is obviously a WF reason for &'outer &'inner _ to imply 'inner: 'outer. There is not one for the equivalent contravariant type.

I don't think this is an issue with variance. The real issue here is that all lifetimes that appear on trait objects and fn pointers are treated as "outlives components" of the underlying type. This can occur with invariant and covariant lifetimes as well. Here is an example with invariant lifetime 'a:

trait Trait<'a> {}
fn test<'a>(_: &'static dyn Trait<'a>) { // implies 'a: 'static !!
    None::<&'static &'a ()>; // check 'a: 'static
}

This might not be desired and can lead to surprising behavior, like dyn Fn(&'a str) -> &'a str + 'static is not really static unless 'a: 'static, but it's not really a soundness bug, I believe.

  • Implied bounds do affect universal quantification

  • Unsafe code may rely on universal quantification in ways that implied bounds will mess with it

Universal quantifiers for trait bounds is bounded by the condition that trait input types must be well-formed. For example, the trait bound for<'a> Fn(&'a T) actually means for any lifetime 'a such that &'a T is well formed, and this may not include 'static.

workingjubilee commented 1 year ago

I don't think this is an issue with variance. The real issue here is that all lifetimes that appear on trait objects and fn pointers are treated as "outlives components" of the underlying type.

That is in fact what is being referred to here with respect to "variance", as the actual way of expressing this conversation involves wheeling out predicates in a language that is not Rust (logic) because Rust only interprets lifetime bounds in contextual manners... i.e. variance... and thus having an overly consistent behavior in terms of creating outlives requirements here is inconsistent with how the lifetimes in question are read and written.

That is a problem whether or not it is a strict "soundness issue" in the compiler, because the underlying thesis of Rust is that the combination of unsafe code with lifetimes can be used to create programs that constrain behavior to what is sound. And that means it needs to, as much as possible, not fall down into a pile of miserable edge-cases when it comes to actually trying to do that.

workingjubilee commented 1 year ago

I have fixed the tags. The categories are made for humanity, humanity was not made for the categories. Let's not bog down this issue's comment thread with further haggling over whether it is "unsound" or merely "surprising", please.

steffahn commented 1 year ago

I don’t see good ways to change the current behavior here, but feel free to point out where I might not be seeing the way out.

The proposition of this issue – or perhaps feature request – is that for<'a> FnOnce(&'a Foo<'b>) should mean, in some sort of pseudo-syntax adding where clauses to the for:

If the invariant case where for<'a, where 'b: 'a> FnOnce(&'a Foo<'b>) as well, then changing Foo<'b> from being invariant to being contravariant in 'b would become a breaking change, because the body a closure implementing this trait bound could stop compiling if it relies on the 'b: 'a bound. Turning a type from invariant into contravariant is not supposed to be breaking AFAIK, so that would be problematic.

So I guess the invariant case would need to become for<'a /* unbounded, for all 'a */> FnOnce(&'a Foo<'b>), too?

But then implementing such a closure could become next to impossible in the invariant case: If the closure body did not get to infer 'b: 'a, then &'a Foo<'b> might not be a well-formed type at all; so how are we able to have it in the signature, or have variables of the same type, or of related types in the closure body?

As far as I can tell, the proposed argument here might be that we want something like fn(&'a ()): 'static to be universally true, which is something that I’ve seen in previous discussions, but as far as I remember it was more about a “fn-pointers are special” kind of idea, and less of a “contravariant lifetimes are special” idea.

Even if we made sure that every contravariant (in 'b) type Foo<'b> would not require 'b: 'a for &'a Foo<'b> to be well-formed, the same will be impossible for invariant types Foo<'b>, because those can actually really contain references of that specific lifetime.

For illustration: For example, if struct Struct<'b>(fn(&'b (), &'b i32) (which is invariant in 'b) did not require 'b: 'a for a &'a Struct<'b> reference type to be allowed, then either

apiraino commented 1 year ago

WG-prioritization assigning priority (Zulip discussion).

@rustbot label -I-prioritize +P-medium

Manishearth commented 1 year ago

I don’t see good ways to change the current behavior here, but feel free to point out where I might not be seeing the way out.

Yeah, I agree. I do think this is a niche behavior that needs to be documented somewhere. I might take a crack at it.

safinaskar commented 1 year ago

I spent some time trying to create full-blown unsoundness example based on this bug. I. e. to create program, which will print garbage or something similar. I was unable to do so. It seems that described behavior is not a bug. It is simply deliberate design.

Currently existence of type &'a fn(&'b _) implies 'b: 'a (proof: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=1e20272b307a10a4f4a3ae70c84e58dc ).

Moreover, currently fn(&'b _): 'a implies 'b: 'a (proof: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=ff7d32ad000c129d098b4b1411b94552 ).

So, T: 'a currently, in fact, does not mean "T outlives 'a". It means "all lifetime variables free in T outlive 'a". Personally I dislike this. I would not do this in my own hypothetical language. But it seems this decision already deeply baked into Rust. In particular, currently T: 'static means that all lifetime variables free in T outlive 'static, i. e. T does not have free lifetime variables. This means that if T: 'static, then T guaranteed not to have any embedded lifetimes, and thus it is guaranteed that type info for T can be encoded without need to store lifetime information. As well as I understand, Any relies on this.

So, this "bug" is not really bug, unless you want to redesign Any.

But this means that we should very clearly describe meaning of T: 'a in reference. So this is documentation bug

lcnr commented 1 year ago

I also believe that we should document this behavior somewhere and there isn't anything we can change behavior-wise.

I started to write some docs for the reference about implied bounds https://github.com/rust-lang/reference/pull/1261. That documentation is fairly vague at times because implied bounds are an inconsistent mess and unsound, so documenting the status quo there is somewhat :/

What I am unsure about is where an in-depth documentation would be appropriate, the reference, nomicon, or dev-guide?

safinaskar commented 1 year ago

@rustbot label +A-docs