rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
97.88k stars 12.67k forks source link

Stabilize `shorter_tail_lifetime` #131445

Open dingxiangfei2009 opened 1 week ago

dingxiangfei2009 commented 1 week ago

cc @traviscross

Summary

shorter_tail_lifetimes is an implementation of rust-lang/rfcs#3606. The RFC calls for shortening of lifetime of temporaries generated from the tail expression of a block so that they are dropped first, especially before dropping local variables as opposed to after as how up until Edition 2021 the temporary lifetime rules have implemented.

As an example, up until Edition 2021, the following code sees the temporary value LoudDropper is dropped after the local variable x.

//@ edition: 2021
fn edition_2021() -> u32 {
    let x = LoudDropper;
    LoudDropper.get()
} // <-- `LoudDropper` gets dropped after dropping `x`

fn edition_2021_block() -> u32 {
    let y = {
        let x = LoudDropper;
        LoudDropper.get()
    }; // <-- `LoudDropper` gets dropped after dropping `x`
    y
}
fn edition_2021_with_return() -> u32 {
    let x = LoudDropper;
    return LoudDropper.get();
    // However in this case, `LoudDropper` gets dropped first, before `x` does
}

Instead, by stabilizing shorter_tail_lifetimes, when Edition 2024 applies to the span of a block or a function body, the same code observes change in the drop order. Retargeting Edition 2024, the same code above will see that dropping LoudDropper happens before dropping x.

//@ edition: 2024
fn edition_2024() -> u32 {
    let x = LoudDropper;
    LoudDropper.get()
} // <-- `LoudDropper` gets dropped first, then followed by `x`

fn edition_2024_block() -> u32 {
    let y = {
        let x = LoudDropper;
        LoudDropper.get()
    }; // <-- `LoudDropper` gets dropped first, then followed by `x`
    y
}

What is being proposed for stabilization

Previously on Edition 2021 and prior, the lifetime of temporary objects generate from evaluating the tail expression is unexpectedly longer than the lifetime of the local variable declaration of the block or function body. This has historically led to surprising rejections from the borrow checker, which is illustrated with the following example.

//@ edition: 2021
fn ref_cell() {
    let c = std::cell::RefCell::new(123);

    if let Ok(mut b) = c.try_borrow_mut() {
        *b = 321;
    }; // <-- Error if you remove the semicolon!
}

Without the semicolon,even if the if let visually resembles a statement, technically it is syntatically interpreted as a tail expression of the function body ref_cell. In this case, before returning the trivial () as the function's return value, a mutable borrow &mut RefCell<i32> on c is considered alive while c is dropped first before c.try_borrow_mut() is and, thus, the borrow checker rejects the code without the semicolon.

RFC 3606 proposes a change to the lifetime assignment to expression at the tail position, or the tail expression for short, of a function body or a block. In the current implementation, a terminating scope is inserted to surround the tail expression, forcing the destruction of temporary values generated from the evaluation, before the control flow proceeds to drop the other values scheduled for exiting the scope of a block and a function body. This is entirely predicated on whether the span targets an edition from 2024 and beyond.

In effect, the stabilization of the feature gate would allow the previous example to compile even without the artificial semicolon, plus the following example.

//@ edition: 2024
fn ref_cell() {
    let c = std::cell::RefCell::new(123);

    if let Ok(mut b) = c.try_borrow_mut() {
        *b = 321;
    } // <-- compiles in Rust 2024
}

fn f() -> usize {
    let c = RefCell::new("..");
    c.borrow().len() // compiles in Rust 2024
}

Breakage and the extend thereof

Indeed this change would reject the following code that used to compile in Edition 2021 and prior.

fn why_would_you_do_this() -> bool {
    let mut x = None;
    // Make a temporary `RefCell` and put a `Ref` that borrows it in `x`.
    x.replace(RefCell::new(123).borrow()).is_some()
}

This effectively assigns a lifetime 'a of x: Option<&'a i32> to an anonymous lifetime, which renders the x to be invalidated as soon as the control flow finishes evaluation of the tail expression. This, however, will not compile when shorter_tail_lifetimes is stabilized.

The state of Edition 2024 lint

There is possibility that for some reason there exists code that, at runtime, depends on this specific scheme of drop orders, in which temporary objects in a tail expression lives longer that one or more local variables. A possible case, though not endorsed by the language design, is the improper use of synchronisation primitives in likes of Mutex and even Atomic* as signals and semaphores, which protects resources and entries to critical sections that are external to those primitives. Here is a example for illustrative purposes.

For this reason, to maximally prevent silent breaking of the existing code in Rust ecosystem, a lint is developed to capture observable changes in drop orders triggered by Edition migration. The lint is based on the predicate of whether a temporary value of a type with significant drop implementation is generated.

Intuitively, the lint should warn users on the change in drop orders for the following code, targeting Edition 2021 and prior.

//@ edition: 2021
fn edition_2021() -> u32 {
    let x = LoudDropper;
//      ^
//  this local binding or value may observe changes in drop
//  order under Edition 2024
    LoudDropper.get()
//  ^~~~~~~~~~~
//  warn: this value has significant drop implementation
//  that will have a different drop order from that of Edition 2021
}

A previous version of the lint is implemented on master by #129864 but it is unfortunately developed based on HIR analysis, which has difficulty in leveraging the information on precise liveness information of the temporary values. The issue of the poor precision score of the lint is raised in #130836. In response, a few improvements to the existing lint are proposed.

Through local sampling tests, it has shown a better recall and precision and a crater experiment has been submitted to fully and quantitatively evaluate its performance.

One improvement to the new lint is also under development. It currently also reports the types that are considered as having significant destructors, but it has deficiency in reporting those that are nested in the type. Knowing this would help users to pinpoint the exact types for reviews, instead of tricking them into thinking that the lint suggestion as a false positive and disregarding it.

Future interaction

The stabilization of this feature will bring the very much needed parity and consistency of the language, in connection to the treatment of the temporary values for return values, with existing specification and future extension. Temporary values have been consistently throughout editions dropped first in the return statement, before the rest of the local variables. This change will also be consistent with the current become-statement implementation as per explicit tail call specification of rust-lang/rfcs#3407.

In general, this change should bring more consistency and unblocks future development that may involve lifetime assignment to temporary values.

dingxiangfei2009 commented 1 week ago

@rustbot labels +A-edition-2024

nikomatsakis commented 1 week ago

@rfcbot fcp merge

I am going to go ahead and kick off FCP. The full results of the crater run are not yet available, but regardless I am convinced that this is a good change for Rust.

rfcbot commented 1 week ago

Team member @nikomatsakis has proposed to merge this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

cc @rust-lang/lang-advisors: FCP proposed for lang, please feel free to register concerns. See this document for info about what commands tagged team members can give me.

traviscross commented 1 week ago

@rfcbot reviewed

pnkfelix commented 1 week ago

@rfcbot reviewed

tmandry commented 1 week ago

@rfcbot reviewed

rfcbot commented 1 week ago

:bell: This is now entering its final comment period, as per the review above. :bell:

nikomatsakis commented 1 week ago

In the lang-team meeting, @pnkfelix asked to understand why the semantics were designed this way originally. The original reasoning had a design goal that the lifetime of temporaries created by an expression E should not change if E is enclosing in braces ({E}) -- see Appendix B of the original rvalue blog post. As a result, the rules were designed so that temporaries created by tail expressions "pop out" from the block scope and are dropped as part of the block's parent.

In retrospect, I now believe this was the wrong decision. In fact users expect {} to control the scope, and it's very surprising that the tail expression of a block cannot embed references to variables declared in the block into a temporary (as shown in the example). Moreover, whether or not something is a "tail expression" can at times be very non-obvious, given Rust's frequent punning between statements and expressions.

Therefore: reviewed and approved. ✅

rfcbot commented 2 days ago

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

This will be merged soon.