rust-lang / unsafe-code-guidelines

Forum for discussion about what unsafe code can and can't do
https://rust-lang.github.io/unsafe-code-guidelines
Apache License 2.0
658 stars 57 forks source link

Temporary Scopes/Arenas to formalize lifetimes of local variables #308

Open arielb1 opened 8 years ago

arielb1 commented 8 years ago

Not sure this is the right place to put this, but @ubsan asked for it:

Semantic divergences from current rustc are in bold. Unresolved questions are (XXX: question).

Arenas

In the code examples here, assume this struct has been declared:

struct NoisyDrop(&'static str);
impl Drop for NoisyDrop {
    fn drop(&mut self) {
        println!("dropping {}", self.0);
    }
}

Arenas are used to ensure orderly and deterministic destruction of local variables and temporaries (XXX: buffer temporaries created for order-of-evaluation).

During its execution, a Rust function manages a stack of arenas, pushing new arenas to it and popping them.

Newly-created locals and temporary locations are always allocated at the end of an arena, but the arena need not be the topmost arena on the stack, as in example (1).

When an arena is popped, the locations within the arena are destroyed in the reverse order of their allocation. However, parts of a location that are in a deinitialized state are not destroyed, as in example (2).

Lifetime and region checking treats the destruction of an arena as through each location was destroyed separately, in order, but subject to implementation limitations.

The location allocated for a local/temporary (the "alloca") is valid from the moment it is allocated until just after the value inside it is destroyed when it is popped, as through it was an &move reference.

To simplify implementation, each arena in Rust can contain only a sequence of locations whose type and size are known at compile-time. Of course, this does not imply that an arena is stored in that order in memory.

NOTE: arenas are sometimes called "destruction scopes", but in rustc destruction scopes do not include binding arenas, so that term would be confusing. (XXX: Bikeshed!)

The arena tree

The arenas in a function are structured as a tree defined by the program's structure:

Remember that if let, while let and for are defined by their desugaring in terms of loop and match.

Observe that the tail expression of a block is under the block's binding arena but none of the other arenas

Before a parent arena is popped, all of its children are popped in LIFO order.

Local variables

Local variables, or bindings (XXX: AFAICT the terms are used interchangeably in rustc - do we want to change this?) created by a let-statement are allocated into the statement's containing block's binding arena. The local variables are allocated before the statement's initializer executes.

Local variables created by argument buffers (see example (3)) and argument bindings are allocated in the function's root arena. The order in which they are allocated is that each argument's patterns are allocated followed by the argument's buffer, from left to right.

Local variables created by a match arm are allocated twice - once when the guard is executed, within the guard's binding arena (which will not call a destructor because bindings in guards must either be Copy or ref mut t, but will release the binding's storage) (XXX: pending final status of borrowck in guards) and once within the arm's binding arena (XXX: this differs a bit from the current behavior, but the current behavior is unsound and not something I would like to keep - cc @eddyb @nikomatsakis).

Temporaries

Temporaries that are created by a temporary lexpr borrow-extended by a let-statement are allocated within that let-statement's containing block's binding arena. Other temporaries are allocated into the topmost non-binding arena on the stack when they are created (see Example (1)).

If the pattern of a let-statement contains a by-ref binding, the root lexpr of the let-statement's expression is borrow-extended (see Example (4)).

If an address-of expression is distinguished subexpression of a let-statement, the root lexpr of the address-of expression's subexpression is borrow-extended (see Example (5)).

The following expressions are distinguished subexpression of a let-statement:

In some other cases that have yet to be documented (XXX: document them).

No other temporaries are borrow-extended (e.g. type annotations do not matter - see rust-lang/rust#36082).

Example (1) - basic arenas

fn basic() {
    let x = NoisyDrop("local `x`");
    (NoisyDrop("temporary"), ()).1
}

Here there are 4 arenas:

root arena
    - function block arena
        - function block binding arena
            - `let` expression arena 

Here, the local x is allocated in the binding arena, but the (NoisyDrop("temporary"), ()) temporary skips the binding arena and is instead allocated into the function's block arena, and is destroyed after the NoisyDrop("localx").

Example (2) - conditional drop

fn condi_drop(init_flag: bool, fini_flag: bool) {
    let (init, fini);
    fini = NoisyDrop("fini");
    if init_flag {
        init = NoisyDrop("init");
    }
    if !fini_flag {
        std::mem::forget(fini);
    }
}

Here, the locations of the locals init and fini are allocated from the same arena in that order. Therefore, when the function is exited, fini is destroyed followed by init.

At least, that is the case if both are initialized - when both flags are true. If init_flag is false, then init is never initialized and therefore not destroyed, and if fini_flag is false, then fini is deinitialized without its destructor being run by mem::forget, and therefore it is not destroyed.

Example (3) - argument buffers

fn argument_buf((x1,_,z1): (NoisyDrop, NoisyDrop, NoisyDrop),
                (x2,_,z2): (NoisyDrop, NoisyDrop, NoisyDrop)) {
}

fn main() {
    argument_buf((NoisyDrop("x1"), NoisyDrop("y1"), NoisyDrop("z1")),
                 (NoisyDrop("x2"), NoisyDrop("y2"), NoisyDrop("z2")),);
}

Here, the first argument's bindings x1 and z1 are allocated first, followed by the first argument's buffer (which contains the triple (NoisyDrop("x1"), NoisyDrop("y1"), NoisyDrop("z1"))). Then, the same happens for the second argument.

Afterwards, all but the middle 2 fields of the buffers are moved to the argument bindings, so only the middle NoisyDrop's destructor is called when the buffer is destroyed.

Everything is allocated onto the function's root arena and dropped in reverse order, leading to

dropping y2
dropping z2
dropping x2
dropping y1
dropping z1
dropping x1

being printed

Example (4) - borrow extension by ref pattern

fn main() {
    let ref foo = (NoisyDrop("extended 1"), NoisyDrop("extended 2")).0;
    let _ = NoisyDrop("external");
}

The root lexpr of (NoisyDrop("extended 1"), NoisyDrop("extended 2")).0 is the lexpr (NoisyDrop("extended 1"), NoisyDrop("extended 2")), which is extended to the binding arena and therefore dropped after the unextended let.

This prints

dropping external
dropping extended 1
dropping extended 2

Example (5) - borrow extension by designated expression

fn main() {
    let _x : (&_, NoisyDrop) = (
        {&[&[NoisyDrop("extended 1")]]},
        NoisyDrop((&NoisyDrop("unextended"), "extended 2").1)
    );
    drop(NoisyDrop("external"));
}

This is a rather complicated case. The arena tree is

root arena
    - function block arena
        - function block binding arena
            - let expression arena
            - second statement expression arena

First, storage for _x is allocated in the block binding arena. Next, the first 2 address-of expressions are distinguished subexpressions of the let-statement, so the 2 arrays are allocated in that binding arena too. The third address-of expression is inside a function call, so it is not a distinguished subexpression, and is allocated within the let expression arena.

After the let-statement completes, the let expression arena is dropped, printing "dropping unextended". Then, the second statement prints "dropping external".

Afterwards, the binding arena is dropped in reverse order of pushing - first the temporary is dropped, printing "dropping extended 1", and the _x is dropped, printing "dropping extended 2".

The overall output is

dropping unextended
dropping external
dropping extended 1
dropping extended 2
arielb1 commented 8 years ago

cc @nikomatsakis @pnkfelix

nikomatsakis commented 8 years ago

@arielb1 when you asked me "what should we do about arenas", were you referring to this issue? and if so, I guess you were referring to the various XXX points?

I find this to be a reasonably nice way to formalize temporary scopes. I am not sure where we have latitude to change the current behavior at this point; the behavior of tail expressions can certainly be confusing though. I also don't have a better alternative just now. =)

arielb1 commented 8 years ago

Sure.

This is compatible with today's situation except for https://github.com/rust-lang/rust/issues/36082, which I think we can call "just a bug".