immunant / c2rust

Migrate C code to Rust
https://c2rust.com/
Other
3.95k stars 234 forks source link

provide statics with hypothetical origins #975

Closed aneksteind closed 1 year ago

aneksteind commented 1 year ago

Fixes #943 and fixes #776. Assigns origins to static variables, which provides parity in the number of OriginParams when comparing equality of LHS and RHS of assignments to local variables (which already had the correct number of OriginParams).

Questions:

spernsteiner commented 1 year ago

polonius does not appear to issue loans for statics, but it accounts for regions associated with them

Sounds right. Loans are created on &/&mut operations; for statics, those operations essentially happen before the start of main, and aren't part of any function (so we don't analyze them).

Does the assign_origins routine for statics adequately cover this?

Have you checked what Polonius normally does for statics that contain 'static lifetimes? I could imagine, for example, that there might be only a single origin corresponding to the concrete lifetime 'static, or that it's otherwise treated in some special way.

in a statement such as _2 = const {alloc1: *mut fdnode} the RHS is not issued a loan; is that correct? Polonius appears not to issue any loans here

I'm not sure. What does Polonius do for let x = &MY_STATIC? Does this issue a loan, or is it treated as a sort of no-op projection, like &*some_ref? It might be better to treat the const { alloc: *mut T } as a similar sort of loan or projection (matching whatever Polonius does for &MY_STATIC), since unlike rustc we treat *mut as if it were a ref and give it an origin.

Note this is separate from the previous point, which is about statics containing refs, like BAR here:

static FOO: i32 = 1;
static BAR: &'static i32 = &FOO;
aneksteind commented 1 year ago

Have you checked what Polonius normally does for statics that contain 'static lifetimes? I could imagine, for example, that there might be only a single origin corresponding to the concrete lifetime 'static, or that it's otherwise treated in some special way.

I'm not sure. What does Polonius do for let x = &MY_STATIC? Does this issue a loan, or is it treated as a sort of no-op projection, like &*some_ref? It might be better to treat the const { alloc: *mut T } as a similar sort of loan or projection (matching whatever Polonius does for &MY_STATIC), since unlike rustc we treat *mut as if it were a ref and give it an origin.

@spernsteiner I've run polonius on the following example:

static mut oneshot_fdn: fdnode = fdnode { ctx: &0 };

pub struct fdnode {
    pub ctx: &'static u8,
}

unsafe extern "C" fn server_free() {
    let x = &oneshot_fdn;
}

fn main() {
    unsafe {
        server_free();
    }
}

and it appears:

aneksteind commented 1 year ago

I'm not sure. What does Polonius do for let x = &MY_STATIC? Does this issue a loan, or is it treated as a sort of no-op projection, like &*some_ref?

It looks like the MIR for this treats as something like that:

        _2 = const {alloc1: *mut fdnode}; // scope 0 at test_static.rs:8:14: 8:25
                                         // mir::Constant
                                         // + span: test_static.rs:8:14: 8:25
                                         // + literal: Const { ty: *mut fdnode, val: Value(Scalar(alloc1)) }
        _1 = &(*_2);                     // scope 0 at test_static.rs:8:13: 8:25

and use_of_var_derefs_origin.facts notes:

"_1"    "'_#4r"
spernsteiner commented 1 year ago

It looks like rustc -Z nll-facts treats 'static as a universal/placeholder region, as if it was a parameter with the constraint that it outlives all other parameters. E.g. fn f<'a>(...) is treated like fn f<'static: 'a, 'a>(...). So there isn't much special handling for 'static in Polonius itself, aside from generating those extra constraints. As an implementation detail, the first placeholder region '_#0r is always 'static, then each lifetime parameter of the function gets a placeholder region, and finally there's an extra placeholder region at the end (maybe 'empty, or maybe a lifetime representing the duration of the function, or maybe something else). The first region outlives the lifetime parameters, and the parameters outlive the last region.

Accesses of statics (e.g. let x = FOO; where static FOO: i32) produce no loans and have no constraints relating them to 'static/'_#0r, as you noted. For non-mut statics, which produce MIR like _1 = const {alloc2: &'_#3r i32};, rustc seems to create a fresh region to the rvalue &i32, but leaves the region unconstrained, so it can be equal to 'static or any shorter region as needed. (The assignment then sets a constrain between the region on the LHS and the region '_#3r on the RHS.) I think it would also be legal to constrain the region to be exactly 'static (via subset_base('_#3r, '_#0r); subset_base('_#0r, '_#3r)), or to constrain it to be "'static or shorter". For mut statics, the MIR looks like _4 = const {alloc5: *mut i32};, and subsequent "reborrow" operations like _6 = &(*_4); create a fresh region for the Rvalue::Ref but leave it unconstrained.

Since we generally assign regions to raw pointers as well as refs, I think we should treat both the mut and non-mut cases the same: assign a region to the RHS, and leave it unconstrained so it can become 'static if needed (or something shorter if not).

For mentions of 'static within statics (e.g. static FOO_REF: &'static i32), it looks like rustc generates a fresh region each time the static is mentioned, then constrains it to be equal to 'static (x <= '_#0r, '_#0r <= x). This seems like a reasonable approach to me (and in general it's good to stick to the way rustc does things when possible). Instead of assigning regions to the actual static declarations, we would generate new fresh regions and constraints at each of their uses (MIR const { allocN: *mut T } etc.). Real and hypothetical regions can be treated the same in this context.

I think this also simplifies the Operand::Constant case, since we no longer need to look up the actual static DefId - we simply generate a fresh region for each position where one can appear in the type, and (since global allocs can only refer to 'static data) constrain each fresh region to be 'static.

aneksteind commented 1 year ago

@spernsteiner I haven't addressed the 'static lifetime piece yet, but I've addressed generating origins for statics in https://github.com/immunant/c2rust/pull/975/commits/99adfbbc57a38d02a7714804825563db3d9a5077. Does that seem aligned with what we talked about?

aneksteind commented 1 year ago

@spernsteiner would you mind taking another look?