immunant / c2rust

Migrate C code to Rust
https://c2rust.com/
Other
3.91k stars 229 forks source link

analyze: new implementation of MIR-to-HIR rewrite lifting #934

Closed spernsteiner closed 1 year ago

spernsteiner commented 1 year ago

This replaces rewrite::expr::hir_op with three new modules:

rewrite::expr::unlower: Builds the "unlowering map", which maps a MIR "precise location" (a Location plus a Vec<SubLoc> path) to information about the HIR that was lowered to produce that piece of the MIR. Specifically, it records the HirId of the HIR Expr that was lowered along with a description of how the MIR relates to the HIR. For example:

For example:

fn f(a: i32) -> i32 {
    a + 1
}

fn g(x: i32, y: i32) -> i32 {
    x + f(y)
}

For f, the unlowering map annotates the MIR as follows:

block bb0:
  bb0[0]: StorageLive(_2)
  bb0[1]: _2 = _1
    []: StoreIntoLocal, `a`
    [Rvalue]: Expr, `a`
  bb0[2]: _0 = Add(move _2, const 1_i32)
    []: StoreIntoLocal, `a + 1`
    [Rvalue]: Expr, `a + 1`
  bb0[3]: StorageDead(_2)
  bb0[4]: Terminator { source_info: ..., kind: return }

The statement _2 = _1 is associated with the expression a; the statement as a whole is storing the result of evaluating a into a MIR local, and the statement's rvalue _1 represents the expression a itself. Similarly, _0 = Add(move _2, const 1) stores the result of a + 1 into a local. If needed, we could extend the unlower pass to also record that move _2 (a.k.a. bb0[2] [Rvalue, RvalueOperand(0)]) is lowered from the Expr a.

On g, the unlowering map includes the following (among other entries):

bb0[5]: Terminator { source_info: ..., kind: _4 = f(move _5) -> [return: bb1, unwind: bb2] }
  []: StoreIntoLocal, `f(y)`
  [Rvalue]: Expr, `f(y)`
  [Rvalue, CallArg(0)]: Expr, `y`
bb1[1]: _0 = Add(move _3, move _4)
  []: StoreIntoLocal, `x + f(y)`
  [Rvalue]: Expr, `x + f(y)`

The call terminator _4 = f(move _5) computes f(y) and stores the result into a local; its rvalue is f(y) itself, and the first argument of the rvalue is y.

rewrite::expr::distribute: Given the MIR rewrites produced by mir_op and the unlowering map from unlower, this module distributes MIR rewrites to HIR nodes (identified by their HirIds). This step also checks for ambiguous case, such as multiple rewrites from different MIR locations being mapped to the same HIR node, where it's unclear in which order to apply the rewrites, and also detects MIR rewrites that couldn't be mapped to any HIR node.

Using the example from above:

bb0[5]: Terminator { source_info: ..., kind: _4 = f(move _5) -> [return: bb1, unwind: bb2] }
  []: StoreIntoLocal, `f(y)`
  [Rvalue]: Expr, `f(y)`
  [Rvalue, CallArg(0)]: Expr, `y`

A MIR rewrite on bb0[5] [Rvalue, CallArg(0)] would be attached to the MIR Expr y, and a rewrite on bb0[5] [Rvalue] would be attached to f(y). A MIR rewrite on bb0[5] [] (i.e. on the call terminator itself) would result in an error, since there is no good place in the HIR to attach such a rewrite.

rewrite::expr::convert: Converts the MirRewrites for each HIR Expr into Span-based rewrite::Rewrites. These are returned as the final result of expr::gen_expr_rewrites and are later applied to the input source code using rewrite::apply.


Overall, this branch simplifies the handling of SubLocs; distribute treats Vec<SubLoc> paths as mostly opaque (it expects each MIR rewrite's sub_loc field to exactly match an entry in the unlowering map), and convert doesn't deal with SubLocs at all. More precise SubLoc handling, which will be needed for some tricky pointer-to-pointer cases, goes in the unlower module, whose sole purpose is establishing the mapping between SubLocs and HIR. And unlower makes the mapping between MIR and HIR explicit (unlike hir_op, where it was computed implicitly while building rewrites), so it can be inspected for easier debugging.

All test cases produce the same rewrites as before this branch, except for one place in lighttpd-minimal, where we now place a Cell::from_mut cast in a more appropriate location.

spernsteiner commented 1 year ago

By the way, is there a reason the base branches for this and other PRs is different (in name) but equal (in commit) from the PR branch of the other PRs?

Keeps github from getting confused about what to show in the diff when I do more work (add commits and/or rebase) on the various branches or merge PRs out of order. I forget what exactly triggered this, but I definitely ran into some trouble along these lines with my previous batch of PRs.

aneksteind commented 1 year ago

I'm curious if you think the unlower module would be worth breaking out into its own crate and put on crates.io @spernsteiner, it seems useful in its own right

spernsteiner commented 1 year ago

Rebased and addressed all comments

spernsteiner commented 1 year ago

@kkysen https://github.com/immunant/c2rust/pull/839#issuecomment-1584110904

Actually, there seem to be issues with https://github.com/immunant/c2rust/pull/934 running on the string_casts.rs tests

Did you test string_casts.rs on #934 alone, or on a merge of #934 and #839? I've rebased this branch (#934) onto master (which now includes #839) and I haven't seen any test failures.

kkysen commented 1 year ago

Did you test string_casts.rs on #934 alone, or on a merge of #934 and #839? I've rebased this branch (#934) onto master (which now includes #839) and I haven't seen any test failures.

I merged #934 (this one) into #839 and tested it. If rebase #934 on master, which now includes #839, works, then that's great and there's no issue at all.