Marwes commented 7 years ago

Running this (minimized) code through corrode gives an error when attempting to compile it with rustc.

int main()
{
    while (1)
    {
        int i = 0;

        if (1)
        {
            i++;
            break;
        }
    }
    return 0;
}

error[E0425]: cannot find value `i` in this scope
  --> test.rs:14:9
   |
14 |     i = i + 1;
   |         ^ not found in this scope

Generated code

fn main() {
    let ret = unsafe { _c_main() };
    ::std::process::exit(ret);
}

#[no_mangle]
pub unsafe extern fn _c_main() -> i32 {
    'loop0: loop {
        let mut i : i32 = 0i32;
        if true {
            break;
        }
    }
    i = i + 1;
    0i32
}

jameysharp commented 7 years ago

Absolutely true! As it happens, I've already been trying to work out how to fix this, though I got distracted trying to make the generated code more readable (see #6). There's some background about the scope of variable declarations in issue #30. For the record, here are my notes so far:

Variable declaration placement

Save the type of each unique Ident variable declaration, but don't emit any declarations. Replace initialized local variable declarations with simple assignments.
Compute alias analysis on the CFG. Initially we can probably just assume that any variable that has its address taken (or the address of any field or element inside it) may be aliased by every pointer.
Compute live variables at entry to each basic block in the CFG. Any use of a pointer should be treated as a possible use of every variable the pointer might alias. Uses of a mutable pointer in assignments or function calls should also be treated as ambiguous definitions of every variable the pointer might alias.
After structuring, if a variable is live on entry to any handler in a Multiple block or the body of a Loop block, then treat it as live at the beginning of that block as well.
If a variable is live at the beginning of the function, then it is used possibly-uninitialized. Declare it at the beginning of the function and initialize it with std::mem::uninitialized().
We need to find the latest/deepest place that we can legally place each variable declaration, subject to the constraint that if the variable is live on entry to a block, then its declaration must come before that block. Keep track of the variables we've already placed within the current scope. If a variable is either live in the block after this one or used within this block, but is not already placed, then place it in this block. (Note that if it were live on entry to this block, we'd have already placed it in an earlier block, so we don't need to check that.) If a variable of the same name is already placed, then rename this one to be unique.
For each block, insert declarations for the variables that have been placed in that block. For Loop or Multiple blocks, prepend a new Simple block with an uninitialized declaration for each variable placed there. In a Simple block, for each placed variable, find the first statement mentioning it and insert the let-binding before it. Note that in Rust it is always permitted to declare a variable without initializing it, even if the variable is not mutable; the compiler just enforces that there is exactly one assignment on every path from the declaration to each use.

Future work

Sometimes variable scope can be shortened by re-arranging the control flow structure. If a variable is live on entry to a Loop block's successor but has not been placed by the time we're considering the Loop block, we can check whether it's safe to move the successor blocks in which that variable is live to inside the loop.
How do the rules for goto into or out of a variable's scope affect this? (See issue #30 for background.) I think the answer might be that a variable should not be considered live across branches that either enter or leave the scope in which that variable is declared; but this interacts with the rule above for detecting a variable that may be used without being initialized. I suspect that not doing anything special here will lead to results that are legal but may generate code that is more complicated than necessary in corner cases.
When inserting a let-binding before a statement that is the first use of the corresponding variable, that statement will contain a definite assignment to the variable. (If it did not, then the variable would still be live before that point and we would not be inserting its let-binding there.) If the assignment is the only thing the statement does (that is, the assignment isn't buried inside a block expression), then the declaration and assignment can be merged into a let-binding with an initializer. (This simplification will be applicable more often if issue #6 is fixed.)
Imprecise alias analysis can lead to variables being declared in a broader scope than necessary, and maybe unnecessarily marked possibly-uninitialized. I think it's safe to only allow pointers to alias variables that were in scope when the pointer was used in the original C source. Eventually this might include type-based alias analysis controlled by -fstrict-aliasing. The Dragon book also points out that any pointer arithmetic implies that the pointers involved must refer to an array, so we can partition aliases that way (but does this require the strict aliasing assumption?).
If two distinct variables have the same name, but the lifetime of one lies entirely between references to the other, then sometimes we can avoid renaming either one, by wrapping the uses of the shorter-lifetime variable in a new Rust block. Within that block, one variable will shadow the other, but the old variable will become visible again on exit from the block. Open question: is it better to introduce a block or to rename variables?
We could identify a variable with the set of uses of its identifier, rather than its identifier alone. Split a variable into multiple declarations, each with smaller scope, when possible. Sometimes that would let us avoid renaming variables. It may also allow the Rust compiler to warn about more variables that do not need to be mutable. It might also just be annoying and clutter the output, and it's certainly something that could be done in a separate tool, so maybe it shouldn't be done in Corrode itself.
Delete unused initializers? But splitting a variable into its disjoint live ranges, as in the previous item, would create a let-binding that is set but not read in that case, and then we can just let the Rust compiler warn about that.

Marwes commented 7 years ago

Is it really corrode's job to calculate in which scope to put each variable? As long as it does a decent enough job (probably don't want all variables placed at the top of the function), it should be enough to just place the variables "where they are in the C source" (not saying that is trivial though). Feels like your notes focuses a lot on improving the generating code above that?

ssokolow commented 7 years ago

@Marwes

I think it makes sense.

Corrode is intended to produce code for humans to read and maintain (unlike emscripten) and computers are much better than humans at not making "my mind wandered" mistakes or overlooking things, so relying on the human to refactor the variable scopes isn't really the best idea.

jameysharp / corrode

Loop translator generates code with variables in the wrong scope #137

Variable declaration placement

Future work