jameysharp / corrode

C to Rust translator
GNU General Public License v2.0

Loop translator generates code with variables in the wrong scope #137

Open Marwes opened 7 years ago

Marwes commented 7 years ago

Running this (minimized) code through corrode gives an error when attempting to compile it with rustc.

int main()
{
    while (1)
    {
        int i = 0;

        if (1)
        {
            i++;
            break;
        }
    }
    return 0;
}

error[E0425]: cannot find value `i` in this scope
  --> test.rs:14:9
   |
14 |     i = i + 1;
   |         ^ not found in this scope

Generated code

fn main() {
    let ret = unsafe { _c_main() };
    ::std::process::exit(ret);
}

#[no_mangle]
pub unsafe extern fn _c_main() -> i32 {
    'loop0: loop {
        let mut i : i32 = 0i32;
        if true {
            break;
        }
    }
    i = i + 1;
    0i32
}

jameysharp commented 7 years ago

Absolutely true! As it happens, I've already been trying to work out how to fix this, though I got distracted trying to make the generated code more readable (see #6). There's some background about the scope of variable declarations in issue #30. For the record, here are my notes so far:

Variable declaration placement

  1. Save the type of each unique Ident variable declaration, but don't emit any declarations. Replace initialized local variable declarations with simple assignments.

  2. Compute alias analysis on the CFG. Initially we can probably just assume that any variable that has its address taken (or the address of any field or element inside it) may be aliased by every pointer.

  3. Compute live variables at entry to each basic block in the CFG. Any use of a pointer should be treated as a possible use of every variable the pointer might alias. Uses of a mutable pointer in assignments or function calls should also be treated as ambiguous definitions of every variable the pointer might alias.

  4. After structuring, if a variable is live on entry to any handler in a Multiple block or the body of a Loop block, then treat it as live at the beginning of that block as well.

  5. If a variable is live at the beginning of the function, then it is used possibly-uninitialized. Declare it at the beginning of the function and initialize it with std::mem::uninitialized().

  6. We need to find the latest/deepest place that we can legally place each variable declaration, subject to the constraint that if the variable is live on entry to a block, then its declaration must come before that block. Keep track of the variables we've already placed within the current scope. If a variable is either live in the block after this one or used within this block, but is not already placed, then place it in this block. (Note that if it were live on entry to this block, we'd have already placed it in an earlier block, so we don't need to check that.) If a variable of the same name is already placed, then rename this one to be unique.

  7. For each block, insert declarations for the variables that have been placed in that block. For Loop or Multiple blocks, prepend a new Simple block with an uninitialized declaration for each variable placed there. In a Simple block, for each placed variable, find the first statement mentioning it and insert the let-binding before it. Note that in Rust it is always permitted to declare a variable without initializing it, even if the variable is not mutable; the compiler just enforces that there is exactly one assignment on every path from the declaration to each use.
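To make step 3 concrete, here is a minimal sketch of a backward live-variable analysis, run on a hand-built CFG for the C example from the report. It is only an illustration: the names `Block`, `live_in`, and `example_cfg` are hypothetical and not part of corrode (which is written in Haskell), and the sketch ignores the alias analysis from step 2 entirely.

```rust
use std::collections::BTreeSet;

/// A basic block, reduced to what liveness needs (hypothetical type).
/// `uses` holds variables read before any write inside the block.
struct Block {
    uses: BTreeSet<&'static str>,
    defs: BTreeSet<&'static str>,
    succs: Vec<usize>,
}

fn set(names: &[&'static str]) -> BTreeSet<&'static str> {
    names.iter().copied().collect()
}

/// Iterate the backward dataflow equations to a fixed point:
///   live_out(b) = union of live_in(s) over every successor s
///   live_in(b)  = uses(b) plus (live_out(b) minus defs(b))
fn live_in(cfg: &[Block]) -> Vec<BTreeSet<&'static str>> {
    let mut live: Vec<BTreeSet<&'static str>> = vec![BTreeSet::new(); cfg.len()];
    let mut changed = true;
    while changed {
        changed = false;
        for b in (0..cfg.len()).rev() {
            let mut out = BTreeSet::new();
            for &s in &cfg[b].succs {
                out.extend(live[s].iter().copied());
            }
            let mut new_in = cfg[b].uses.clone();
            new_in.extend(out.difference(&cfg[b].defs).copied());
            if new_in != live[b] {
                live[b] = new_in;
                changed = true;
            }
        }
    }
    live
}

/// CFG for the reported C program, assuming step 1 has already turned
/// `int i = 0;` into the assignment `i = 0;`.
fn example_cfg() -> Vec<Block> {
    vec![
        // B0: `i = 0;`, then branch to B1 or back to B0
        Block { uses: set(&[]), defs: set(&["i"]), succs: vec![1, 0] },
        // B1: `i = i + 1;`, then leave the loop
        Block { uses: set(&["i"]), defs: set(&["i"]), succs: vec![2] },
        // B2: `return 0;`
        Block { uses: set(&[]), defs: set(&[]), succs: vec![] },
    ]
}
```

In this sketch `i` is live on entry to B1 only. It is not live at the start of the function, so step 5 does not apply, but its declaration has to be visible both in the loop body that assigns it and in the block that reads it afterwards.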

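For the C example in the report, the placement described in steps 1, 6 and 7 would presumably yield something shaped like the following. This is a hand-written sketch, not actual corrode output: the function name `c_main_sketch` is made up, and it returns `i` alongside the C return value only so that the result can be checked.

```rust
// Hypothetical shape of the translated function after declaration placement.
pub fn c_main_sketch() -> (i32, i32) {
    // Step 7: an uninitialized declaration is placed just before the Loop block,
    // because `i` is still needed by the block that follows the loop.
    let mut i: i32;
    'loop0: loop {
        // Step 1: the C initializer `int i = 0;` becomes a plain assignment.
        i = 0i32;
        if true {
            break 'loop0;
        }
    }
    // `i` is now in scope here, and rustc can see that it was assigned on the
    // only path out of the loop.
    i = i + 1;
    (0i32, i)
}
```

The declaration without an initializer is accepted because the only way out of the loop is the `break`, which comes after the assignment to `i`.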
Future work

Marwes commented 7 years ago

Is it really corrode's job to calculate which scope to put each variable in? As long as it does a decent enough job (we probably don't want all variables placed at the top of the function), it should be enough to just place the variables "where they are in the C source" (not saying that is trivial, though). It feels like your notes focus a lot on improving the generated code beyond that?

ssokolow commented 7 years ago

@Marwes

I think it makes sense.

Corrode is intended to produce code for humans to read and maintain (unlike emscripten), and computers are much better than humans at avoiding "my mind wandered" mistakes and oversights, so relying on a human to refactor the variable scopes isn't really the best idea.