mit-plv / bedrock2

A work-in-progress language and compiler for verified low-level programming
http://adam.chlipala.net/papers/LightbulbPLDI21/
MIT License
297 stars 45 forks source link

Support for let bindings in fiat2->bedrock2 expression compiler #359

Closed psvenk closed 9 months ago

psvenk commented 1 year ago

This still contains some admits relating to proving that names are not reused, which turned out to be harder than expected. I plan on resolving these by creating a new intermediate language that does not allow shadowing (we need to add an intermediate language for notating memory lifetimes anyway), so I figured that I would open a pull request for this separately from those more substantial changes.

psvenk commented 1 year ago

creating a new intermediate language that does not allow shadowing

I think the whole point of using a namemap and a fresh name generator is to be able to deal with shadowing right here in this phase. If you add a separate deshadowing phase anyways, you don't need the namemap and fresh name gen here and can just use the same names in the source and target language here.

Yes; what I meant is that the preprocessing step would take care of deshadowing (using a namemap and fresh name generator) and Compile.v would just use the same names in the source and target languages. I realized that this can be done in the existing elaborate phase (by making wf require deshadowing), so I'll take care of that in this pull request itself.

psvenk commented 1 year ago

Sorry for the delay in addressing these comments. I was trying to first complete the correctness proof as is, and it seems that it is not possible given the current architecture (with the namemap being used at the Elaborate stage). For example, if we have the pexpr (let "x" := 1 in "x") + (let "x" := 2 in "x"), then a deshadowing elaborator as currently implemented would not necessarily choose distinct variable names in the subexpressions. So, the wf statement is not strong enough to prove the compiler correctness statement.

In the context of the proof, specifically, this becomes an issue when the map.extends result is removed; because the commands associated with the subexpressions of an EBinop are run in sequence, we lose access to the inductive hypothesis when we don't know that the locals environment is preserved between the first command's execution and the second command's (and indeed, the variable x would be overwritten in this case despite the lack of "shadowing" in the surface language).

I see two paths forward here: either we could go back to performing deshadowing in the compiler (which would bring back the issues discussed earlier when it comes to handling memory), or we introduce an intermediate language at this stage to deal with let bindings (and soon after, extend that intermediate language to facilitate dealing with memory). There would be a command-to-command compiler from the expr language to the intermediate language and then a command-to-command compiler from the intermediate language to bedrock2, so that, e.g., let "y" := (let "x" := 1 in "x") + (let "x" := 2 in "x") in c, where c is some continuation command, would have an intermediate language representation that looks like

let x := 1;
let x0 := 2;
let y := x + x0;
unset x;
unset x0;
[[c]];
unset y;

y is notably allowed to outlive x and x0 here; of course, we would need to revisit lifetime semantics once in-place operations for lists and whatnot are in contention.

I'm leaning toward the second option because it should be more flexible, but I have two concerns before moving forward: (a) whether this is truly sufficient to prove the correctness statement, and (b) whether this would be in the scope of the current pull request.

For (a), I believe the answer is yes, but the more interesting question here is what the correctness statements (for both expr->intermediate and intermediate->bedrock2) will actually look like. Given the structure of the interpreter, I would be inclined to make a statement about the locals/memory after execution of the program. However, because of how scope will be handled, it wouldn't be possible to make any statement about the contents of locals after the entire program (including all unsets as everything goes out of scope) has been run. This could be prevented by having a manual override for scope, but I feel that a more elegant solution would be to add some sort of "return" statement and reason about the returned value (which can be put in a specific named variable in bedrock2) instead. Or, this could somehow take the form of continuation passing style instead.

As for (b), my inclination is toward no, in which case I could split off all namegen-related commits from this pull request, make some fixes regarding refactoring and whatnot as you have noted, and leave some admits as this is merged.

Apologies for the long and delayed comment; I'm interested in hearing your thoughts on the matter.

samuelgruetter commented 1 year ago

Let's make sure we distinguish between shadowing (eg let x = 1 in (let x = 2 in x)) and reusing (eg (let x = 1 in x) + (let x = 2 in x)). I think a source program with reusing should not be a problem. As a next step, I would recommend thinking more about the signature of compile_expr (see also this comment). Once you have a clear understanding how much you want or don't want to thread name maps and namgen states through compile_expr, the rest should become clearer too. And an additional idea for that function's signature would be to give it as an argument the name of the variable to which you want the result to be assigned. This could be important to get compilation of expressions to be modular enough: After the source statement (let x = a in b), x is not in scope any more, so expecting that it is in scope after the compiled statement corresponding to that source statement would seem to introduce weird constraints (not sure if you were actually trying this). Or said differently, consider this example: (let x = 1 in (let y = 2 in (let z = x + y in z))). Its result value is stored in z, but expecting that z is still available after that statment seems wrong, because z's scope ends after the first ).