Closed kyouko-taiga closed 1 year ago
An investigation has revealed that the compiler is not correctly handling the control flow for functions where the return statements are inside both branches of a conditional statement.
Two potential solutions have been identified to address this issue:
emit(braceStmt:)
, lowerStatements()
and friendsAdjusting these functions to carefully analyze the if
and else
branches and ensure that all possible paths lead to a return statement.
Introduce a transformation at an earlier stage in the compiler (front-end?, middle-end?) that transform the AST of code like:
if true { return 0 } else { return 1 }
into
return if true { 0 } else { 1 }
Option 2 seems like a less complex and error-prone solution, as long as the transformation is valid in all situations. However, if preserving the original code structure or other concerns about the transformation's validity arise, Option 1 might be the preferable approach.
What do you think?
There's a third option: analyzing exit paths in the IR. The advantage is that all control-flow subtleties will have been lowered into simple jumps.
It used to work like that. See here. I removed the pass because the logic for distinguishing missing returns from implicitly inserted ones was bogus and difficult to debug. Maybe I should have persevered ...
If we can lay out the assumptions IR analysis can make about what return statements may have been omitted, then I guess we'd have our solution. I'll think about it and report my conclusions.
There's a third option: analyzing exit paths in the IR. The advantage is that all control-flow subtleties will have been lowered into simple jumps.
It used to work like that. See here. I removed the pass because the logic for distinguishing missing returns from implicitly inserted ones was bogus and difficult to debug. Maybe I should have persevered ...
If we can lay out the assumptions IR analysis can make about what return statements may have been omitted, then I guess we'd have our solution. I'll think about it and report my conclusions.
Any updates on this?
I think the third option is the way to go but I didn't have time to get to it yet. You can give it a try if you want, otherwise it will eventually be on the top of my stack.
I've given the third option a bash, but just to make sure I understood correctly, here's my approach:
Reason I'm investigating only "reachable" blocks is because if you apply this patch:
--- a/Sources/IR/Emitter.swift
+++ b/Sources/IR/Emitter.swift
@@ -233,9 +233,7 @@ struct Emitter {
) -> SourceRange {
switch emit(braceStmt: b) {
case .next:
- if !canonical(returnType).isVoidOrNever {
- report(.error(missingReturn: returnType, at: .empty(atEndOf: ast[b].site)))
- } else {
+ if canonical(returnType).isVoidOrNever {
emitStore(value: .void, to: returnValue!, at: .empty(atEndOf: ast[b].site))
}
return ast[b].site
You get the following IR from the code:
// lib.f
// lib.hylo:4.1-6:2
fun FunctionDecl(2139)() -> Int {
b0(%b0#0 : &Int):
%i0.0: &Bool = alloc_stack Bool
%i0.1: &i1 = subfield_view %i0.0, 0
%i0.2: &i1 = access [set] %i0.1
store i1(0x1), %i0.2
%i0.4: &i1 = subfield_view %i0.0, 0
%i0.5: &i1 = access [sink] %i0.4
%i0.6: i1 = load %i0.5
end_access %i0.5
dealloc_stack %i0.0
cond_branch %i0.6, b2, b1
b1():
%i1.0: &word = subfield_view %b0#0, 0
%i1.1: &word = access [set] %i1.0
store i64(0x1), %i1.1
return
b2():
%i2.0: &word = subfield_view %b0#0, 0
%i2.1: &word = access [set] %i2.0
store i64(0x0), %i2.1
return
b3():
%i3.0: &Int = access [set] %b0#0
store void, %i3.0
end_access %i3.0
return
}
Notice that b3
will never be reached.
The sketchy part is trying to infer the type of the return
from the IR block. You can look for the store
operations but in some cases it's the 2nd last parameter and in others it seems to be the 3rd last parameter. And sometimes the void
return type seems to be explicitly stored, and in other cases not.
I have something that passes the if-else
test above and doesn't break any other tests but it feels very sketchy.
The main issue is that we're still generating the b3
branch which is dead code. I think @esleche's option1 may be more elegant.
@kyouko-taiga please let me know if I went way off track. You can see my WIP here. By no means good code.
Other thought was that we may be able to side-step inferring the return type based only on IR blocks by just storing the return type along with the Return
struct when we insert it. Had a quick look at it and saw some text related to type-erasure and thought this may be leaking some abstraction. Thought I'd stop and ask first.
Very nice work. Thanks for investigating this issue!
The main issue is that we're still generating the b3 branch which is dead code.
As a general rule, we can't expect a function to only have reachable blocks until we reached refined IR (i.e. after mandatory IR passes). So we have to handle unreachable blocks during most analysis phases. We can do that very easily by building a CFG and only visit blocks reachable from the root.
The sketchy part is trying to infer the type of the return from the IR block.
Why do you need that?
The function knows the type of its return value. I think you figured as much since you used self[b.function].output
. Type checking already applied so we don't need to check whether the type of a return is "correct". It must be or we wouldn't be looking at IR.
Note that the return
instruction never returns any value in our IR. It only transfers control flow back to the caller. The DI pass (aka NormalizeObjectStates) guarantees that the return value is set when we hit a return statement. So you shouldn't have to care about that in your pass. You only need to guarantee that all reachable execution paths are post-dominated by a return
.
For example, if you remove the diagnostic in Emitter (as showed in your post) and try to compile this function:
fun foo() -> Int {
if true { return 0 }
}
DI will diagnose "set parameter not initialized before function returns".
I think we can get away by:
return
or unreachable
terminator (it should probably even be a well-formedness rule of the IR)Note that we also have to deal with calls to a Never
-returning function. In those situations, the return value storage won't be initialized when we hit return
but we shouldn't emit a missing return diagnostic. We could do that by having the emitter produce unreachable
instead of return
after a call to a Never
-returning function.
For example:
fun foo() -> String {
fatal_error();
}
IR for this function should probably be:
fun foo() -> String {
b0(%b0#0 : &String):
%i0.0: &Never = alloc_stack Never
%i0.1: &Never = access [set] %i0.0
call @fatal_error() to %i0.1
end_access %i0.1
unreachable
}
So only applying this patch:
--- a/Sources/IR/Emitter.swift
+++ b/Sources/IR/Emitter.swift
@@ -233,9 +233,7 @@ struct Emitter {
) -> SourceRange {
switch emit(braceStmt: b) {
case .next:
- if !canonical(returnType).isVoidOrNever {
- report(.error(missingReturn: returnType, at: .empty(atEndOf: ast[b].site)))
- } else {
+ if canonical(returnType).isVoidOrNever {
emitStore(value: .void, to: returnValue!, at: .empty(atEndOf: ast[b].site))
}
return ast[b].site
results in:
fun f2() -> Int {}
being converted to the following IR:
fun FunctionDecl(2305)() -> Int {
b0(%b0#0 : &Int):
%i0.0: &Int = access [set] %b0#0
store void, %i0.0
end_access %i0.0
return
}
So the fun f2() -> Int {}
passes the type checker even though the return type in the function signature doesn't match the type being returned (no type/implicit-void). Also currently it seems like the emitter is producing implicit returns.
My observation is that your suggestion that "You only need to guarantee that all reachable execution paths are post-dominated by a return." wouldn't catch this case of an empty function block with a non-void return type. The type checker misses it and the emitter covers it up by inserting an implicit return.
I believe the logic in the Emitter
that produces the implicit return types when it shouldn't may need fixing. Or perhaps the type checker needs to catch the case of an empty body? Let me know if I'm on the wrong course again please :)
I'm surprised to see a store being emitted. Something's wrong; the generated IR is definitely ill-typed.
Type checking doesn't check that the function has a return value, so I'm not surprised to see it pass. A function body that's an empty pair of braces is parsed as an empty list of statements.
I believe the logic in the Emitter that produces the implicit return types when it shouldn't may need fixing.
Yes. We should track where/why the store void
was emitted and fix that. Ideally we should have an assertion that makes sure we don't write an instance of the wrong type to %b0#0
, but that's another issue. Then I think we should be able to apply the strategy I suggested.
Or perhaps the type checker needs to catch the case of an empty body? Let me know if I'm on the wrong course again please :)
Yeah but that's only one case and it should be handled by the more general strategy. So I don't think it is worth changing anything to type checking for the moment.
I'm surprised to see a store being emitted. Something's wrong; the generated IR is definitely ill-typed.
I think I may have mixed up the patch I was talking about. I had a couple floating around. Re-testing this it all seems fine. Sorry about the confusion.
Then I think we should be able to apply the strategy I suggested.
Just published a PR with the strategy you mentioned. I've omitted the step to ensure all reachable blocks are terminated with a return
or unreachable
instruction because in the current tests it's never being violated. I'd be happy to add it as a separate PR if you still feel it needs to be enforced, ensuring no regression occurs. I don't think it belongs in this patch set though.
Absolutely fantastic! Thanks a lot.
I don't think it belongs in this patch set though.
Agreed. I should spend some time on documenting well-formedness properly before that.
Compile this:
The compiler complains that
f
doesn't return value on all paths.