kyouko-taiga commented 1 year ago

Compile this:

public fun f() -> Int {
  if true { return 0 } else { return 1 }
}

The compiler complains that f doesn't return value on all paths.

esleche commented 1 year ago

An investigation has revealed that the compiler is not correctly handling the control flow for functions where the return statements are inside both branches of a conditional statement.

Proposed Solutions

Two potential solutions have been identified to address this issue:

Option 1: Modify `emit(braceStmt:)`, `lowerStatements()` and friends

Adjusting these functions to carefully analyze the if and else branches and ensure that all possible paths lead to a return statement.

Pros:

Preserves the original structure of the code.
More general solution that deals with the underlying issue.

Cons:

Potentially more complex to implement.
More prone to errors if not handled carefully, especially in more intricate cases.

Option 2: Early-Stage Transformation

Introduce a transformation at an earlier stage in the compiler (front-end?, middle-end?) that transform the AST of code like:

if true { return 0 } else { return 1 }

into

return if true { 0 } else { 1 }

Pros:

Likely simpler and more direct to implement.
Utilizes existing compiler logic that can already handle this case.

Cons:

Changes the original structure of the code.
Need to ensure that the transformation is semantically equivalent in all possible cases.

Conclusion

Option 2 seems like a less complex and error-prone solution, as long as the transformation is valid in all situations. However, if preserving the original code structure or other concerns about the transformation's validity arise, Option 1 might be the preferable approach.

What do you think?

kyouko-taiga commented 1 year ago

There's a third option: analyzing exit paths in the IR. The advantage is that all control-flow subtleties will have been lowered into simple jumps.

It used to work like that. See here. I removed the pass because the logic for distinguishing missing returns from implicitly inserted ones was bogus and difficult to debug. Maybe I should have persevered ...

If we can lay out the assumptions IR analysis can make about what return statements may have been omitted, then I guess we'd have our solution. I'll think about it and report my conclusions.

esleche commented 1 year ago

There's a third option: analyzing exit paths in the IR. The advantage is that all control-flow subtleties will have been lowered into simple jumps.

It used to work like that. See here. I removed the pass because the logic for distinguishing missing returns from implicitly inserted ones was bogus and difficult to debug. Maybe I should have persevered ...

If we can lay out the assumptions IR analysis can make about what return statements may have been omitted, then I guess we'd have our solution. I'll think about it and report my conclusions.

Any updates on this?

kyouko-taiga commented 1 year ago

I think the third option is the way to go but I didn't have time to get to it yet. You can give it a try if you want, otherwise it will eventually be on the top of my stack.

WalterSmuts commented 1 year ago

I've given the third option a bash, but just to make sure I understood correctly, here's my approach:

Remove the current logic that rejects the test case
Add an IR pass that investigates all the reachable blocks of code and...
Ensure all reachable blocks return the type you're expecting from the function

Reason I'm investigating only "reachable" blocks is because if you apply this patch:

--- a/Sources/IR/Emitter.swift
+++ b/Sources/IR/Emitter.swift
@@ -233,9 +233,7 @@ struct Emitter {
   ) -> SourceRange {
     switch emit(braceStmt: b) {
     case .next:
-      if !canonical(returnType).isVoidOrNever {
-        report(.error(missingReturn: returnType, at: .empty(atEndOf: ast[b].site)))
-      } else {
+      if canonical(returnType).isVoidOrNever {
         emitStore(value: .void, to: returnValue!, at: .empty(atEndOf: ast[b].site))
       }
       return ast[b].site

You get the following IR from the code:

// lib.f
// lib.hylo:4.1-6:2
fun FunctionDecl(2139)() -> Int {
b0(%b0#0 : &Int):
  %i0.0: &Bool = alloc_stack Bool
  %i0.1: &i1 = subfield_view %i0.0, 0
  %i0.2: &i1 = access [set] %i0.1
  store i1(0x1), %i0.2
  %i0.4: &i1 = subfield_view %i0.0, 0
  %i0.5: &i1 = access [sink] %i0.4
  %i0.6: i1 = load %i0.5
  end_access %i0.5
  dealloc_stack %i0.0
  cond_branch %i0.6, b2, b1
b1():
  %i1.0: &word = subfield_view %b0#0, 0
  %i1.1: &word = access [set] %i1.0
  store i64(0x1), %i1.1
  return
b2():
  %i2.0: &word = subfield_view %b0#0, 0
  %i2.1: &word = access [set] %i2.0
  store i64(0x0), %i2.1
  return
b3():
  %i3.0: &Int = access [set] %b0#0
  store void, %i3.0
  end_access %i3.0
  return
}

Notice that b3 will never be reached.

The sketchy part is trying to infer the type of the return from the IR block. You can look for the store operations but in some cases it's the 2nd last parameter and in others it seems to be the 3rd last parameter. And sometimes the void return type seems to be explicitly stored, and in other cases not.

I have something that passes the if-else test above and doesn't break any other tests but it feels very sketchy.

The main issue is that we're still generating the b3 branch which is dead code. I think @esleche's option1 may be more elegant.

@kyouko-taiga please let me know if I went way off track. You can see my WIP here. By no means good code.

Other thought was that we may be able to side-step inferring the return type based only on IR blocks by just storing the return type along with the Return struct when we insert it. Had a quick look at it and saw some text related to type-erasure and thought this may be leaking some abstraction. Thought I'd stop and ask first.

kyouko-taiga commented 1 year ago

Very nice work. Thanks for investigating this issue!

The main issue is that we're still generating the b3 branch which is dead code.

As a general rule, we can't expect a function to only have reachable blocks until we reached refined IR (i.e. after mandatory IR passes). So we have to handle unreachable blocks during most analysis phases. We can do that very easily by building a CFG and only visit blocks reachable from the root.

The sketchy part is trying to infer the type of the return from the IR block.

Why do you need that?

The function knows the type of its return value. I think you figured as much since you used self[b.function].output. Type checking already applied so we don't need to check whether the type of a return is "correct". It must be or we wouldn't be looking at IR.

Note that the return instruction never returns any value in our IR. It only transfers control flow back to the caller. The DI pass (aka NormalizeObjectStates) guarantees that the return value is set when we hit a return statement. So you shouldn't have to care about that in your pass. You only need to guarantee that all reachable execution paths are post-dominated by a return.

For example, if you remove the diagnostic in Emitter (as showed in your post) and try to compile this function:

fun foo() -> Int {
  if true { return 0 }
}

DI will diagnose "set parameter not initialized before function returns".

kyouko-taiga commented 1 year ago

I think we can get away by:

Remove the current logic that rejects the test case
Add an IR pass that ensures all execution paths from the root eventually have a return or unreachable terminator (it should probably even be a well-formedness rule of the IR)
Improve the diagnostic generated by DI to report missing statement when the uninitialized set parameter corresponds to the return value storage.

Note that we also have to deal with calls to a Never-returning function. In those situations, the return value storage won't be initialized when we hit return but we shouldn't emit a missing return diagnostic. We could do that by having the emitter produce unreachable instead of return after a call to a Never-returning function.

For example:

fun foo() -> String {
  fatal_error();
}

IR for this function should probably be:

fun foo() -> String {
b0(%b0#0 : &String):
  %i0.0: &Never = alloc_stack Never
  %i0.1: &Never = access [set] %i0.0
  call @fatal_error() to %i0.1
  end_access %i0.1
  unreachable
}

WalterSmuts commented 1 year ago

So only applying this patch:

--- a/Sources/IR/Emitter.swift
+++ b/Sources/IR/Emitter.swift
@@ -233,9 +233,7 @@ struct Emitter {
   ) -> SourceRange {
     switch emit(braceStmt: b) {
     case .next:
-      if !canonical(returnType).isVoidOrNever {
-        report(.error(missingReturn: returnType, at: .empty(atEndOf: ast[b].site)))
-      } else {
+      if canonical(returnType).isVoidOrNever {
         emitStore(value: .void, to: returnValue!, at: .empty(atEndOf: ast[b].site))
       }
       return ast[b].site

results in:

fun f2() -> Int {}

being converted to the following IR:

fun FunctionDecl(2305)() -> Int {
b0(%b0#0 : &Int):
  %i0.0: &Int = access [set] %b0#0
  store void, %i0.0
  end_access %i0.0
  return
}

So the fun f2() -> Int {} passes the type checker even though the return type in the function signature doesn't match the type being returned (no type/implicit-void). Also currently it seems like the emitter is producing implicit returns.

My observation is that your suggestion that "You only need to guarantee that all reachable execution paths are post-dominated by a return." wouldn't catch this case of an empty function block with a non-void return type. The type checker misses it and the emitter covers it up by inserting an implicit return.

I believe the logic in the Emitter that produces the implicit return types when it shouldn't may need fixing. Or perhaps the type checker needs to catch the case of an empty body? Let me know if I'm on the wrong course again please :)

kyouko-taiga commented 1 year ago

I'm surprised to see a store being emitted. Something's wrong; the generated IR is definitely ill-typed.

Type checking doesn't check that the function has a return value, so I'm not surprised to see it pass. A function body that's an empty pair of braces is parsed as an empty list of statements.

I believe the logic in the Emitter that produces the implicit return types when it shouldn't may need fixing.

Yes. We should track where/why the store void was emitted and fix that. Ideally we should have an assertion that makes sure we don't write an instance of the wrong type to %b0#0, but that's another issue. Then I think we should be able to apply the strategy I suggested.

Or perhaps the type checker needs to catch the case of an empty body? Let me know if I'm on the wrong course again please :)

Yeah but that's only one case and it should be handled by the more general strategy. So I don't think it is worth changing anything to type checking for the moment.

WalterSmuts commented 1 year ago

I'm surprised to see a store being emitted. Something's wrong; the generated IR is definitely ill-typed.

I think I may have mixed up the patch I was talking about. I had a couple floating around. Re-testing this it all seems fine. Sorry about the confusion.

Then I think we should be able to apply the strategy I suggested.

Just published a PR with the strategy you mentioned. I've omitted the step to ensure all reachable blocks are terminated with a return or unreachable instruction because in the current tests it's never being violated. I'd be happy to add it as a separate PR if you still feel it needs to be enforced, ensuring no regression occurs. I don't think it belongs in this patch set though.

kyouko-taiga commented 1 year ago

Absolutely fantastic! Thanks a lot.

I don't think it belongs in this patch set though.

Agreed. I should spend some time on documenting well-formedness properly before that.

hylo-lang / hylo

Invalid detection of missing return value #711

Proposed Solutions

Option 1: Modify `emit(braceStmt:)`, `lowerStatements()` and friends

Pros:

Cons:

Option 2: Early-Stage Transformation

Pros:

Cons:

Conclusion

hylo-lang / hylo

Invalid detection of missing return value #711

Proposed Solutions

Option 1: Modify emit(braceStmt:), lowerStatements() and friends

Pros:

Cons:

Option 2: Early-Stage Transformation

Pros:

Cons:

Conclusion

Option 1: Modify `emit(braceStmt:)`, `lowerStatements()` and friends