Open madman-bob opened 3 years ago
From what I can tell after looking at RefC.idr
, there is no tail call analysis being done in that module, therefore I'd be very surprised if this worked out of the box. The JS backends perform proper mutual tail call optimization and handle @madman-bob 's example correctly. See Compiler.ES.TailRec.idr
, which operates on the NamedCExp
IR and has been thoroughly annotated with doc strings; maybe this is a useful starting point for the RefC backend as well?
I just happened to independently discover this bug in something I was exploring with idris.
According to wikipedia ANF is meant to be a form of CPS that is easier to work with during codegen. This lead to me an alternative solution that doesn't require using any of the analysis from Compiler.ES.TailRec
. In CPS/ANF form we should be able to detect the tailcall case by simply looking at the term and seeing if the call in question is in tail position. Indeed, the outermost call to cStatementsFromANF
does this.
So what is the problem? The issue is that when processing cases the code generator cannot tell if the current application is in tail position. I introduced a new type for tracking this information (I could have used a boolean but I think that's poor form). Something like:
data TailPositionStatus = InTailPosition | NotInTailPosition
callByPosition : TailPositionStatus -> ReturnStatement -> String
callByPosition InTailPosition rs = tailCall rs
callbyPosition NotInTailPosition rs = nonTailCall rs
This TailPositionStatus
is added as a parameter to cStatementsFromANF
, conBlocks
, constBlockSwitch
, constDefaultBlock
. Some care is needed to make sure it's passed around and set appropriately. The outer most call to cStatementsFromANF
passes InTailPosition
and applications and lets change the status. Once that's in place, it's possible to get TCO working for refc. I have not fully tested my current implementation but the only issue I expect is if I mishandled a particular case somewhere.
I think the scheme above could be improved. I was looking at the lean4 codegen to see how they get tailcalls and they do something smart that I think idris could do as well. They differentiate between tail recursion and the more general tail call case. In the tail recursion case, instead of returning to a trampoline they overwrite the C function's arguments and branch to the start of the function body. Essentially, the trampoline is inlined for self-recursive functions. I don't have a sense for how hard that would be to implement in the current code generator, but I think it could be a very nice optimization. It would remove the overhead of the trampoline from tail recursion.
@edwinb If I made a PR, which of the changes above would you be interested in for refc? Assuming the tests pass, I could prepare a PR for callByPosition
fairly quickly. The tail recursion optimization would take me longer, of course, but I'd be happy to attempt it if you're interested in it.
@dagit This sounds generally very helpful! I'm not tremendously familiar with the RefC backend so I'd generally defer to @madman-bob, but optimising the trampoline sounds good too. The most important thing is fixing tail recursion in the first place - I never looked closely but your suggestion of spotting things in tail position sounds like an important thing to do.
Great. I'll do the simplest thing for now, just fix the TCO problem, and use that to learn the process for contributing. After that, maybe I can start in on the optimization work.
The RefC backend is not performing tail-call optimization properly.
Steps to Reproduce
Consider this elaborate identity function on
Nat
s. We're treatingNat
s asList ()
, and reversing them.Expected Behavior
This should trivially print
100000
.Observed Behavior
Instead, we get a Segfault, due to running out of stack space.
Attempted Diagnosis
The trampolining support code looks correct, if messy.
As
reverseOnto
is tail-recursive, it should not be launching another trampoline, instead returning the complete closure. If we replace the linetmp_4 = trampoline(closure_6);
withtmp_4 = closure_6;
in the generated code forreverseOnto
, then it compiles and runs without issue.A superficial reading of
RefC.idr
suggests that we do actually distinguish between tail-recursive and non-tail-recursive function calls, so something must be getting lost along the way.