idris-lang / Idris2

A purely functional programming language with first class types
https://idris-lang.org/
Other
2.52k stars 375 forks source link

RefC tail-call optimization #1778

Open madman-bob opened 3 years ago

madman-bob commented 3 years ago

The RefC backend is not performing tail-call optimization properly.

Steps to Reproduce

Consider this elaborate identity function on Nats. We're treating Nats as List (), and reversing them.

reverse : Nat -> Nat
reverse = reverseOnto 0
    where
      reverseOnto : Nat -> Nat -> Nat
      reverseOnto j 0 = j
      reverseOnto j (S k) = reverseOnto (S j) k

main : IO ()
main = printLn $ reverse $ the Nat 100000

Expected Behavior

This should trivially print 100000.

Observed Behavior

Instead, we get a Segfault, due to running out of stack space.

Attempted Diagnosis

The trampolining support code looks correct, if messy.

As reverseOnto is tail-recursive, it should not be launching another trampoline, instead returning the complete closure. If we replace the line tmp_4 = trampoline(closure_6); with tmp_4 = closure_6; in the generated code for reverseOnto, then it compiles and runs without issue.

A superficial reading of RefC.idr suggests that we do actually distinguish between tail-recursive and non-tail-recursive function calls, so something must be getting lost along the way.

stefan-hoeck commented 3 years ago

From what I can tell after looking at RefC.idr, there is no tail call analysis being done in that module, therefore I'd be very surprised if this worked out of the box. The JS backends perform proper mutual tail call optimization and handle @madman-bob 's example correctly. See Compiler.ES.TailRec.idr, which operates on the NamedCExp IR and has been thoroughly annotated with doc strings; maybe this is a useful starting point for the RefC backend as well?

dagit commented 2 years ago

I just happened to independently discover this bug in something I was exploring with idris.

According to wikipedia ANF is meant to be a form of CPS that is easier to work with during codegen. This lead to me an alternative solution that doesn't require using any of the analysis from Compiler.ES.TailRec. In CPS/ANF form we should be able to detect the tailcall case by simply looking at the term and seeing if the call in question is in tail position. Indeed, the outermost call to cStatementsFromANF does this.

So what is the problem? The issue is that when processing cases the code generator cannot tell if the current application is in tail position. I introduced a new type for tracking this information (I could have used a boolean but I think that's poor form). Something like:

data TailPositionStatus = InTailPosition | NotInTailPosition

callByPosition : TailPositionStatus -> ReturnStatement -> String
callByPosition InTailPosition rs = tailCall rs
callbyPosition NotInTailPosition rs = nonTailCall rs

This TailPositionStatus is added as a parameter to cStatementsFromANF, conBlocks, constBlockSwitch, constDefaultBlock. Some care is needed to make sure it's passed around and set appropriately. The outer most call to cStatementsFromANF passes InTailPosition and applications and lets change the status. Once that's in place, it's possible to get TCO working for refc. I have not fully tested my current implementation but the only issue I expect is if I mishandled a particular case somewhere.

I think the scheme above could be improved. I was looking at the lean4 codegen to see how they get tailcalls and they do something smart that I think idris could do as well. They differentiate between tail recursion and the more general tail call case. In the tail recursion case, instead of returning to a trampoline they overwrite the C function's arguments and branch to the start of the function body. Essentially, the trampoline is inlined for self-recursive functions. I don't have a sense for how hard that would be to implement in the current code generator, but I think it could be a very nice optimization. It would remove the overhead of the trampoline from tail recursion.

@edwinb If I made a PR, which of the changes above would you be interested in for refc? Assuming the tests pass, I could prepare a PR for callByPosition fairly quickly. The tail recursion optimization would take me longer, of course, but I'd be happy to attempt it if you're interested in it.

edwinb commented 2 years ago

@dagit This sounds generally very helpful! I'm not tremendously familiar with the RefC backend so I'd generally defer to @madman-bob, but optimising the trampoline sounds good too. The most important thing is fixing tail recursion in the first place - I never looked closely but your suggestion of spotting things in tail position sounds like an important thing to do.

dagit commented 2 years ago

Great. I'll do the simplest thing for now, just fix the TCO problem, and use that to learn the process for contributing. After that, maybe I can start in on the optimization work.