Reviving tail-call elimination

eira-fransham commented 5 years ago

It's been dead for a while, but I'd like to propose starting to think about TCE again. The discussion that I saw before was mostly centred around certain algorithms being more convenient to implement using recursion, but I'd like to give an example that's impossible to implement without TCE and that I actually genuinely do have a personal need for. To create a fast interpreter with 100% safe code, one possible implementation is like so:

pub struct Instruction(pub fn(&mut Stack, &[Instruction]) -> Result<Value, Error>);

pub fn convert(instructions: &[OpCode]) -> Vec<Instruction> {
    instructions
        .iter()
        .map(|i| match i {
            OpCode::Add => Instruction(instructions::add),
            // ...
        })
        .collect()
}

mod instructions {
    use super::{Error, Instruction, Stack, Value};

    pub fn add(stack: &mut Stack, next: &[Instruction]) -> Result<Value, Error> {
        let (a, b) = (
            stack.pop().ok_or(Error::new())?,
            stack.pop().ok_or(Error::new())?,
        );
        let out = a.checked_add(b).ok_or(Error::new())?;
        if let Some((next, rest)) = next.split_first() {
            stack.push(out);
            return (next.0)(stack, rest);
        } else {
            Ok(out)
        }
    }

    // ...
}

This can compile to excellent assembly, with similar performance to a naive compiler modulo the worse instruction cache usage and jump for each instruction, but you can see in this toy implementation that it compiles to call+ret using current Rust. With proper TCE error handling should just be the cost of creating the error plus a ret. Unfortunately, it is impossible to express this in Rust right now, since TCE is not guaranteed and there's no way to assume that it will happen in debug mode. Another issue is that recursive fn definitions aren't allowed, but that doesn't prevent this from being written, it just prevents it from being safe.

glaebhoerl commented 5 years ago

This is besides the main point, but is there some reason this is not a suitable alternative to recursive typedefs?

struct Instruction {
    fun: fn(&mut Stack, &[Instruction])
}

eira-fransham commented 5 years ago

This is besides the main point, but is there some reason this is not a suitable alternative to recursive typedefs?
struct Instruction {
    fun: fn(&mut Stack, &[Instruction])
}

Yes, that's a great solution that I hadn't thought of

EDIT: Updated original post to use this method

eira-fransham commented 5 years ago

It's worth noting that LLVM doesn't even emit tail calls for noreturn functions - it emits call + ud2 https://play.rust-lang.org/?version=nightly&mode=release&edition=2018&gist=2ccde35f7cc96744cb46df6b11ff13c9

EDIT: Now that I think about it, this and the original example are probably both explicitly prevented by Rust for indirect function calls in order to preserve panic safety. Maybe Rust should only prevent TCE in the case that the caller has drop code to run.

eira-fransham commented 5 years ago

The reason LLVM can't emit tail calls for noreturn functions is that enabling TrapUnreachable means that they're no longer tail calls at all https://github.com/rust-lang/rust/pull/45920

clarfonthey commented 5 years ago

Would recommend explaining that TCE means tail call elimination in your post for those of us who had to read the thread to figure that out.

eira-fransham commented 5 years ago

Would recommend explaining that TCE means tail call elimination in your post for those of us who had to read the thread to figure that out.

Good point, I've fixed the title

travis-leith commented 4 years ago

I am just here to add a small voice in support of TCE. Seems like this topic has not had any love in over a year :(

kennethuil commented 4 years ago

Is there a reason we shouldn't require caller and callee signatures to match? That would allow the use of musttail.

le-jzr commented 4 years ago

Is there a reason we shouldn't require caller and callee signatures to match? That would allow the use of musttail.

Mutual recursion? Requiring matching signatures is sort of acceptable as a stop-gap restriction assuming that the elimination requires the explicit use of a dedicated keyword. However, if it would be implicit, you'd get the nasty surprise of mutually recursive functions sometimes working and sometimes not, with no indication from the compiler. That would be pretty bad.

matu3ba commented 4 years ago

@Vurich See #1888 for status. Would be nice, if you can close the issue after the thread is clarifying for you.

human-0 commented 3 years ago

See #1888 for status. Would be nice, if you can close the issue after the thread is clarifying for you.

Now that those issues appear to be resolved, could it be time to start talking about TCE again?

programmerjake commented 3 years ago

Clang just merged support for guaranteed tail-calls: https://reviews.llvm.org/rG834467590842

matklad commented 3 years ago

Can someone prepare a summary comment for the previous discussion? The most recent complete summary I think is https://github.com/rust-lang/rfcs/pull/1888#issuecomment-309410809. There’s also this incomplete summary: https://github.com/rust-lang/rfcs/pull/1888#issuecomment-360905777

ArniDagur commented 3 years ago

This article about Clang's guaranteed tail-calls has been making the rounds on the internet https://blog.reverberate.org/2021/04/21/musttail-efficient-interpreters.html, it makes the case for why one would want such a feature in a programming language.

matu3ba commented 3 years ago

The title is very unfortunate, as it only shows tail call optimization(TCO) for the complicate case of primitive recursive functions (primitive means only recursive structure is the function calling itself). SIdenode: Generalized TCO (functions that are not primitive recursive and introduce multiple recursion levels) is already unfeasible to solve.

Note, that functional languages are often cheating here and put what they see as "the call stack" on the heap (which is of course significant slower due to the heap lookup (and this can even include additional checks).

The basic rule of TCO is the caller must not touch the stack of the callee directly or indirectly via pointers and references.

For simple non-recursive calls (LLVM can check, if a function call chain can be recursive and error out): This is very much possible to do with pointer escape analysis, since one only must prove that that the pointers of caller and callee do not alias (and there is no indirection via other functions involved).
For primitive recursive function calls of extreme simple stuff this is implemented here in LLVM. As you can see LLVM can not ensure that side effects can take place in other function calls due to pointer being difficult (see pointer escape analysis and the solution called shape analysis aka in Rust lifetime analysis. The underlying reasoning is explained in point 3.
If one tries to feed LLVM pointer structures with primitive recursive structures, it just gives up even for the most basic structure. This is, because LLVM must guarantee that any recursion step leads to the caller not accessing the callee stack frame via pointers or any indirection (which has only the generalized solution of shapeness analysis). However, on top of the shape of the memory structure LLVM would require to know the change of possible pointer/reference access regions for all possible runs. To my knowledge, this requires a more sophisticated static analysis or full blown invariant annotation that is combined with the lifetime data (even for these simple cases).

So my advice would be to provide simple tail calls and forbid it for anything recursive, since even in the case of case 3 being less complex and expressible/derivable with lifetimes from the trait system LLVM has rather primitive to non-existing shape analysis in contrast to Rust.

Sidenode: Terminology is poor, since TCO does not differentiate between recursive and non-recursive calls.

OvermindDL1 commented 3 years ago

Note, that functional languages are often cheating here and put what they see as "the call stack" on the heap (which is of course significant slower due to the heap lookup (and this can even include additional checks).

The stack is also on the heap, there is no difference in access speed other than the stack just being so well used that it stays resident on the CPU, which is the case if you use some other part of memory as the stack as well. The only 'cost' is needing to pop an instruction frame off your stack to the SP if you want to call return, but if you have full TCO with full CPS transformations then that's rarely if ever done anyway (and that cost of copying it over is a single in-cache copy, it's not slow).

But yes, I would love full proper TCO, even if via a become keyword.

kennethuil commented 3 years ago

The title is very unfortunate, as it only shows tail call optimization(TCO) for the complicate case of primitive recursive functions (primitive means only recursive structure is the function calling itself). SIdenode: Generalized TCO (functions that are not primitive recursive and introduce multiple recursion levels) is already unfeasible to solve.

Checking that functions with explicit tail calls are memory-safe would be feasible, though, whether or not they're recursive.

If one tries to feed LLVM pointer structures with primitive recursive structures, it just gives up even for the most basic structure. This is, because LLVM must guarantee that any recursion step leads to the caller not accessing the callee stack frame via pointers or any indirection (which has only the generalized solution of shapeness analysis).

I don't know Zig, but it looks like that issue is not related to the recursion. Passing a pointer to your own stack into a tail call would cause unwanted behavior whether or not the tail call is recursive. In Rust, passing a slice reference (not a reference to a slice reference) to data referenced by an incoming argument should work just fine.

So my advice would be to provide simple tail calls and forbid it for anything recursive, since even in the case of case 3 being less complex and expressible/derivable with lifetimes from the trait system LLVM has rather primitive to non-existing shape analysis in contrast to Rust.

If there's no recursion, stack size is always bounded and there is no need for guaranteed TCO in the first place.

Given an explicit tail call, the borrow checker would need to give stack objects ~~(not incoming arguments)~~ a lifetime that ends before the tail call, not after, except that it would also need to let you move objects into the tail call. And drop them or force you to explicitly drop them before the tail call unless you move them into the tail call. Codegen would have to ensure that moves/copies into tail call arguments are actually represented as moves/copies. This would ensure that the tail callee never receives any direct or indirect references into the caller's stack frame, and "musttail" can be safely applied, bypassing LLVM escape analysis.

kennethuil commented 3 years ago

(Or better yet, if there is/can be a notion of a lifetime ending during a function call, that would fit nicely here)

Or the borrow checker could treat become as equivalent to "return(tuple of tailcall arguments)"

Kimundi commented 3 years ago

Yeah, treating the become expression having a lifetime scope outside of the function body seems like the simplest approach for this, and can also be easily explained by analogy to the return case. Semantically, it could probably just be defined as if we returned and then immediately called the next function.

eira-fransham commented 2 years ago

Yeah it makes semantic sense to model it as a return followed by a call to the next function, you could consider it semantically like modelling a call to a tail-recursive function as having similar semantics to a loop that uses alloca (although of course Rust doesn’t have alloca)

louy2 commented 2 years ago

There is now a proposal for C23 on Tail-call elimination.

raviqqe commented 2 years ago

In my understanding, this proposal is only about the Rust calling convention, totally internal, and doesn't affect interfaces with other languages. However, that would be great if we could provide TCE with other calling conventions like C and WASM for ABI compatibility with the clang's musttail attribute, the C's standard TCE attribute, or WASM return_call and return_call_indirect.

I'm currently working on a programming language project. And it uses trampoline functions for now to circumvent the lack of TCE in Rust.

DemiMarie commented 2 years ago

My proposal (#1888) solved the escape analysis problem using the borrow checker. Specifically, it desugared

fn x() {
    let a = Box::new(());
    let b = Box::new(());
    become y(a)
}

as

fn x() {
    let a = Box::new(());
    let b = Box::new(());
    let _tmp = a;
    drop(b);
    become y(_tmp)
}

This avoids all of these problems: the lifetimes of local variables end before the actual tail-call, so any attempt to pass a reference to them is a lifetime error and will be caught by borrowchk. No escape analysis is necessary or even desirable.

Robbepop commented 1 year ago

The Wasm tail-calls proposal has just been voted to move to Phase 4 (WG Standardization) in the CG meeting today. (https://github.com/WebAssembly/proposals/pull/157) Since Wasm is one of the most important targets for Rust I thought it was cool to mention it here. :)

phi-go commented 1 year ago

I appreciate the discussions here. What would be the next step to move this issue forward?

Robbepop commented 1 year ago

Given the long history of this feature request my feeling is that the next step is to find a few people: namely, people from the lang and compiler team to act as mentors and support another person in their attempt to actually implement an initial support in rustc so that the community can start toying around with it and gaining more experience with the pros and cons of its design. Hash out remaining open questions along the way. We probably do not need a perfect solution to start this process.

phi-go commented 1 year ago

Sounds like a plan. As I'm new to rust compiler development, do you know a place to find mentors? Is it on Zulip?

I actually already know the issue you linked, thank you for the summary! I'm also interested in this feature to improve emulator performance.

Lokathor commented 1 year ago

The zulip is absolutely a good place to start.

scottmcm commented 1 year ago

The big open question to me is this: Can we make this the backend's problem, or does it need to be rust's problem?

Said otherwise, is it a reasonable requirement to say that every possible backend must support this? Historically WASM didn't, so the answer was no. If WASM tail call support is landing, then it might be acceptable, but it's still evidence that there might well be desirable targets that don't support it. Certainly big complicated optimizing backends (like LLVM and GCC) will support it, but it's harder to know for sure for other ones. Suppose someone wrote a backend that generates C89 to use with a proprietary compiler that doesn't support TCO. Would we be ok saying that such a backend "isn't Rust"? (I arbitrarily checked MSIL/CIL, which appears to have tail.call, tail.calli, and tail.callvirt, per I.12.4.1.2, so I guess that one wouldn't be a problem, but non-existence proofs are hard.) And if it can't be the backend's problem, then there needs to be a transform of some kind in Rust, with maybe some special MIR functionality to enable it. And in either case, one needs to look carefully at what the restrictions on tail calls would be such that the compiler could guarantee it. (Maybe that means an extern "Rust-Tail", maybe that's an inside-the-module restriction, maybe …)

I'm not sure how best to resolve it, but it has huge implications on most of the rest of the feature.

That said, there's also some relatively more straight-forward work that could be done regardless of the core implementation strategy. For example, one could start implementing become as unstable (which I think needs a compiler MCP to ensure T-compiler is ok with that code existing) to do the scope-affecting transformation for it, and thus be able to try out writing code in that style since LLVM will almost certainly TCO such IR, even if it's still not a guarantee yet. And that plus a relatively-simple MIR transform to a loop could potentially let guaranteed tail recursion happen, which is a nice first step even if it's not nearly as exciting as full generalized tail call optimization.

Or, I suppose, there's the other alternative of making some new language thing to enable a principled form of computed local goto, rather than doing these situations through TCO. (https://internals.rust-lang.org/t/pre-rfc-safe-goto-with-value/14470/9?u=scottmcm.) It's possible that expressing interpreters/emulators that way might be more natural than TCO anyway, if the goal is to jump between a constrained set of locally-defined things, rather than tailing out to arbitrary other things -- I'm not sure.

rpjohnst commented 1 year ago

Another possibility might be to make it the backend's problem, but also make sure it is always possible to implement using trampolines as a fallback- that might be acceptable if it only needs to be used on sufficiently-niche backends, such as part of bootstrapping.

I can imagine a few different ways of going about that, by limiting the feature in various ways- extern "Rust-Tail", heavier restrictions on the data used in a become expression (relative to cross-crate tail calls), auto-generated trampoline wrappers, etc. This is the sort of thing that would probably best be investigated with a working implementation on nightly, to shake out the actual implementation constraints.

phi-go commented 1 year ago

Thank you all for your comments. Let me try to "summarize" the next things that can be worked on (this turned out to be quite long). Please correct me if I'm misunderstanding or missing something.

First off, regarding the implementation, I spent some more time to read through the discussions on #1888. It seems to me that there is a consensus that using the become keyword (and the suggested scope transformation) is the sanest way to implement guaranteed tail calls of all discussed options. That said, as mentioned by @scottmcm, the big unknown is how to best implement this feature. One technique to reduce complexity is the trampoline as a fallback (also mentioned here by @rpjohnst). Other than this core issue, there are some questions regarding teaching and using of become, which can initially be investigated separately without the need for a functioning implementation.

All in all it seems that a new attempt at a implementation would be the correct next step, even if the result is only investigative. Note that there is a implementation that is now ~6 years old (GitHub - DemiMarie/rust at explicit-tailcalls) and a more recent implementation for the frontend (https://github.com/rust-lang/rfcs/pull/1888#issuecomment-1186593662) which might be good enough to guide the implementation, especially with the support of a mentor.

As someone not used to the compiler development process, judging from the contribution guidelines we still need a RFC under which we can put / discuss the implementation, if this work should be integrated with Rust. As the pull request #1888 has only been postponed, I think another important step would be to revive that RFC.

Note that the RFC was postponed because it did not fit with the goals of 2018 (comment) and while there was further discussion and notably developments in LLVM and WebAssembly since then it has not been reopened. There actually was a comment to ask for the RFC to be reopened which seems to have gone under the radar. Which leads to the question, what is the correct way to ask for a RFC to be reopened?

Naturally, the RFC would have to be updated with the discussed points and a consensus on the different issues would need to be found, which hopefully can be done with the support of an implementation.

So, coming to the actual list of TODOs:

[ ] Implementation
- [ ] Try to find a mentor(s) that would be willing to support.
- [ ] Either port the existing implementation or write a new implementation.
- [ ] Work on Rust code examples that use become to learn more how this feature would be used and how it can be taught.
[ ] RFC
- [ ] Reopen #1888?
- [ ] What needs to happen for the RFC to be reopened?
- [ ] How should this be communicated?
- [ ] Update the RFC.
- [ ] Summarize previous discussion.
- [ ] Try to find a consensus for the different issues.
- [ ] (If #1888 can not be reopened for whatever reason, what is the alternative?)

phi-go commented 1 year ago

Just wanted to let you know that I'm still working on this. My plan is to create a (hopefully) comprehensive summary as the next step, though this is taking some time. (Also reading through the rustc dev guide currently).

Amanieu commented 1 year ago

The main issue with general tail calls is that they require complex stack shuffling when the calling conventions don't match exactly. Consider this case:

fn foo(arg: [u8; 1024]);

fn bar() {
    // The calling convention requires this to be passed indirectly by pointer.
    // But there is nowhere on the stack for the allocation to live!
    become foo([5; 1024]);
}

A more viable solution would be to only support restricted tail calls where the callee is required to have the exact same signature as the caller. The problem then becomes much simpler: we just have to overwrite our own arguments with new values and then transfer control to the target function. This is what Clang's musttail attribute requires.

However there's still the issue of backend and target support:

GCC and Cranelift don't support an equivalent to musttail, though Cranelift is working towards it.
LLVM supports a musttail attribute (used by Clang), but it is somewhat buggy: x86 seems to work, RISC-V & AArch64 miscompile, and ARM & PowerPC fail to compile

With that said, I do think these are solvable problems and I am happy to help as a mentor for this feature. The first step should be to open a new RFC, partially based on #1888, which proposes a new plan for supporting restricted tail calls (with the same-signature restriction).

DemiMarie commented 1 year ago

A more viable solution would be to only support restricted tail calls where the callee is required to have the exact same signature as the caller. The problem then becomes much simpler: we just have to overwrite our own arguments with new values and then transfer control to the target function. This is what Clang's musttail attribute requires.

This is not enough for e.g. general functional programming, though. LLVM has full support for general tail calls and GHC and HiPE both rely on it. The difference is that those rely on a different calling convention that makes this easy.

Even with the standard calling convention, it is still possible to implement fully general tail calls via a trampoline.

phi-go commented 1 year ago

Thank you for your input! I would be happy to prepare a new RFC as I have been reading related threads anyway.

As already mentioned by @DemiMarie, regarding the exact same signature, my current understanding is also that this is still not quite enough, for example LLVM requires a specific calling convention (fastcc) which differs from the default calling convention normally used. One option that has been discussed is to change the default calling convention, which would have a wider impact (though it might be feasible https://github.com/rust-lang/rfcs/pull/1888#issuecomment-283826169). The alternative is to mark functions as tail-callable to limit the impact of an alternative calling convention.

Amanieu commented 1 year ago

Even with the standard calling convention, it is still possible to implement fully general tail calls via a trampoline.

This would add significant overhead to all function calls, unless functions that perform tail calls are specifically marked in their signature. A trampoline solution basically involves a function returning the address of the next function to execute, and the caller having a loop to keep calling the next function.

As already mentioned by @DemiMarie, regarding the exact same signature, my current understanding is also that this is still not quite enough, for example LLVM requires a specific calling convention (fastcc) which differs from the default calling convention normally used.

A specific calling convention is only required for general tail calls. When the signatures match any calling convention can be used.

Robbepop commented 1 year ago

I think it is a very good idea of @Amanieu to start implementing this feature as an MVP and the restriction to only tail call functions with the same signatures to make the initial implementation simpler and more feasible with the end goal of having general tail calls after some more iterations. This quickly gives us something to toy around with and be able to experiment with extensions.

It is also how new features in Rust usually are developed and matured.

phi-go commented 1 year ago

Ah, fair enough. So if I understand correctly this should also apply to indirect function calls / calls to functions that are not known during compile time, as only the last function would need to clean up the stack. This approach is probably already quite flexible as one can just provide more context in the function arguments, though this might come at some performance overhead. For example, a state machine that requires different context in different states.

Diggsey commented 1 year ago

This would add significant overhead to all function calls, unless functions that perform tail calls are specifically marked in their signature. A trampoline solution basically involves a function returning the address of the next function to execute, and the caller having a loop to keep calling the next function.

I think you could probably do something like this:

Functions which use become automatically use a "rusttail" calling convention by default.
become is a compile error if used on functions which are not "rusttail".
"rusttail" functions can also just be called normally without become.

Since tail-calls are only really useful in mutually recursive situations, then it shouldn't be an issue, and it doesn't impose a cost on other code.

eg. if you have functions A <-> B which both call each other, then to get the benefit of a tail call you need both A and B to use become.

DemiMarie commented 1 year ago

Since tail-calls are only really useful in mutually recursive situations, then it shouldn't be an issue, and it doesn't impose a cost on other code.

Tail-calls are also useful for continuation-passing style, where one will be tail-calling a trait object.

Diggsey commented 1 year ago

Tail-calls are also useful for continuation-passing style, where one will be tail-calling a trait object.

I think any workable solution would require some annotation on the trait in that case - in this example you would have to give the trait method a "rustcall" convention.

DemiMarie commented 1 year ago

Tail-calls are also useful for continuation-passing style, where one will be tail-calling a trait object.

I think any workable solution would require some annotation on the trait in that case - in this example you would have to give the trait method a "rustcall" convention.

The other approach would be to modify LLVM to support fully general tail calls in all cases, using a trampoline if necessary. That’s out of scope for the MVP, but is necessary for the general case.

phi-go commented 1 year ago

In case someone is interested in the current state of the RFC, or wants to give some early feedback, it can be found here https://github.com/phi-go/rfcs/blob/guaranteed-tco/text/0000-guaranteed-tco.md. I hope to finish sometime next week.

Robbepop commented 1 year ago

In case someone is interested in the current state of the RFC, or wants to give some early feedback, it can be found here https://github.com/phi-go/rfcs/blob/guaranteed-tco/text/0000-guaranteed-tco.md. I hope to finish sometime next week.

It would be great to have an issue that tracks this new RFC so that people can comment on it in a thread dedicated to the RFC. Usually those issues have links to the rendered markdown of the RFC in question. Or is the plan to recycle the old RFC tracking issue?

phi-go commented 1 year ago

RFCs are discussed in the pull request from what I can tell but I wanted to first finish an initial version before opening a PR.

phi-go commented 1 year ago

Alright, I finished up the RFC, you can find the PR here #3407. First time writing an RFC so I hope it's fine.

Robbepop commented 1 year ago

Might be of interest: Blogpost about the new Wasm tail calls in the V8 engine. https://v8.dev/blog/wasm-tail-call

programmerjake commented 3 weeks ago

somewhat related: idea for safe computed goto by using a new enum repr: https://internals.rust-lang.org/t/idea-for-safe-computed-goto-using-enums/21787?u=programmerjake

rust-lang / rfcs

Reviving tail-call elimination #2691