Expose control flow as an API

masak commented 8 years ago

I would like to be able to provide next, last, and redo in loops as statement macros. But I would also like the underlying mechanism that they lean upon to be exposed as a clean API.

What would the API look like? Something like this:

for and while loops would register their loop Q::Block in some way with the tag loop.
In a statement macro, control flow would somehow — handwave — be exposed to the user. Let's say the syntax is ctrlflow.goto(location).
The implementation of next would then be if loopcondition { ctrlflow.goto(findtag("loop").beginning); } else { ctrlflow.goto(findtag("loop").end); }
The implementation of last would be ctrlflow.goto(findtag("loop").end);
The implementation of redo would be ctrlflow.goto(findtag("loop").beginning);

I think we don't need to make this mechanism dynamic (like in Perl 6) — in fact, it's easier to statically check if we don't. Which also means that .map won't be considered a loop in the above sense. On the other hand, if someone wanted to write a custom loop, they'd simply provide it with the tag loop, and it'd play along.

masak commented 8 years ago

The implementation of next would then be if loopcondition { ctrlflow.goto(findtag("loop").beginning); } else { ctrlflow.goto(findtag("loop").end); }

I'm subtly and dangerously wrong above.

A next is not a simple goto. Both for and while loop blocks can accept parameters, and these need to be bound to their new values. The simplest way to do that, of course, is to cleanly exit the current iteration, and to call the block again with the new parameters.

It may seem a bit silly to provide .next as a method on ctrlflow, but that might be the easiest way to write the implementation of next in a sensible way:

ctrlflow.next();

Doing it this way would require slightly more than a loop tag to be attached to loop blocks; they'd also need a "next handler".

We could rewrite the runtime code so that it calls into the same handler during normal loop control flow. In the fullness of time, we might even be able to provide the loops themselves as statement macros, using the same API from with 007.

masak commented 8 years ago

Note that, even if we make the feature totally static and lexical, we will still have interaction between next et al. and the call stack:

for [1, 2, 3] -> e {
    sub foo() {
        if e == 2 {
            next;    # needs to unwind the call stack to the loop block
        }
    }
    foo();
    say(e);    # 1\n3\n
}

And, since the call stack is dynamic and subs are first-class, this could also happen:

my fn;
for [1, 2, 3] -> e {
    fn = sub () {
        next;    # lexically inside the loop block, but not dynamically
    };
}
fn();        # runtime error: attempt to `next` a completed loop

masak commented 8 years ago

Coming back to this issue and re-reading it, I say we might consider going the Python route and forbidding next et al. if there's a layer of sub between the loop and the next.

Yes, that's a sad cop-out, I know. But (a) since Python does this, it's within 007's rights to choose that route too, and (b) having the conservative version of the feature early is more useful than not having the ambitious version of the feature for a long time, and (c) seeing the feature implemented in a more static way might make us realize things we otherwise would not.

masak commented 8 years ago

This issue is bounded above in an interesting way by #166 and similar. When we compile to a different backend, we can no longer do very exotic things with the control flow. And since we want to have other backends, 007 also needs to be non-exotic enough so that code can be generated that corresponds to everything the control flow API can do.

masak commented 7 years ago

When we compile to a different backend, we can no longer do very exotic things with the control flow.

But (as I come back to this later and re-read it) this should be perfectly fine considering what we want to do. It simply means that all the control flow we want to be able to generate needs to be translatable down into conditionals and loops, possibly on gensym'd boolean variables. My intuition tells me this is true of next, last, and redo. It would also be true of something like #13.

Note, however, that it's not obviously true of the next-and-call-stack problems outlined above. In fact, the dictum that things be directly translatable down to local conditionals and loops may well replace "Python doesn't do it" as the prime reason for 007 to make control flow entirely static.

masak commented 7 years ago

Been thinking lately about how return interacts with catch and #156, and how that needs to be exposed in some sensible way. Essentially, return doesn't really mean "return immediately" in the cases where some surrounding cleanup mechanism has been installed; rather, it means "hand things over to the nearest cleanup handler".

I have a feeling this could be expressed/designed very neatly in terms of basic blocks. Just need to find the time to sit down and do so.

masak commented 7 years ago

Quite a bit of overlap here with #252 which I just filed.

masak commented 6 years ago

When I re-read this issue, I feel what's missing from the OP is that control flow is an aspect of code generation. (That's also why I just closed #252.)

Getting down to brass tacks instead of trying to explain in an abstract manner, here's what I imagine a code generation API might look like:

`if` statements

cg.jump_unless(condition_expr, "else")
cg.gen(then_body);
cg.label("else");
if else_q ~~ Q::Statement::If {
    # recursively generate an if statement here
}
else if else_q ~~ Q::Block {
    cg.gen(else_q);
}

`while` loops

cg.label("next");
cg.jump_unless(condition_expr, "last");
cg.label("redo");
cg.register_loop(WHILE_LOOP, "next", "redo", "last", func() {
    cg.gen(loop_body);
});
cg.jump("next");
cg.label("last");

`try` statements

cg.reroute_all_exits("finally", sub() {
    cg.catch_exceptions("catch", func() {    # *
        cg.gen(try_body);
    });
});
cg.gen(catch_blocks);
cg.label("finally");
cg.gen(finally_body);

Marked with an asterisk because the design is not all there. For one thing, the finally block will need to be informed of whether a return happened inside of the try block, so that it can resume that return (or override it) as appropriate.

`with` statements (as in #156)

my closable_ref = cg.gen(with_expression);
cg.reroute_all_exits("close", func() {
    cg.gen(with_body);
});
cg.label("close");
cg.gen(closing_code(closable_ref));

`amb` macro (as in #13)

(Would just re-write to a for loop, which would just be sugar for a while loop. An assert would code-gen to a conditional next with the registered for loop. The first parameter of register_loop indicates what type of loop it is. We could introduce a unique symbol AMB_LOOP, and have the assert code generation target it.)

Scattered notes about the code generation

The labels have human-readable names when we declare them, but on the generated bytecode end they will of course only be memory addresses.
Furthermore, the label names are local, in the sense that even when, say, many loops are nested in each other, all of them can locally call their next label "next" and there'll be no collision.
The cg object makes sure that for each local bit of code, all labels are the target of at least one jump, and all jumps have a declared label. It's not exactly Structured Programming, but it's, um, type-safe assembly. No sense in not making life comfortable for ourselves.
The above is mainly meant for the innards of a 007 compiler. But as an added bonus, macro authors could use cg; inside their module, and just make cg calls like we've done above. The compiler could intercept them and code-generate accordingly.

masak commented 6 years ago

Note that even the conservatism suggested in https://github.com/masak/007/issues/145#issuecomment-232278144 is not going to suffice. Why? Because blocks with parameters code-generate as separate "units" of callable code in the general case, because that's the only way closures inside of them can work.

I didn't realize this until I started thinking about bytecode and trying to transcribe our example programs into bytecode.

I'm still mulling on it all, but I think this pushes us towards treating next et al. less like a goto and more like a control exception.

We're still free to lock it down into being syntactically lexical, but (again, from what I can see so far) there would be no simplification in semantics in the general case from doing so.

masak commented 6 years ago

Just driving by to point out that Symbol (#312) would be a really good fit for constructing labels at the bytecode level.

masak commented 5 years ago

The above is mainly meant for the innards of a 007 compiler. But as an added bonus, macro authors could use cg; inside their module, and just make cg calls like we've done above. The compiler could intercept them and code-generate accordingly.

I was reminded of this phrase ("The compiler could intercept them") as I read Dan Abramov's React as a UI Runtime, especially the section about Inversion of Control. To wit,

The direct way to render components would be for the user to call their functions directly.
Instead, the call happens as JSX, which desugars to a React.createElement call, which is controlled by React so that React can defer and otherwise strategically call the component when appropriate.
This extra IoC layer in-between is also what allows functional components to be stateful; each new function call can refer back into the IoC layer to persist state. (Cf. hooks.)
Conversely, the IoC layer is always involved in calling the component functions, and so it has knowledge of the component hierarchy, not just its output.

It seems to me we want the same sort of layer for cg calls; that is, just like React tooling enjoys knowing about the component hierarchy, 007 tooling enjoys knowing about the implicit "flow chart" generated by different Qnodes. Just as the React renderer reserves the right to lazily evaluate components, so the 007 code generator sometimes chooses to code-generate some bits but not others.

masak commented 3 years ago

After several years of waiting for the other shoe to drop, I now think that the control API should be phrased in terms of continuations. Hoping to be able to flesh that out in a concrete way soon, but basically:

Something like an if statement receives its continuation Done, and then constructs a control flow Start -> Then -> Done and Start -> Else -> Done; the code in the Then and Else parts can be opaque to the if statement during code generation.
Similarly, a for loop or a while loop concerns itself with code-generating the loop-controlling parts, and has an opaque body that it can code-generate for. The twist, though, is that it sends its End continuation down into a context surrounding the body so that a last; statement inside the body can invoke that continuation. Similarly for next and redo.

Again, this is but a rough sketch of the idea. Continuations will generally be compiled away into weaker constructs when at all possible. Either we rule out the remaining cases (which was basically the idea with forbidding loop jumps through function boundaries, Python-style), or we accept that continuations do exist sometimes, and basically base our runtime architecture on them. I'm not in a rush to do the latter, but I also think it wouldn't be so bad. Basically Alma could run on a CESK machine.

masak commented 2 years ago

Maybe I'm not really contradicting 2021-me when I say that it feels like we're essentially talking about algebraic effects here. Maybe the connection is this: if you have continuations and a CESK machine, then algebraic effects can be phrased in terms of having a bit more access to the machine itself, especially the K part.

masak / alma