masak / alma

ALgoloid with MAcros -- a language with Algol-family syntax where macros take center stage
Artistic License 2.0
138 stars 15 forks source link

Implement labels and goto #509

Open masak opened 5 years ago

masak commented 5 years ago

In a module, of course. :wink: (Added to #401.)

import syntax.stmt.goto;

RETRY:
my name = prompt("Enter your name: ");

if !isValid(name) {
    goto RETRY;
}

The extent of goto is an interesting topic. I'm fine with being conservative in a first iteration, meaning never into or out of blocks, even intra-func blocks. (So the above example wouldn't work.)

Later, we can loosen it up to Perl 6's theoretical maximum: out of any number of blocks (unrolling the stack as needed) and into any number of blocks, as long as no block needs parameters. (Not sure whether parameters with defaults are OK.) The semantics described also mean that labels don't exactly have lexical scope; instead they are scoped to their surrounding function or mainline, like JavaScript's var bindings.

func foo() {
    say(BAR);
    {
        BAR:
        say([
            "Yep, that worked",
            "even though the label BAR was declared later",
            "and in a nested block!",
        ]);
    }
}

Labels can be first-class, and show up lexically as fairly shy instances of Label.

I make no value judgements as to whether goto is a net moral good, or whether it can always be written in better ways. As soon as we are far enough along with a separate runtime, this should be a macro we can write quite easily.

(However, if you try to goto out of a quasi and into the surrounding macro body, don't expect the compiler to be merciful in its error message.)

masak commented 5 years ago

It's only the goto statement itself that takes the illexical view on labels. Ordinary expression lookups can't, not without overloading the whole lookup mechanism unfairly. Also note that goto needs to find not just a label in (function) scope, but one actually attached to a statement in the function.

Currently I see little point in making labels first-class, actually. You can't really do anything with them... Print their name, I guess? goto takes a label but not an expression that evaluates to a label. If you pass a label from the function where it's rooted to one where it's not, you can print its name in the receiving function, but nothing else. Certainly not goto it unless you already could.

Forward gotos should be fine. The compiler can remember which labels it hasn't seen, and require them at function end, on pain of an error message. (This is something an orchestrated lazy stateful context macro can handle. I want to try writing one.)

masak commented 5 years ago

next and last and redo (#325) could probably be implemented in terms of goto, but thinking about it further, I'm not sure they should. Instead, both goto and those three should be using the same underlying byte code ops on the code generation layer.

masak commented 5 years ago

Drive-by comment to note that there's an interesting parallel between Symbol and Label. (As pointed out by @vendethiel++.) I don't know exactly what the parallel is, except that labels are globally unique and unforgeable just like symbols are. More precisely, codegen can produce two distinct labels (from, say, two distinct macro invocations) with the same "name" in the same scope, and they won't collide.

(Symbols on there other hand don't have the semantics of "branch target" that labels do.)

My best bet is that labels contain/has-a symbol, and expose some of its uniqueness semantics.

masak commented 5 years ago

(Not sure whether parameters with defaults are OK.)

Still not sure. But if we make them OK, we need to account for the fact that parameter defaults can contain arbitrary code.

goto THELABEL;

if whatever() -> x = foo() {
    THELABEL:
    say(x);
}

If you jump straight into the if block, does the default foo() trigger? (Probably. Damn you, goto! Like a supernatural control flow primitive.)

Now, what if foo() contains a throw? Or another goto? (And does the answer differ depending on whether that goto jumps outside of foo or not?) Why am I suddenly thinking about cans of worms?

masak commented 5 years ago

How, exactly, does goto interact with register allocation? (I'm not sure, but I think the answer is some variant of "aaaaugh!")

In general, we have not even seen the label we are jumping to when we see the goto. We'll see it sometime during parsing the rest of the compunit. Only then (but more likely at CHECK time) do we figure out the exact relation between the goto and its label, how much unwinding/frame entry we have to do (codegen) to get there, and what, if any, registers need to be shuffled around.

(Aaaaugh!)

masak commented 5 years ago

(Not sure whether parameters with defaults are OK.)

Still not sure. But if we make them OK, we need to account for the fact that parameter defaults can contain arbitrary code.

I think the ruling must be this: gotos are low-level building blocks for control flow, and executing code can not be part of a goto jump. Anything that even looks like it might run code, such as a parameter default, is ruled out, statically. (I think conservatively ruling out all parameter defaults is a good start; later, some rule can be found about literal/constant/constant-foldable expressions being OK, maybe. Dunno if we'll ever allow calls, though.)

On the other hand, stack unwinding and finally semantics are a basic human right, and goto should respect those. (Hm, I guess successfully jumping into a block implies some "stack winding"...)

masak commented 5 years ago

Ok, this SO answer is relevant/interesting.

masak commented 4 years ago

I came in here to point out that the problem with goto is that it's squeezed between a rock and a hard place: it's notoriously low-level, with its implied implementation of "set the instruction pointer", but it also needs to play nice with higher-level mechanisms such as (intra-routine) stack unwinding and finally semantics.

Respecting finally semantics is not some nice extra; is goto doesn't satisfy that one, it breaks finally semantics. When you think about it, that's the same contract continue and break ("neutered" goto statements) adhere to in a language like Java. Having to play by the rules of finally is the price goto has to pay to even exist in a language where finally also exists. Maybe the way to swallow that pill conceptually is that it's not so much goto itself that adapts to finally, it's the language semantics that guarantee, on a global scale, that even something as low-level as goto is intercepted to respect finally semantics, because the whole point of finally semantics is that it's ironclad and doesn't have loopholes.

As to goto and register allocation, that's a fun one to think about, but not a show-stopper. There's always the depressing default of spilling everything. :smile: And probably it's possible to do better than that.

masak commented 3 years ago

The semantics described also mean that labels don't exactly have lexical scope; instead they are scoped to their surrounding function or mainline, like JavaScript's var bindings.

I came here to mention in passing that, in JavaScript, labels do have lexical scope, as far as I can determine. Specifically, you can do this:

function foo() {
  label1:
  {
    console.log("hi");
    break label1;
  }

  label1:
  {
    console.log("hi");
    break label1;
  }

  label1:
  {
    console.log("hi");
    break label1;
  }
}

(You can re-use the same label many times in the same function, as long as their scopes don't lexically shadow each other. Even that would be OK, I guess, albeit unnecessarily confusing.)

Stating the obvious, though, this lexical scoping works fine as long as we're only dealing with break (and continue) (and their Perl-family counterparts next/last/redo), but not when we're dealing with goto. It's goto that has requirements on labels that push them beyond lexical scoping.

I think that settling on function-level scoping (à la JavaScript's var) would be OK, but I haven't completely ruled out that there might be a type of scoping that's narrower than that, but wider than lexical scoping. In the end maybe that turns into a matter of ROI and for whose sake further complexity is added.