masak / alma

ALgoloid with MAcros -- a language with Algol-family syntax where macros take center stage
Artistic License 2.0

Implement quasi unquotes #30

Open masak opened 9 years ago

masak commented 9 years ago

Hacker News wants unquotes. We happily oblige.

Unquotes in expressions

Whenever the parser is ready to parse a term, it should also expect an unquote.

quasi { say("Mr Bond!") }
quasi { say({{{greeting_ast}}}) }

Technically, I don't see why we shouldn't expect the same for operators. But we get into the interesting issue of what syntactic category it is.

Screw it, I'm tired of theorizing. Let's just steal the colon for this.

quasi { 2 + 2 }
quasi { 2 {{{infix: my_op}}} 2 }

quasi { -14 }
quasi { {{{prefix: my_op}}}14 }

quasi { array_potter[5] }
quasi { array_potter{{{postfix: my_op}}} }

Backporting this solution to terms, you could mark up an unquote as term if you want, but it's the default so you don't have to:

quasi { say("Mr Bond!") }
quasi { say({{{term: greeting_ast}}}) }

At the time of evaluating the quasi (usually macro application time), we'll have the type of the unquoted Qtree. The runtime dies if you try to stick a square Qtree into a round unquote.
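A minimal Python sketch of that runtime check, not Alma's actual implementation. The classes `Q`, `QTerm`, `QInfix`, `QPrefix`, `QPostfix`, the table `SLOT_TYPES` and the function `fill_unquote` are hypothetical stand-ins for the Qtree hierarchy. The only behaviour taken from the text is that an unquote with a given category rejects a Qtree of the wrong type when the quasi is evaluated, and that `term` is the default.

```python
class Q:
    """Hypothetical base class standing in for every Qtree node."""


class QTerm(Q):
    def __init__(self, value):
        self.value = value


class QInfix(Q):
    def __init__(self, name):
        self.name = name


class QPrefix(Q):
    def __init__(self, name):
        self.name = name


class QPostfix(Q):
    def __init__(self, name):
        self.name = name


# Category written inside the unquote -> node type that slot accepts.
SLOT_TYPES = {
    "term": QTerm,
    "infix": QInfix,
    "prefix": QPrefix,
    "postfix": QPostfix,
}


def fill_unquote(value, category="term"):
    """Return `value` if it fits the slot; otherwise fail at evaluation time."""
    expected = SLOT_TYPES[category]
    if not isinstance(value, expected):
        raise TypeError(
            f"{category} unquote expects {expected.__name__}, "
            f"got {type(value).__name__}"
        )
    return value
```

Here `fill_unquote(QInfix("+"), "infix")` succeeds, while `fill_unquote(QInfix("+"))` raises a `TypeError` because the default slot is a term.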

But the parser can sometimes reject things early on, too. For example, this shouldn't even parse:

quasi { sub {{{prefix: op}}}(n) { } }

(That slot doesn't hold an operator, it holds an identifier.)

Unquotes for identifiers

007 currently has 5 major places in the grammar where it expects an identifier:

The traits one is kind of uninteresting right now, because we have four trait types. Someone who really wanted to play around with dynamic traits could write a case expression over those four. So let's skip traits — might reconsider this if we user-expose the traits more.

The three declaration cases are the really interesting ones. Notice that each of those has a side effect: introducing whatever name you give it into the surrounding lexical scope. (Handling that correctly is likely part of the #5 thing with Qtree strengthening.)

I would be fine with the {{{identifier: id}}} unquote accepting both Q::Identifier nodes and Str values. Q::Identifier is basically the only node type where I think this automatic coercion would make sense.
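The coercion described above could look roughly like this in Python. This is a sketch under assumptions: `QIdentifier` and `unquote_identifier` are made-up names standing in for Q::Identifier and for whatever the runtime does when it fills an identifier unquote.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QIdentifier:
    """Hypothetical stand-in for a Q::Identifier node."""
    name: str


def unquote_identifier(value):
    """Accept an identifier node as-is, or coerce a plain string into one."""
    if isinstance(value, QIdentifier):
        return value
    if isinstance(value, str):
        return QIdentifier(value)
    raise TypeError(
        f"identifier unquote expects QIdentifier or str, got {type(value).__name__}"
    )
```

Both `unquote_identifier("n")` and `unquote_identifier(QIdentifier("n"))` give the same node; anything else is rejected.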

Unquotes in other places

These are the remaining things I can think of where an unquote would make sense:

raiph commented 7 years ago

Clarity!

What about just Q::type { ... } instead of quasi<Q::type> { ... }?

masak commented 7 years ago

Since Q::type { ... } would occur in term position, that'd collide with the syntax for regular terms: Q::type. Of course we could special-case that syntax by looking ahead for a block after the Q::type thing. We've been down that path before — the syntax for new Q::Identifier { ... } used to be Q::Identifier { ... } in 007. See #147. Nowadays I consider keywords like new and quasi to be sensible "guards" in the grammar that serve to disambiguate the parsing rule early so that there's less ambiguity.

Putting Qtypes in the Q:: hierarchy is just a convention. Someone could do class A::B::C::DefyingConvention extends Q and it'd be a Qtype. So any identifier we parsed we'd have to do the lookahead.

Lastly, though I seem to be in a minority in this thread to do so, I kind of like the keyword quasi and its connotations. (:wink:) It stays for now.

masak commented 7 years ago

Oh! Here's another thing I wanted to mention. Besides the slogan

Quasi quotes are not 007, they're a template.

that I mentioned above, I would also like to champion the slogan

Quasiquotation is a slang.

The two features that I usually associate with slangs fit well with quasiquotes:

But there's a third one too that I hadn't kept at the front of my attention but that seems blindingly true with quasis:

In the case of quasis, the runloop is simply a sub that evaluates to a Qtree.

I don't know if it means that all slangs are templates... I guess time will tell.

raiph commented 7 years ago

FYI, I just began reading A Guide To Parsing; in it, Federico explains how "In a way parsing can be considered the inverse of templating".

masak commented 7 years ago

Well, yes: parsing extrudes a (possibly nested) model from flat text, and templating produces flat text from a (possibly nested) model.

But also, in subtle ways, no... which is why Federico writes "in a way", I guess. :stuck_out_tongue: For one thing, spooling a flat structure from a deep model is algorithmically very different from deriving a deep model from something flat. There's a fair amount of guessing/backtracking when you do the latter which is missing from the former.
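A toy Python illustration of the two directions, using parenthesized lists as the "flat text". The functions `render` and `parse` are invented for this sketch and atoms are assumed to be plain words without spaces or parentheses. The grammar is too simple to need guessing or backtracking, but it still shows the asymmetry: rendering is a plain walk over the model, while parsing has to track position, nesting and malformed input.

```python
def render(model):
    """Templating direction: nested lists of strings -> flat text."""
    if isinstance(model, list):
        return "(" + " ".join(render(item) for item in model) + ")"
    return model


def parse(text):
    """Parsing direction: flat text -> nested lists of strings."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def parse_item():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        token = tokens[pos]
        pos += 1
        if token == ")":
            raise ValueError("unexpected ')'")
        if token != "(":
            return token  # an atom
        items = []
        # The parser has to track where it is and how deep it is nested.
        while pos < len(tokens) and tokens[pos] != ")":
            items.append(parse_item())
        if pos >= len(tokens):
            raise ValueError("missing ')'")
        pos += 1  # consume ")"
        return items

    result = parse_item()
    if pos != len(tokens):
        raise ValueError("trailing input")
    return result
```

For a well-formed model the two functions are inverses: `parse(render(["say", ["+", "2", "2"]]))` gives back the same nested list.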

The one point that stands out from skimming the document is that Perl 6 (and 007) are languages such that scannerless parsing is the only option. That's because "where the lexer ends and the parser begins" becomes such an overriding issue that scannerless becomes the only reasonable option. And that's because "you have to know which language you're in" in order to even do lexing properly — in other words, lexing is tied up with and dependent on the parsing up to that point. Not all languages are like that, but it feels reasonable (maybe inevitable) that a syntax-extensible language with macros and slangs be like that.

I liked the text (having skimmed it). Thanks for throwing it my way. :smile:

masak commented 7 years ago

And also, yes, quasis (being templates) are the "inverse" of parsing in the sense that when you stuff a string into an EVAL call in Perl 6, the compiler has to parse that string as a new compunit... but when you put something in a quasi, it's being parsed (to the extent possible) along with the rest of the program, and so you've "saved" a parse step at runtime compared to EVAL.

raiph commented 6 years ago

Brainstorm incoming. I tried to read the entire thread before commenting but some dude raiph hijacked the thread with bikeshedding and I lost heart wading thru it all. I hope I'm not about to repeat his mistake but we are all one after all so I apologize if there ain't that much difference between the two of us.

Also, I'm pretty sure the following will be completely invalid and/or stupid in multiple places, but I'm just going to spew it out rather than pause to worry about that because that can kill the flow and you were gracious about his stuff, so why shouldn't I just post what I've got right now?

Strawman proposal 1

Macros are their own slang, analogous to the way rules are. The macro slang is almost identical to the main language except it introduces a couple critical tweaks. It's essentially a template language where most of a typical template uses exactly the same syntax as the main language.

Macros with unquotes must initially mention their unquotes (with possibly one exception which I'll get to) as either parameters of the macro in a parenthesized list at the start, or via variable declarations.

Each such mention of an unquote / parameter must use a prefix which I'll call a sigil. I'm thinking symbols like $, @, &, etc. -- I'm sure you get my drift. The use of unquotes must use the same identifier as the initial mention, including the sigil.

So, for example:

macro foo (infix &infix, term $foo) {
    my $a = 42;
    quasi { $a &infix $foo }
}

The infix, term etc. must be, or be subtypes of, the AST types in the main language that correspond to named grammatical categories in the macro slang (and also, quite plausibly, in the main language too).

Strawman proposal 1.1

The default type for macro parameters depends on their sigil:

Strawman proposal 2

Replace use of the word quasi with [[ ... ]]. This would be a useless doubling of the square brackets in the main language so is arguably available for Perlish repurposing. {...} declares a fragment of the main language, including the braces. These mean different things:

macro foo [[ my &b = { my $a = 42 }; my $a = 99 ]]
macro foo { my &b = [[ my $a = 42 ]]; my $a = 99 }
macro foo [[ my &b = [[ my $a = 42 ]]; my $a = 42 ]]
macro foo { my &b = { my $a = 42 }; [[ my $a = 42 ]] }

The unicode characters ⟦ ⟧ may be used as an alternative to [[ ]].

Strawman proposal 3

Let the main language parse anonymous [[ ... ]] as IIMEs -- immediate inline macro expressions. And perhaps -> $foo [[ ... ]] as shorthand for macro ($foo) { ... [[ ... ]] }.

Am I completely crazy?

masak commented 6 years ago

Hi, sorry for dropping the ball on this one. Will attempt a reply now.

masak commented 6 years ago

Here's hoping I manage to write this without coming off as exceedingly nitpicky or grumpy. 😄 @raiph, the summary is that you are addressing issues which (I've come to understand) are important in your mind, but not so highly prioritized in mine. (I'm still glad that you're inventing in this space, don't get me wrong. Keep it coming!)

Because I know that summary is not nearly sufficient, let me list some specifics:

That's it. Again, sorry if this comes off as just finding flaws — I hope I'm getting across that the above are not so much reactions from me as they are from the realities of syntax, parsing, compilation, and Perl. (And, likely as not, my own view of things is limited too and there are adequate responses/workarounds to what I consider problematic.)

masak commented 6 years ago

Although I haven't much shifted my stance on whether "quasiquote" (and its current keyword/syntax) is easy to understand or optimal...

...I realized one thing the other day that sets 007 (and Perl 6) apart from Lisp: in 007 (and Perl 6) there's quasiquotation, but no quotation. (See here for a quick explanation of quotation and quasiquotation in Lisp.)

We don't have a construct, analogous to quasi, for quoting code. The closest we have is a quasi without any unquotes in it.

Again, this doesn't much change my stance as to the appropriateness of these terms in 007 and Perl 6... but it does explain to me parts of what people (maybe?) find distasteful about the term "quasiquote" in 007 and Perl 6.

masak commented 4 years ago

So, strawman proposal, use bracketing that includes a reversal of the ⌜ and ⌝ symbols:

[...]

@raiph, I'm not sure I said it clearly back in 2016, so let me say it now: reversing the quotation symbols for unquotes strikes me as an idea that really has the heart in the right place — they're called "unquotes", after all, and my working understanding of what they do semantically is that they decrement the quoting depth — it's just that I don't believe the execution of the idea works.

Or, to be more precise, I can't think of one example in the wild where it has worked. The reason, I think, is simple: it runs against the grain of what parsers are typically able to deal with. Squint hard enough, and you see not just parser technology adapting to what language syntax needs, but also the converse — it's not that we couldn't produce a parser that handled [...)-style ranges, or ⌝...⌜-style unquotes, it's that it would break too much on the way. Maybe part of the "break too much" is in the parser itself, and part of it is in the developer unconsciously emulating the parser, or the aggregated ecosystem of developers collectively parsing the code.

masak commented 3 years ago

...I realized one thing the other day that sets 007 (and Perl 6) apart from Lisp: in 007 (and Perl 6) there's quasiquotation, but no quotation. (See here for a quick explanation of quotation and quasiquotation in Lisp.)

Since I wrote the above, I've also come to realize that Scheme/Lisp quotation (and quasiquotation) is much worse for hygiene than Raku's/Alma's variant. (I've come to this realization after dipping into the two Kernel/vau papers, on which I hope to write several comments, soon.) I don't like to tease without providing details, so here is a brief summary: in Scheme/Lisp, lookup of a symbol happens entirely at runtime, whereas in Raku/Alma, lookup has a static part and a dynamic part; this is what leads on to unique variables and a stronger guarantee of hygiene.

Hope to write more on this soon.

masak commented 3 years ago

@raiph I just found the term "code quotes" in this paper, and found it sympathetic. From what I can see, these are actual quasiquotes (in Java), but the name is friendlier, and reminded me of your code keyword proposal above.

I don't see that I ever addressed using code as a keyword. (Though this issue thread is notoriously big, so maybe I did but can't find it.) It's fine, I think; I'd say it has merits on par with quasi in terms of how well it describes intent. Maybe if we imagine a group of relative beginners, code will look a lot friendlier and more disarming to them; if we imagine a group of grizzled Lisp veterans, quasi will similarly look natural as a keyword, but it will scare the beginners. (Whether this last aspect is seen as a positive or a negative depends, I guess, on whether one wants to put macro technology in the hands of beginners.)

I think in a parallel universe we could have easily gone with code from the start. As it is, S06 decided on quasi as a keyword, and there doesn't seem to be a compelling reason to switch.

Of course, since Alma is meant to be able to extend any and all syntax, code is still not out of the question as a syntactic extension... :grin:

raiph commented 3 years ago

Hi Carl, belated happy 2021. :)

I just noticed a bunch of comments in this repo in recent months that I'd previously missed in issues I'm mentioned in.

One thing I'm unclear about and need to resolve is whether including asides intended to be about the macros Raku eventually gets, not alma, are inappropriate here. In this comment I will presume they're OK if explicitly noted as such. PLMK if you'd rather I just dropped them altogether.


... it's not that we couldn't produce a parser that handled [...)-style ranges, or ⌝...⌜-style unquotes ...

You'd already said as much back in 2016, and I had replied with:

Fair enough. This was the other especially wild suggestion. :)

And now, in 2021, my thinking is... fair enough. :)


Scheme/Lisp quotation (and quasiquotation) is much worse for hygiene than Raku's/Alma's variant

Aiui the intent was that Raku's macros should default to hygiene so that sounds good.

Aiui the intent was that Raku's macros should also be able to be unhygienic. So there's that too. Though I'm sure that could reasonably be punted on for now / this decade.

Does quasi { foo } represent an ast that is a lambda whose body of code is foo? Or an ast that represents the compiler's evaluation of foo (either the symbol foo or a call of &foo)? And if it's the latter, does one write quasi {{ foo }} to get the former?

(And some more suggestions for Raku paint -- not alma. Maybe quasi [[ ... ]] / quasi ⟦ ... ⟧ would be an option for unhygienic quasis?)


I just found the term "code quotes" ... I'd say it has merits on par with quasi in terms of how well it describes intent.

I agree. Actually I'd say quasi is markedly better than code or codequotes.

One huge thing in favor of quasi is that you like it. Alma is primarily your and ven's adventure/story, and Raku's jnthn's, so you folk quite rightly get to choose the name of principal characters. And it really is very much a paint issue about a bikeshed, not a nuclear power plant ("Perl 6"!).

But quasi is also just better all round, than code. Imo. Here in 2021.

That said, as an incurably cheeky bit player I now strawman suggest yet another color that seems appealing as I write this sentence: template.

masak commented 3 years ago

Hi Carl, belated happy 2021. :)

Happy 2021, and (a not belated) 新年快乐 in the year of the ox! Happy 牛 NIU year 😆

I just noticed a bunch of comments in this repo in recent months that I'd previously missed in issues I'm mentioned in.

Well, this thread is funny — it started out as a small checklist to do an ostensibly simple thing (to make HN happy, more than five years ago), but it quickly ran into a technical hurdle which I will now describe:

Let's say there's an unquote in a quasi: {{{ myAst }}}. Parsing this unquote isn't the problem, but let's say there's a plus symbol after it: {{{ myAst }}} + .... Now if myAst contained a term of some kind, we should parse that plus as infix:<+>. But if it contained, I dunno, a statement, then the plus should be a prefix:<+>. When in doubt, we fall back on our deeply held principles, and in this case Raku has one: always know what language you're in. In practical terms, it means we can't just leave the situation like this — something is missing so that the parser can always know what language it's in — because the value of myAst isn't available yet; it comes in "a later phase", when we macro-expand, and the parser needs to know now.

The resolution to that, which basically happened in and around this issue, is the {{{ Q::Term @ myAst }}} syntax. (And {{{ Q::Statement @ myAst }}}.) Because expressions are by a long margin the thing we're likely to pass as ASTs, we take that as the default; if you have a non-expression, you need to specify its grammatical class so that the parser still knows which language it's in after the unquote.
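To make the "which language am I in" point concrete, here is a small Python sketch, not the Alma parser. It reads an unquote written in the `{{{ Category @ name }}}` form described above and uses only the declared category, never the value of `myAst`, to decide how a following plus should be parsed. The names `UNQUOTE`, `EXPECTS_AFTER`, `parse_unquote` and `classify_plus` are hypothetical.

```python
import re

# {{{ name }}} or {{{ Category @ name }}}; the category defaults to Q::Term.
UNQUOTE = re.compile(
    r"\{\{\{\s*(?:(?P<category>[\w:]+)\s*@\s*)?(?P<name>\w+)\s*\}\}\}"
)

# What the parser expects immediately after each grammatical category.
EXPECTS_AFTER = {
    "Q::Term": "operator",   # a term was just read, so `+` must be infix
    "Q::Statement": "term",  # a statement just ended, so `+` starts a new term
}


def parse_unquote(text):
    """Return (category, variable name, remaining text) for a leading unquote."""
    match = UNQUOTE.match(text)
    if match is None:
        raise ValueError("not an unquote")
    category = match.group("category") or "Q::Term"
    return category, match.group("name"), text[match.end():]


def classify_plus(text):
    """Decide how a `+` right after the unquote must be parsed."""
    category, _name, rest = parse_unquote(text)
    if not rest.lstrip().startswith("+"):
        raise ValueError("no plus after the unquote")
    if EXPECTS_AFTER[category] == "operator":
        return "infix:<+>"
    return "prefix:<+>"
```

With this, `classify_plus("{{{ myAst }}} + 1")` yields `infix:<+>` and `classify_plus("{{{ Q::Statement @ myAst }}} + 1")` yields `prefix:<+>`, and both decisions are made at parse time.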

I'm not married to this syntax which kind of introduces a new (pseudo-)operator @ which may or may not fly in Raku... but that part feels like a small detail compared to the victory of having a solution at all.

Unfortunately, just outlining the solution doesn't make it so; the implementation of this has stalled on exactly how to make that happen within the limits of a recursive-descent parser, and so I've been taking a yak-shaving tour towards exploring alternate parsers that would be up to the job (#293). (As I type this, I'm thinking "surely it can't be that tricky? All I need is to hardcode multiple rules for the different types of unquotes...". I might give it a go, if for no other reason than to remind myself why it was indeed that tricky.)

As this thread got stuck like a gazelle on the savanna, it became easy prey for discussions about quasiquoting, possible syntaxes, and philosophy/etymology. I would like to stress immediately that this has been one of the most valuable/enjoyable Alma issues for me, if not the most valuable/enjoyable. Simply because it has generated so many thoughts and ideas and bravely put them up for scrutiny — as well as involving basically all the people who have ever had an active interest in Alma. It's suitable, if unquotes are somehow at the core of what Alma is and wants to be, for this issue to also be the core of the repository's activity. I have not minded at all the fact that often the solutions proposed were for problems that I already considered solved, or not in dire need of solving. 😄 It is right to call some of the discussion "bikeshedding", yes, but that term to me was never all that negative; it's about finding smaller optima within larger ones.

One thing I'm unclear about and need to resolve is whether including asides intended to be about the macros Raku eventually gets, not alma, are inappropriate here. In this comment I will presume they're OK if explicitly noted as such. PLMK if you'd rather I just dropped them altogether.

They are not just OK, but encouraged. My stance on Raku these days is... complicated, but I still like and believe in the language. Although Alma (née 007) is not exclusively about supplying macros for Raku (née Perl 6), that charter is still in play. (Concretely, it's less about "yo Raku, you should use the @ syntax to indicate grammatical category in unquotes!" and more about "yo Raku, in our explorations we discovered that a mechanism is necessary for indicating grammatical category in unquotes!". If you see what I mean. Just like Perl 5 and Raku learn from each other in ways that are primarily non-syntax-based, so do Alma and Raku.)

Scheme/Lisp quotation (and quasiquotation) is much worse for hygiene than Raku's/Alma's variant

Aiui the intent was that Raku's macros should default to hygiene so that sounds good.

I still haven't dropped the other shoe on the above teaser, but it's happening slowly over in #302 (another @raiph thread, and strong contender to "most interesting discussion in the repo", at least I think so).

But saying things many times is probably good, so I'll try in a partial, most likely flawed way now: in Raku (and Alma), parsing goes as far as establishing the definition of a use there and then. That is, if you have

my $x = ...;

# ... (no intervening variable declaration)

... $x ...

The parser knows enough when it sees the use $x to say "yup, that's with 100% absolute certainty the variable that I declared back in my $x".

Not so Scheme. In Scheme, a symbol is just a symbol, and so the parser sees an x and goes "huh, an x". And that's it. No reasoning at that point about uses and definitions.

(I know, it's surprising! Scheme is known for its hygiene.)

The upshot is that, when the time comes to expand a hygienic macro, what Scheme needs to do is a massively complicated (and costly) cover-all-bases alpha-rewriting of everything. John Shutt described this process as something he made sure he understood well enough for his thesis work, but then promptly forgot about soon after. The point is that, because the parser does the bare minimum, all of the hygiene work has to happen during macro expansion instead. In Raku/Alma, the parser quite naturally does more, and so the later hygiene work during macro expansion becomes less.
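A rough Python sketch of the "resolve at parse time" idea, with invented names (`Scope`, `declare`, `lookup`) and a made-up `name#n` format for unique variables. It is not how Raku or Alma are implemented; it only shows that once each use is bound to its declaration while parsing, two variables that share a surface name stay distinct after macro expansion without any later renaming pass.

```python
import itertools

_fresh = itertools.count(1)  # source of unique suffixes


class Scope:
    """A lexical scope that binds names to unique variables when declared."""

    def __init__(self, parent=None):
        self.parent = parent
        self.bindings = {}

    def declare(self, name):
        unique = f"{name}#{next(_fresh)}"
        self.bindings[name] = unique
        return unique

    def lookup(self, name):
        # Walk outwards until the declaration is found, as a parser would.
        scope = self
        while scope is not None:
            if name in scope.bindings:
                return scope.bindings[name]
            scope = scope.parent
        raise NameError(name)


# Inside the macro: `my x` is declared, and the quasi body uses it.
macro_scope = Scope()
macro_x = macro_scope.declare("x")
quasi_body = ["say", macro_scope.lookup("x")]  # resolved when the quasi is parsed

# At the call site: an unrelated `my x`.
call_site = Scope()
user_x = call_site.declare("x")
user_code = ["say", call_site.lookup("x")]

# Expansion just splices the trees together; no alpha-renaming is needed.
expanded = [user_code, quasi_body]
```

After expansion, `expanded` holds two different unique variables even though both were written as `x`.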

Aiui the intent was that Raku's macros should also be able to be unhygienic. So there's that too. Though I'm sure that could reasonably be punted on for now / this decade.

"Unhygienic" could refer to two things.

So, not much need for punting, I think. Like, 90% of "unhygienic" isn't even a feature, and the remaining 10% we have a plan for and could prioritize as high as we want.

Does quasi { foo } represent an ast that is a lambda whose body of code is foo? Or an ast that represents the compiler's evaluation of foo (either the symbol foo or a call of &foo)? And if it's the latter, does one write quasi {{ foo }} to get the former?

Since this question follows directly on talk of hygiene, I'm assuming you feel it's making a connection to hygiene. But I... don't quite see it. I'll try to answer anyway. There are two ways to think about what a quasi expression represents. The first is that at a code level, it's more of a "template" or (if you will) a function waiting for some parameters that it'll use to build an AST. The second is that whenever you evaluate a quasi expression (at "runtime", which may well be at macro expansion time during compile time), the result you get is an AST. The template/function interpretation is "internal" to the implementation, and all we ever observe is the AST result.
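The "function waiting for parameters" reading can be sketched in Python as follows. The node classes `QStr`, `QIdentifier`, `QCall` and the function `say_quasi` are hypothetical; `say_quasi` plays the role of `quasi { say({{{greeting_ast}}}) }` from the top of the thread, and calling it corresponds to evaluating the quasi.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QStr:
    value: str


@dataclass(frozen=True)
class QIdentifier:
    name: str


@dataclass(frozen=True)
class QCall:
    func: QIdentifier
    args: tuple


def say_quasi(greeting_ast):
    """Template: the fixed parts are built here, the unquote is a parameter."""
    return QCall(QIdentifier("say"), (greeting_ast,))


# Evaluating the quasi: all an observer ever sees is the resulting AST.
ast = say_quasi(QStr("Mr Bond!"))
```

The function itself stays internal; `ast` is an ordinary tree equal to the one that `say("Mr Bond!")` would produce in this toy model.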

In either case — regardless of which interpretation you pick as "the" interpretation — it's the case that foo written inside the quasi expression always refers to the foo that's in play at that point. (That's hygiene.) For it to be something else, you'd need to have anaphorically introduced some other name using parser.declare.

I just realized something. The above means that parser.declare probably isn't a suitable solution here. The reason is that it looks too much like a regular method call, but what we're doing here is introducing a name, including inside the quasi. We have two audiences when introducing the name: the mainline/injectile, where the name is to be actually used later in practice, but also the quasi itself, where the parser needs to know which language it's in, and so it needs to share the knowledge that a variable has been injected there. So the syntax for that can't just be parser.declare, it needs to be something more declaration-y and macro-y, like inject my foo = ...;.

(And some more suggestions for Raku paint -- not alma. Maybe quasi [[ ... ]] / quasi ⟦ ... ⟧ would be an option for unhygienic quasis?)

Well, first off, see all of the above answers.

Second off, I'm still trying to follow the initial design set out by S06. (Yes, seemingly years after everyone else stopped caring about the synopses. 😄) What it says about this is that you can choose your delimiters as you please for quasi, and it'll keep working the same. That is, {} delimiters are the default, but if you choose something else, you don't lose hygiene. (It doesn't say this explicitly, but I wouldn't say it leaves room for other interpretations either.)

Also, when we say hygiene is so nice we'd like it to be the default (in spite of Common Lisp people's contrary opinion), I'm wondering if dropping hygiene just by switching delimiters doesn't make it a little bit too easy to drop hygiene. At the very least, I'd like to see some other reason to want to drop hygiene (besides anaphora) before I go and make it super-easy.

One huge thing in favor of quasi is that you like it. Alma is primarily your and ven's adventure/story, and Raku's jnthn's, so you folk quite rightly get to choose the name of principal characters. And it really is very much a paint issue about a bikeshed, not a nuclear power plant ("Perl 6"!).

But quasi is also just better all round, than code. Imo. Here in 2021.

I do like it. And via S06, either Damian or Larry liked it many years ago, and I enjoy the feeling of respecting that, too. In fact, if you take a huge step back and imagine a whole host of activities where you keep some parts of a quotation literal and allow some parts to be interpolated — Python f-strings, ES6 template strings, SQL prepared statements, Lisp quasiquotes — I think a good term for that general activity might very well be "quasiquotation".

That said, as an incurably cheeky bit player I now strawman suggest yet another color that seems appealing as I write this sentence: template.

The problem with that is that it focuses on the mechanism and not on the result, which is usually a mistake when naming things in programming languages. Consider if instead of function JavaScript had called it function-definition. (Which is correct, but unhelpful.) Or instead of try we had code-that-might-throw. The thing that comes after the quasi keyword is a template, yes... but the value it generates is an AST.

Now, arguably, the term "quasi" commits exactly the same mistake, focusing on the mechanism and not the result. But while template seems to focus only on the mechanism — the templating itself, as it were — the connotations of quasi feel to me as if they are also about the result we get out. This is very vague and subjective, I know, but nonetheless a felt difference.

(A famous and widespread example of "keywords that talk about the mechanism" is switch. I seem to recall Apocalypse 12 calls this out, even, as it introduces the given construct. A recently accepted PEP calls its new switch statement keyword match.)

masak commented 3 years ago

[...]

Of course, since Alma is meant to be able to extend any and all syntax, code is still not out of the question as a syntactic extension... 😁

I'd like to solicit a macro code that does this. It should be is parsed (according to one of the many iterations of is parsed floating around), and code { ... } should desugar to quasi { ... }.

Something tells me that that's not as easy as it might sound, because quasi is "special". But being able to implement code convincingly in Alma would be a nice milestone, I think.

vendethiel commented 3 years ago

Something tells me that that's not as easy as it might sound, because quasi is "special". But being able to implement code convincingly in Alma would be a nice milestone, I think.

I don't think that's possible in (most) Lisps, FWIW. You can't define something like ,, ,@ or ` by yourself.

masak commented 3 years ago

I don't think that's possible in (most) Lisps, FWIW. You can't define something like ,, ,@ or ` by yourself.

Bel defines those in terms of Bel: syn \` and syn \, for the reader and mac bquote for the evaluator. All of them in userland.