masak / alma

ALgoloid with MAcros -- a language with Algol-family syntax where macros take center stage
Artistic License 2.0

Implement quasi unquotes #30

Open masak opened 9 years ago

masak commented 9 years ago

Hacker News wants unquotes. We happily oblige.

Unquotes in expressions

Whenever the parser is ready to parse a term, it should also expect an unquote.

quasi { say("Mr Bond!") }
quasi { say({{{greeting_ast}}}) }

Technically, I don't see why we shouldn't expect the same for operators. But we get into the interesting issue of what syntactic category it is.

Screw it, I'm tired of theorizing. Let's just steal the colon for this.

quasi { 2 + 2 }
quasi { 2 {{{infix: my_op}}} 2 }

quasi { -14 }
quasi { {{{prefix: my_op}}}14 }

quasi { array_potter[5] }
quasi { array_potter{{{postfix: my_op}}} }

Backporting this solution to terms, you could mark up a quasi as term if you want, but it's the default so you don't have to:

quasi { say("Mr Bond!") }
quasi { say({{{term: greeting_ast}}}) }

At the time of evaluating the quasi (usually macro application time), we'll have the type of the unquoted Qtree. The runtime dies if you try to stick a square Qtree into a round unquote.
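
As a rough illustration of that run-time check, here is a minimal Python sketch. It is not 007's implementation; the classes `Q`, `Term`, `Infix`, `Prefix` and the function `check_unquote` are hypothetical stand-ins for Qtree node types and for whatever the runtime does when it fills an unquote.

```python
from dataclasses import dataclass


class Q:
    """Base class for all Qtree nodes in this sketch (hypothetical)."""


@dataclass
class Term(Q):
    value: object


@dataclass
class Infix(Q):
    symbol: str


@dataclass
class Prefix(Q):
    symbol: str


def check_unquote(expected: type, node: Q) -> Q:
    """Return node if it fits the unquote's declared category, else raise."""
    if not isinstance(node, expected):
        raise TypeError(
            f"unquote expects {expected.__name__}, got {type(node).__name__}"
        )
    return node


# A term fits a term-shaped unquote.
check_unquote(Term, Term("Mr Bond!"))

# An infix operator does not fit a prefix-shaped unquote.
try:
    check_unquote(Prefix, Infix("+"))
except TypeError as err:
    print(err)
```

The check happens when the quasi is evaluated, which matches the description above: the category is declared in the unquote, and the actual Qtree type is only known once the value arrives.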

But the parser can sometimes reject things early on, too. For example, this shouldn't even parse:

quasi { sub {{{prefix: op}}}(n) { } }

(That slot doesn't hold an operator, it holds an identifier.)

Unquotes for identifiers

007 currently has 5 major places in the grammar where it expects an identifier:

The traits one is kind of uninteresting right now, because we have four trait types. Someone who really wanted to play around with dynamic traits could write a case expression over those four. So let's skip traits — might reconsider this if we user-expose the traits more.

The three declaration cases are the really interesting ones. Notice that each of those has a side effect: introducing whatever name you give it into the surrounding lexical scope. (Handling that correctly is likely part of the #5 thing with Qtree strengthening.)

I would be fine with the {{{identifier: id}}} unquote accepting both Q::Identifier nodes and Str values. Q::Identifier is basically the only node type where I think this automatic coercion would make sense.
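
A small Python sketch of what that coercion could look like. The `Identifier` class and `coerce_identifier` function are invented for illustration and are not part of 007.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Identifier:
    """Hypothetical stand-in for a Q::Identifier node."""
    name: str


def coerce_identifier(value) -> Identifier:
    """Accept an Identifier as-is, wrap a str, and reject everything else."""
    if isinstance(value, Identifier):
        return value
    if isinstance(value, str):
        return Identifier(value)
    raise TypeError(
        f"identifier unquote got {type(value).__name__}, expected Identifier or str"
    )


print(coerce_identifier("greeting"))
print(coerce_identifier(Identifier("greeting")))
```

Only strings are coerced here; any other value is rejected, in line with the idea that identifiers are the one place where automatic coercion makes sense.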

Unquotes in other places

These are the remaining things I can think of where an unquote would make sense:

masak commented 9 years ago

After some deliberation on #6macros, I've decided to make the syntax {{{my_op: Q::Infix}}} instead. And generally I think we ought to try conflating grammatical categories and Q types.

Conceptually, the typing is a run-time check. With #33 we can probably do better than that, and detect impossible things statically. The important thing is that the parser is no longer confused.

masak commented 9 years ago

I've thought some more about it and I think that the syntax should be {{{my_op @ Q::Infix}}} instead of with the colon. Yes, Ven++ was an influence here.

But what finally brought me around is that we'll likely end up with a syntax like quasi @ Q::Trait { ... }, and there it feels like @ matches better. (And the two should definitely be the same symbol, and we could read it as "quasi as trait".)

We still kind of get the strange consistency with the colon even when it isn't a colon. And it's kind of nifty that it's actually another symbol, too. People could go "ah, it's like types, but for parsing".

masak commented 8 years ago

I started keeping a "quasi checklist", a list of the individual parsing modes a quasi term should be able to handle. Might as well keep it here in the issue, rather than offline.

Note that this is a checklist for quasis, not for unquotes. Will need a similar checklist for unquotes, which I deferred until after this for reasons I don't remember now.

masak commented 8 years ago

Alright, now that that's done, let's keep a checklist for the unquote forms, too. Seemed to work pretty well.

masak commented 8 years ago

I would be fine with the {{{identifier: id}}} unquote accepting both Q::Identifier nodes and Str values. Q::Identifier is basically the only node type where I think this automatic coercion would make sense.

(Syntax is now {{{id @ Q::Identifier}}} {{{Q.Identifier @ id}}}.)

Let's be conservative and wait with this until we see a clear use case for it. I think if we decide to go down this road, the really useful part would be the ability to auto-concatenate identifiers and strings into bigger identifiers, like so:

gensym_{{{q @ Q::Identifier}}}

That could be really nice — but let's scope that to be outside of this issue.

masak commented 8 years ago

I'll just note that this issue is a little bit stalled because it's hit some conceptual difficulties.

When you're about to parse an expression, you might hit a prefix op or a term. Therefore, any of these is fine as an expression:

1 + 2
-42
{{{ast @ Q::Term}}} + 2
{{{ast @ Q::Literal::Int}}} + 2
{{{ast @ Q::Prefix}}} 42

The problem comes because we don't know which "parser mode" we should have been in until after the @ when we see the Q type. We want to preserve one-pass parsing, so that we don't re-trigger effects from parsing the expression before the @. (Let's say it wasn't ast but a sub or a macro with declarations and stuff inside.)

Perhaps it would be possible to "fixup" the parser state once we see the Q type. The parser assumed it was expecting a term (say), but now that we see Q::Prefix, we tweak it to instead expect a prefix. I have no clue how that'd be made to work.

Or, maybe the {{{expr @ qtype}}} syntax is inherently disadvantageous and should be changed. Something like qtype @ {{{expr}}} or @ qtype {{{expr}}} would work fine.
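
To make the ordering argument concrete, here is a small Python sketch (not 007 code; the token lists and the `MODES` table are made up for illustration). It counts how many tokens a left-to-right parser has to consume before it sees the Q type, for the type-last and the type-first spelling.

```python
# Hypothetical mapping from Q types to parser modes, for illustration only.
MODES = {"Q::Term": "term", "Q::Prefix": "prefix", "Q::Infix": "infix"}


def tokens_before_mode_known(tokens):
    """Count the tokens consumed, left to right, up to and including the Q type."""
    for count, token in enumerate(tokens, start=1):
        if token in MODES:
            return count
    raise SyntaxError("no Q type found in unquote")


type_last = ["my_op", "@", "Q::Prefix"]    # {{{my_op @ Q::Prefix}}}
type_first = ["Q::Prefix", "@", "my_op"]   # Q::Prefix @ {{{my_op}}}

print(tokens_before_mode_known(type_last))   # 3
print(tokens_before_mode_known(type_first))  # 1
```

With the type last, the whole expression has already been consumed (and any side effects of parsing it have already happened) by the time the mode is known. With the type first, the mode is known after a single token, however long the expression is.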

masak commented 8 years ago

Nothing has happened on this front. I'm defaulting to suggesting qtype @ unquote(expr) as a way forward.

vendethiel commented 8 years ago

Not sure how that allows one-pass parsing, though ?

vendethiel commented 8 years ago

But I think that, inherently, having splicing in a syntaxful language means you can't keep one-pass parsing. Or you're going to have to patch holes by yourself.

One way that could look like would be...

token if { 'if' <cond> '{' <block> '}' }

token cond:splice { '{{{ Q::Cond @'  <var> '}}}' }
token cond:expr { <expr> }

token block:splice { '{{{ Q::Block @' <var> '}}}' }
token block:simple { '{' <expr> + %% ';' '}' }

That looks awful – because it is.

As we've already pointed out numerous times, this is not an issue at all in Lisp(s), because we're already done parsing when we do splicing.

(foo ,@(mapcar #'process xs))
;; literally parsed as (at least in CL)
(foo (unquote-splicing (mapcar #'process xs)))

So, it's "free" in the sense that it's not in any way special syntax. I think we've established that quite some time ago, but I think it's still worth pointing out.
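
For readers less used to Lisp, here is a Python sketch of that idea: once the reader has turned the source into nested lists, expanding a quasiquote is plain list manipulation. This is a simplification, not how any particular Lisp implements it. In particular, the `expand` function and the `env` lookup are assumptions of the sketch, and an unquote here only looks up a name instead of evaluating an arbitrary expression.

```python
def expand(template, env):
    """Expand a quasiquoted template (nested lists) against env."""
    if not isinstance(template, list):
        return template                      # atoms are copied as-is
    if template and template[0] == "unquote":
        return env[template[1]]              # replace the form with one value
    result = []
    for item in template:
        if isinstance(item, list) and item and item[0] == "unquote-splicing":
            result.extend(env[item[1]])      # splice a list into the parent
        else:
            result.append(expand(item, env))
    return result


# Roughly `(foo ,@processed) with processed bound to (1 2 3)
template = ["foo", ["unquote-splicing", "processed"]]
print(expand(template, {"processed": [1, 2, 3]}))  # ['foo', 1, 2, 3]
```

No parser state is involved at any point: the unquote forms are ordinary list elements, which is the sense in which splicing is "free" in Lisp.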


Anyway, I'm not sure how a solution featuring "one-pass parsing" would make sense, let alone look like. Maybe we could get away with only a lot of backtracking? Even then, I'm not sure.

I'm not arguing we should pre-process the code, and put in the AST text in place of the {{{ }}} just to parse it. That'd defeat the purpose of hygienic macros (mostly). But I think we should act "as if" we just parsed a block, and in the action, just use the given AST fragment (<var> in my example).


I've thought about figuring out a Lisp-like syntax, in that you could also use it for day-to-day use, like Lisp can use quote/quasiquote/unquote for arrays... But as I've just said, we are not homoiconic – Elixir is not either, and they have macros, but their syntax is far less flexible ... Though it's worth knowing what they're doing: AFAIK they only allow splicing in expressions – you can't write 3 unquote(plus) 4, which makes it considerably easier for them.

masak commented 8 years ago

Not sure how that allows one-pass parsing, though ?

Well, hm, we'd know with a one-token lookahead... but that doesn't sound all that alluring now that I say it out loud.

Notably, the expression parser would still need to go "oh, that's a Q::Mumble::Handwave, that's a TTIAR. But if there's a @ after it, then it's possibly OK!"

masak commented 8 years ago

Ok, new suggestion that really front-loads the unquote signal: unquote(qtype @ expr)

vendethiel commented 8 years ago

that doesn't really simplify anything over {{{, does it? (...except it doesn't look as ugly)

masak commented 8 years ago

But I think that, inherently, having splicing in a syntaxful language means you can't keep one-pass parsing. Or you're going to have to patch holes by yourself.

That's an interesting assertion that I don't immediately agree with. :smile:

One way that could look like would be...

token if { 'if' <cond> '{' <block> '}' }

token cond:splice { '{{{ Q::Cond @'  <var> '}}}' }
token cond:expr { <expr> }

token block:splice { '{{{ Q::Block @' <var> '}}}' }
token block:simple { '{' <expr> + %% ';' '}' }

Yes, that's the worst-case solution to what we need to do with the grammar.

As we've already pointed out numerous times, this is not an issue at all in Lisp(s), because we're already done parsing when we do splicing.

Hm, wait, isn't this trivially true for Perl 6 as well? Parsing happens first, and leads to an AST with a lot of Q::Quasi nodes in it, possibly with a lot of Q::Unquote nodes in them. Then (usually) a macro is invoked (still at parse-time, but at least later), and as part of this, things get spliced. Granted, we're not done with all the parsing, but we're done with the quasi we're splicing into. Or am I missing something?

So, it's "free" in the sense that it's not in any way special syntax. I think we've established that quite some time ago, but I think it's still worth pointing out.

Given that what we're implementing is a Perl and not a Lisp, are there any obvious benefits that carry over cleanly that I seem to be stubbornly ignoring? That question is often on my mind. :smile:

masak commented 8 years ago

that doesn't really simplify anything over {{{, does it? (...except it doesn't look as ugly)

Oh, but it does, because unquote(qtype @ expr) introduces its three components in this order:

Compare this to {{{expr @ qtype}}} which:

vendethiel commented 8 years ago

Sorry, I meant "versus the possible {{{qtype @ expr}}} form".

masak commented 8 years ago

Oh yeah, that one's equivalent as far as the above argument is concerned.

Any reason I should prefer {{{qtype @ expr}}} to unquote(qtype @ expr) and not use this opportunity to ditch the universally hated {{{ }}} syntax? :smile:

vendethiel commented 8 years ago

Absolutely none! Except maybe that it doesn't stand out as much – which is fine for Lisp, since it has no syntax, but might not be for Perl 6/007.

I thought {{{qtype @ expr}}} was one of the variations in your previous comment, but apparently it wasn't, so I stand corrected.

masak commented 8 years ago

Here's one reason I've fallen out of love with {{{ ... }}}: S06 talks a lot about custom delimiters for the quasi block, so you could have quasi < ... >, quasi ⟦ ... ⟧, etc. Nicely enough, you then use the delimiters (tripled) for the unquotes: <<< ... >>>, ⟦⟦⟦ ... ⟧⟧⟧. Cool.

But then the final ingredient which would make all that machinery make sense is missing: you can't nest those things. Specifically, you can't unquote "several levels up" in analogy with how you'd do flow control with loop labels:

quasi <
    quasi ⟦
        ⟦⟦⟦ ... ⟧⟧⟧;    # I'd expect this to escape out of the inner quasi
        <<< ... >>>;    # I'd expect this to escape out of the outer quasi
    ⟧;
>;

TimToady assures me that this doesn't work, shouldn't work, and will never work. The reason, as far as I understand it, has to do with proper nesting, which is something we value in the parser more than the above conjectural feature. The way the parser works is that it never violates proper nesting, and so you can't unquote < ... > if there's a ⟦ ... ⟧ layer in-between.

Given all that, I just don't see why anyone would choose to use a non-standard delimiter, except possibly to make their code look funny. And — given that I considered the above conjectural feature to be the chief argument for the {{{ ... }}} syntax — I don't see why we need to stick to the {{{ ... }}} syntax.

masak commented 8 years ago

Absolutely none! Except maybe that it doesn't stand out as much – which is fine for Lisp, since it has no syntax, but might not be for Perl 6/007.

Yes, that's perhaps one of the biggest unknowns with the unquote(qtype @ expr) proposal. Given masak's Deplorable Truth About Templating Syntax, I guess the empirical experiment here is "what happens if we make our placeholder syntax not be ugly and stand out so much?".

Let's find out! :smile:

vendethiel commented 8 years ago

You needn't convince me about nesting – I argued about it quite a lot already! So, yeah, I don't remember believing in that S06 bit... but anyway, whatever you choose is fine.

We could try translating these:

from Graham's "On Lisp": (and another one is the same SO question):

(defmacro =defun (name parms &body body)
  (let ((f (intern (concatenate 'string
                                "=" (symbol-name name)))))
    `(progn
       (defmacro ,name ,parms
         `(,',f *cont* ,,@parms))
       (defun ,f (*cont* ,@parms) ,@body))))

vendethiel commented 8 years ago

macro cont-defun(Q::Identifier $name, Q::ParamList $parms, Q::Block $body) {
  my $new-name = $name.scope.declare("cont-$name.bare()");
  quasi {
    macro unquote(Q::Identifier @ $name)(unquote(Q::ParamList @ $parms)) { # should the ParamList unquote be bare, without parenthesis?
      quasi {  # an unquote inside of another one
        my $f = unquote(Q::Identifier @ unquote(Q::Identifier @ $new-name)); # ... just to make this more readable
        $f(unquote(Q::ParamList @ unquote(Q::ParamList @ $parms))); # same q. for ParamList
      };
    }; # end of inner macro
    sub unquote(Q::Identifier @ $new-name)(unquote(Q::ParamList @ $parms))
      unquote-unhygienic(Q::Block @ $body); # mmh... needs that "unhygienic" so that `$parms` are visible.
  };
}

Whew.

vendethiel commented 8 years ago

Seems like the github live update earlier had me miss one comment...

Hm, wait, isn't this trivially true for Perl 6 as well? Parsing happens first, and leads to an AST with a lot of Q::Quasi nodes in it, possibly with a lot of Q::Unquote nodes in them. Then (usually) a macro is invoked (still at parse-time, but at least later), and as part of this, things get spliced. Granted, we're not done with all the parsing, but we're done with the quasi we're splicing into. Or am I missing something?

That's not exactly what I mean – but I was unclear, as usual. I meant that there's no "parser state" to be recovered after an unquote. You don't have to "insert" anything, to parse in a different way – it's not even really splicing at parse-time, it's a function call in the source code.

Hm, wait, isn't this trivially true for Perl 6 as well? Parsing happens first, and leads to an AST with a lot of Q::Quasi nodes in it, possibly with a lot of Q::Unquote nodes in them. Then (usually) a macro is invoked (still at parse-time, but at least later), and as part of this, things get spliced. Granted, we're not done with all the parsing, but we're done with the quasi we're splicing into. Or am I missing something?

Not quite, exactly because of that same "parser state". In Lisp, it's all function calls. Doesn't matter if you're splicing an operator or whatever else.

Given that what we're implementing is a Perl and not a Lisp, are there any obvious benefits that carry over cleanly that I seem to be stubbornly ignoring? That question is often on my mind. 😄

I'm not 100% sure what you're referring to here?

masak commented 8 years ago

I'm not 100% sure what you're referring to here?

Just making sure it's not the case that the Lisp advantages you like to highlight are blindingly obviously mappable onto the problems we're wrestling with in 007 (and Perl 6).

The very-early stages of Perl 6 macros consisted of "yuh, of course we're gonna have Lisp macros (and not dirty C macros, at least not just)". Later: "oh, huh, Perl is not Lisp, Perl is Perl". (And all the other languages that are not Lisp are also not Lisp, and all have slightly different readings of what "macro" means.)

With all that said, I really, really want to learn from Lisp and soak up all the experience with macros that's obviously there. That's why I keep trying to read Lisp code, and to grok what the macros in there are doing.

raiph commented 8 years ago

Completely rewritten

Hopefully you'll forgive some bikeshed coloring suggestions.

On less confusing and prettier quasi syntax

Claim: the keyword-and-block quasi { ... } syntax is confusing.

So...

Wikipedia:

Quasi-quotation is sometimes denoted using the ... double square brackets, ⟦ ⟧, ("Oxford brackets") instead of ordinary quotation marks.

So, strawman proposal, replace the quasi {} construct^1 with a circumfix operator such that:

⟦⟦ 42 + 99 ⟧⟧

quasi quotes the enclosed code.^1

(Or more brackets, with matching open/close count eg ⟦⟦⟦ say pi; say now ⟧⟧⟧^2 etc.)

For the Texas equivalent, use at least three pipes:

||| 42 + 99 |||
# or more pipes if desired: 
||||| say pi; say now |||||

On less confusing and prettier unquote syntax

Claim: the {{{ ... }}} syntax is confusing.

So...

Wikipedia:

Quasi-quotation is sometimes denoted using the symbols ⌜ and ⌝ (unicode U+231C, U+231D) ... instead of ordinary quotation marks.

So, strawman proposal, use bracketing that includes a reversal of the ⌜ and ⌝ symbols:

say $onething, ⟦⌝ $ast-arg ⌜⟧, $anotherthing;

For the Texas equivalent use:

say $onething, ||^ $ast-arg ^||, $anotherthing;

^1 If there must be a keyword, perhaps AST or to-AST or to-ast or toast rather than quasi?

masak commented 8 years ago

Hopefully you'll forgive some bikeshed coloring suggestions.

Consider yourself forgiven.

And please forgive me in turn for answering somewhat selectively, where I feel I have something to say. Feel free to take my silence on the rest as implicit agreement. Maybe.

  • The thing that looks like a block isn't actually a block. It's a quote that happens to use the same delimiters as blocks. This is very misleading, imo.

I'll 50% agree on this one. It's a block more often than people actually mean block. That is, in

quasi { x + 5 }

someone clearly meant to produce the expression x + 5, not the block { x + 5 }. The braces here are only a kind of delimiter. Our oldest open 007 issue is a little about that, actually.

On the flip side, though, it is a block in the sense that it holds in variable declarations.

quasi {
    my y = "oh, James!";
    say(y);
}
# no `y` variable here, nor outside where the macro/quasi gets expanded

This is part of "hygiene". I expect people are going to like that and be pleasantly un-surprised by that, because that's how braces/blocks behave.
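
The scoping half of this can be sketched in a few lines of Python. This is only a model of "declarations stay inside the braces" and not of hygiene as a whole; the `Scope` class is hypothetical and not taken from 007's source.

```python
class Scope:
    """A lexical scope with a link to its enclosing scope (hypothetical model)."""

    def __init__(self, parent=None):
        self.parent = parent
        self.names = {}

    def declare(self, name, value):
        self.names[name] = value

    def lookup(self, name):
        scope = self
        while scope is not None:
            if name in scope.names:
                return scope.names[name]
            scope = scope.parent
        raise NameError(name)


outer = Scope()
quasi_scope = Scope(parent=outer)       # the braces of the quasi open a new scope
quasi_scope.declare("y", "oh, James!")

print(quasi_scope.lookup("y"))          # visible inside the quasi
try:
    outer.lookup("y")                   # not visible outside it
except NameError as err:
    print("not declared here:", err)
```

Lookups walk outwards through parent scopes but never inwards, which is why `y` stays invisible to the code surrounding the quasi.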

  • The word "quasi" has a popular meaning that only quasi relates to quasi-quoting. The popular meaning is too general to be helpful.

Ok, but is it worse than using enum for enumerations or subset for subtypes or given for switch statements? I don't think so.

In some ways, it's too good a term to pass up, even when the match isn't squeaky-Lisp perfect.

  • The technical term quasi quoting means "a linguistic device in formal languages that facilitates rigorous and terse formulation of general rules about linguistic expressions while properly observing the use–mention distinction".

Would it help if I reformulated the above as "a neat way to build code, some of which comes in as parameters"? Because that's what it says, 'cept with more stilted language.

Quasi-quotation is sometimes denoted using the ... double square brackets, ⟦ ⟧, ("Oxford brackets") instead of ordinary quotation marks.

Yes, in logic. That whole article you're quoting refers to Quine's notion of quasiquotation in formal languages, that is, in mathematical logic. The Lisp-y backquote and the comma would have much bigger claims to quasiquotation syntax than ⟦ ⟧. Those are quite decent candidates in Lisp because they're "free for use" in Lisp. In Perl 6, the backquote has been reserved for DSLs, and the comma is used all over the place.

So, strawman proposal, replace the quasi {} construct^1 with a circumfix operator such that:

⟦⟦ 42 + 99 ⟧⟧

quasi quotes the enclosed code.

Not saying I endorse it, but what in your syntax would be the corresponding way to express quasi @ Q::ParameterList { x, y, z } ?

(Or more brackets, with matching open/close count eg ⟦⟦⟦ say pi; say now ⟧⟧⟧^2 etc.)

Unmatched footnote.

What's your use case for having more brackets? Do you conceive of something like ⟦⟦⟦ ... ⟦⟦ ... ⟧⟧ ... ⟧⟧⟧ ? When would that be needed, in your opinion? (See this comment in the same thread for some discussion about nested quasis and unquotes.)

For the Texas equivalent, use at least three pipes:

||| 42 + 99 |||
# or more pipes if desired: 
||||| say pi; say now |||||

Still not assuming I endorse the rest, this is not going to fly because the Texas equivalent of a matching pair of braces can not be a non-matching symbol. (Not by some hardcoded rule specified somewhere, granted, just... not.)

Claim: the {{{ ... }}} syntax is confusing.

  • I'm not arguing it's ugly. I'm not arguing it isn't.

Ok, then let me help you with that one: it's ugly. :smile:

Question is more like, is that a (necessary) feature, or something we should fix?

  • It's confusing. Because braces are associated with blocks.

Again, blocks convey scoping and hygiene, something that isn't all that far-fetched with unquotes either.

Quasi-quotation is sometimes denoted using the symbols ⌜ and ⌝ (unicode U+231C, U+231D) ... instead of ordinary quotation marks.

You're proposing using a quasi-quotation symbol (from mathematical logic) to stand in for unquotes? You don't consider that a bit confusing?

So, strawman proposal, use bracketing that includes a reversal of the ⌜ and ⌝ symbols:

say $onething, ⟦⌝ $ast-arg ⌜⟧, $anotherthing;

I think that falls under "you think it's cute today". More specifically/constructively: we could have played around with outwards-facing brackets for ranges, like some mathematical textbooks do:

[1, 2]    # closed
]1, 2[    # open
[1, 2[    # half-closed, half-open
]1, 2]    # half-open, half-closed

But we didn't — nor a competing form with ranges like [1, 2) — because it completely messes up parsing. I think a ⟦⌝ ... ⌜⟧ construct should be avoided for much the same reasons.

Also, what in your syntax would be the corresponding way to express {{{ ast_arg @ Q::ParameterList }}} or — using the newer as-yet unimplemented syntax — unquote(Q::ParameterList @ ast_arg)?

^1 If there must be a keyword, perhaps AST or to-AST or to-ast or toast rather than quasi?

One good thing about this footnote: I finally understood why the heck you proposed toast as a keyword back in your advent post comment. I guess what I'm saying is I personally didn't find "toast" as a term very transparent. :smile:

raiph commented 8 years ago

[quasi is hygienic because it has closure semantics] [this motivates use of { ... } syntax]

I consider these good arguments for using { and } as the delimiters of a quasi.

However, I was thinking that you intend to also support quasis that:

Afaict supporting these, even if they're a much less common use case, would be good arguments for not using { and } as the delimiters of a quasi.

[is "quasi"] worse than using enum for enumerations or subset for subtypes or given for switch statements?

Yes, very much so according to google, wikipedia, and my English language sensibilities.^1

Would it help if I reformulated the above as "a neat way to build code, some of which comes in as parameters"?

The reformulation isn't helpful to me because I understand quasi quotation.

But yes, I think that reformulation is much closer to what mere mortals will get.

It also happens to coincide with the code suggestion that I cover at the end of this comment.

The Lisp-y backquote and the comma would have much bigger claims to quasiquotation syntax than ⟦ ⟧.

Sure. But:

In Perl 6, the backquote has been reserved for DSLs, and the comma is used all over the place.

So that nixes backquote and comma.

what in your syntax would be the corresponding way to express quasi @ Q::ParameterList { x, y, z } ?

I don't know. I'll respond to that in another comment.

What's your use case for having more brackets?

It's pretty weak.^2 Let's move on.

||| 42 + 99 |||

is not going to fly ... (Not by some hardcoded rule specified somewhere, granted, just... not.)

Fair enough. That was one of a couple of especially wild suggestions. :)

You're proposing using a quasi-quotation symbol (from mathematical logic) to stand in for unquotes?

Well, as I said, I'm strawman proposing. This is part of exploring what I see as problems with planned Perl 6 syntax. It didn't feel right to simply complain without strawman proposing something else.

But, yes, I see the mathematical usage to be so near identical in concept to quasi-quoting in programming languages that use of the same syntax seemed attractive and I see unquotes as basically the inverse of quasi-quoting.

That said:

say $onething, ⟦⌝ $ast-arg ⌜⟧, $anotherthing;

I think that falls under "you think it's cute today". More specifically/constructively: we could have played around with outwards-facing brackets for ranges, like some mathematical textbooks do ... But we didn't ... I think a ⟦⌝ ... ⌜⟧ construct should be avoided for much the same reasons.

Fair enough. This was the other especially wild suggestion. :)

Also, what in your syntax would be the corresponding way to express {{{ ast_arg @ Q::ParameterList }}} or — using the newer as-yet unimplemented syntax — unquote(Q::ParameterList @ ast_arg)?

I'll defer a response to that for a later comment.

^1 If there must be a keyword, perhaps AST or to-AST or to-ast or toast rather than quasi?

I guess what I'm saying is I personally didn't find "toast" as a term very transparent. :smile:

Indeed. That was part of my point.

I still think I'd find it easier to explain my mostly tongue-in-cheek suggestion "toast" to a mere mortal than "quasi". (To a small degree it was not entirely tongue-in-cheek; I thought it could perhaps be spelled to-AST or somesuch.)

Anyhow, I've since come up with another strawman proposal:

code [[ x + 5 ]]

That is, the keyword code followed by any amount of whitespace, followed by [[ without whitespace, code, and then matching ]].

Of course, the [[ could be misunderstood by a human as the opening of a literal array in an array, in a manner similar to { being misunderstood by a human as being the opening of a block or hash. But I like that it allows for ⟦ ⟧ as a Unicode equivalent. (I was surprised you complained that ⟦ ⟧ is for math quasi quotes, not programming quasi quotes, and have not yet let go of the possibility that you will have a change of heart so that you see this correspondence as a positive.)

And for unquotes I strawman propose:

code [[
    x + {{ y }}
]]

If a user actually means to write {{ ... inside a code [[ ... ]] to mean the start of a hash in a hash or a hash in a block or whatever they must disambiguate by adding whitespace between the braces.

If you really don't like the above, please consider waiting a week or two before replying. :)


^1

Data: For me a google for "enumeration" yields 100+ million matches. "commonly used in mathematics and computer science to refer to a [ordered] listing of all of the elements of a set.". Several mainstream languages use enum as their enumeration declarator keyword. (I recall you referring to confusion if we use "enum" to refer to both the overall declaration and individual elements, and I concur. But that's a different issue.)

Conclusion? "enumeration" is a common English word. It has exactly the right meaning for a programming language enumeration. enum is a common choice as a programming language enumeration declarator. Using the same keyword in Perl 6 makes sense.

Data: For me a google for "subset" yields 40+ million matches. "a part of a larger group of related things. ... a set of which all the elements are contained in another set." In Perl 6 a subset is a particular kind of subtype, one that is guaranteed to only be a subtype by virtue of being a subset of the allowed values (so a subset never changes any operations).

Conclusion? "subset" is a common English word. It has exactly the right meaning for a subtype that's a subset. Using the keyword subset makes sense.

(Likewise "given" is also a common English word whose meaning is well suited to the use made of it in Perls.)

Data: For me a google for "quasi" yields 200+ million matches. "seemingly; apparently but not really. ... being partly or almost.". So, a popular word, used more than enumeration and subset combined, but with a meaning that isn't at all suggestive of what a quasi quote is when it's used without the word "quote". And the #1 result when I search for the highly specific phrase "quasi-quotation" is the wikipedia page on "quasi-quotation" which all but ignores programming language use of the concept.

Conclusion? "quasi" is a common English word. It does not remotely hint at what it might mean in Perl 6. Even the relatively obscure term "quasi quotation" is a poor fit. I was hugely surprised by your suggestion that the keyword choices "enum" and "subset" are of about the same quality as "quasi". Hopefully the foregoing has opened your mind about this bit of bikeshedding. You may be in love with "quasi", and it makes sense in 007, but I would like to think that Perl 6 macros will aim at avoiding unnecessarily weird color schemes.

^2 Assuming a Texas variant of quasi quoting existed, and that it was several [ and matching ], it might entail confusion if the contained expression also used the same bracket characters (human confusion, not parser confusion) so it might be nice to allow users to use as many multiple brackets as they liked.

patrickbkr commented 8 years ago

Just the opinion of an outsider who is skimming over this. Most other Perl6 constructs put meta information like the type or signature up front (functions, variables, classes). Thus for reasons of syntactic harmony I would prefer {{{Q::* @ op}}} over {{{op @ Q::*}}}.

masak commented 8 years ago

@patzim, it's interesting that you point this out... because this issue thread has essentially concluded the same thing. And not just that it's traditional, but that it's necessary to front-load the Q type information in an unquote.

Reading the thread more thoroughly will give the whole story, but the short version of why it's necessary is that we need that information ASAP in order not to head down wrong parse paths, especially considering that some parses have side effects such as declaring variables and routines.

The main contender now is {{{Q::* @ expr}}}, but I may or may not also decide to try spelling it unquote(Q::* @ expr) in order to see if this will help get rid of the triple curlies in a safe, uncontroversial way.

Oh, and by the way: 007 will probably take a leaf from TypeScript and allow type information on variables etc. with the following syntax: my var: Int; Though I believe you're right about putting meta information like types or signatures up front, this would be a case where we put it after the variable name (though before the optional assignment). In this case, there's no corresponding harm to it as with the unquote case, and so we get away with it.

raiph commented 8 years ago

Will the unquote type info in vendethiel's translation of the macro from Graham's "On Lisp" be elided in practice when 007 informed features are back ported to Perl 6? In other words, will this work:

macro cont-defun(Q::Identifier $name, Q::ParamList $parms, Q::Block $body) {
  my $new-name = $name.scope.declare("cont-$name.bare()");
  quasi {
    macro unquote($name)(unquote($parms)) {
      quasi {
        my $f = unquote(unquote($new-name));
        $f(unquote(unquote($parms)));
      };
    };
    sub unquote($new-name)(unquote($parms))
      unquote-unhygienic($body); # mmh... needs that "unhygienic" so that `$parms` are visible.
  };
}

?

masak commented 8 years ago

Will the unquote type info in vendethiel's translation of the macro from Graham's "On Lisp" be elided in practice when 007 informed features are back ported to Perl 6?

Maybe in specific cases, but not in all cases. Here's why:

The Perl 6 parser needs to know exactly what state it's in ("state" as in "state machine", the conceptual graph that underlies the grammar we're parsing on). Concretely, take the expression

-42

The parser went through three states along the way: "expect statement/term/prefix", "expect term" and "expect infix/postfix/end-of-statement". (This description is handwavy to some extent.)

Now, think of the above as being inside a quasi block instead, and inject something instead of the prefix:<->:

{{{p}}}42

The parser will still need to know what state it's in after the {{{p}}} thing, but in a non-statically-typed language like Perl 6, there might be no information at all about p. For all the parser knows, p might end up being a prefix operator, a term, or an if statement. Two out of those three alternatives should lead to a syntax error. Not constraining the type leads to a situation where the parser simply cannot continue because it doesn't know what state it's in.

In 007 currently, not constraining the type of an unquote means that it's a Q::Term. I could see that being made slightly richer and more clever with time, as we find "obvious" defaults in various places. (For example, if you unquote right after a macro or a sub, what we expect there should be a Q::Identifier no matter what.)

But in a case such as {{{p}}}42, there really isn't enough information for the parser to continue, and so it needs to be {{{p @ Q::Prefix}}}42 in order to work.
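To make the state argument concrete, here is a minimal Python sketch (not 007 or Perl 6). The state names, the `STATE_AFTER` table and the function are all made up for illustration; it only models the single input `{{{p}}}42`, under the assumption that an unquote with no declared category gives the parser nothing to go on.

```python
# Hypothetical parser states, loosely modeled on the ones named above.
EXPECT_TERM = "expect term"
EXPECT_OPERATOR = "expect infix/postfix/end-of-statement"

# Which state the parser is in after consuming a fragment of each category.
STATE_AFTER = {
    "prefix": EXPECT_TERM,      # a prefix operator must be followed by a term
    "term": EXPECT_OPERATOR,    # a term is followed by an operator or the end
}

def parse_hole_then_42(category=None):
    """Model parsing `{{{p}}}42`, where `category` is the declared type of p."""
    if category is None:
        # Nothing is known about p, so the next state cannot be determined.
        raise SyntaxError("untyped unquote: next parser state is unknown")
    state = STATE_AFTER[category]
    if state != EXPECT_TERM:
        # For example, p was declared a term: two terms in a row.
        raise SyntaxError(f"found term 42 while in state {state!r}")
    return EXPECT_OPERATOR  # the state after consuming the term 42
```

Only `parse_hole_then_42("prefix")` succeeds in this model; declaring the hole a term fails on the `42`, and leaving it undeclared fails before the parser can even look at the `42`.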

vendethiel commented 8 years ago

Er, there's actually a way around this, but this one is a case where the cure is poisonous... quasi could delay parsing and consider {} always match...

vendethiel commented 8 years ago

Looks a lot more like string-based EVAL, but at compile-time. Pretty much what D's mixin!s do.

masak commented 8 years ago

Er, there's actually a way around this, but this one is a case where the cure is poisonous... quasi could delay parsing and consider {} always match...

Yes, but I'm pretty sure that option is already off the table with Perl 6 (and 007). Reason being, if the parser glosses over what state it's in, it might stop counting balanced parentheses and brackets right, it might not realize it's inside a comment, and it will just muck things up in general. It might not even catch that the quasi block ends, or get a false positive.

The Perl 6 dictum is "Always know what language you're in".

The wonderful horror example of not doing this that I always remember is from Perl Best Practices, I think. It goes something like this: // regex literals can use the /x flag, making it possible to put whitespace and even comments inside of the regex code. But if you put an innocuous slash inside the comment, the parser will think that slash is the end of the regex, because it doesn't know what language it's in. In the end this is because the /x information comes at the end of the regex, and so the Perl parser doesn't have that information before it scans ahead and finds the final /, and so it's forced to do two-pass parsing. (This is what led Perl 6 to put those modifiers at the beginning, so that the parser can know what language it's in when it's parsing the regex. It also made /x a default that you can't turn off, but that's beside the point.)

Checking with Amazon's "Look Inside" feature, I see the example on page 244, under a section called "Brace Delimiters".
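A toy Python model of that failure mode may help. It is not how Perl's tokenizer is actually written; the two scanner functions and the sample pattern are invented for illustration. The first scanner learns about the flag too late, the second is told up front.

```python
def scan_trailing_flags(src):
    """Find the body of `/.../flags` when the flags come last.

    The scanner cannot know about `x` yet, so the first unescaped `/`
    is taken as the end of the body, even inside a comment.
    """
    assert src[0] == "/"
    i = 1
    while src[i] != "/":
        i += 2 if src[i] == "\\" else 1   # skip escaped characters whole
    return src[1:i]

def scan_leading_flags(src, extended):
    """Same scan, but told in advance whether `#` comments are allowed."""
    assert src[0] == "/"
    i = 1
    while src[i] != "/":
        if extended and src[i] == "#":
            # Skip the comment up to end of line; slashes inside do not count.
            while i < len(src) and src[i] != "\n":
                i += 1
        else:
            i += 2 if src[i] == "\\" else 1
    return src[1:i]

# A regex with an innocuous slash inside its comment.
pattern = "/ \\d+  # digits, e.g. 1/2\n /x"
truncated = scan_trailing_flags(pattern)      # stops at the slash in "1/2"
complete = scan_leading_flags(pattern, True)  # skips the comment
```

In this model `truncated` ends right after `e.g. 1`, while `complete` contains the whole comment, which is the difference between not knowing and knowing what language you're in.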

Besides "no, we can't do that", one of the benefits of quasi quotes compared to just messing with strings is supposed to be that we get the normal syntax consistency check from it. This is in analogy with eval BLOCK in Perl 5, which has benefits over eval EXPR because the block gets parsed normally at compile time.

eritain commented 8 years ago

Thinking aloud about the concepts that macro users need to understand, since the notations for quasi and unquote ought to be mnemonic for those concepts (hard), not just less ugly than {{{}}} (easy).

First off, ugh: Wikipedia's definition of quasiquotation has negative utility for me. I knew roughly what Perl 6 quasis and unquotes were before I read it, and I did not know it afterward -- to say nothing of quasiquotation in logic. We cannot use that, or anything like it, to explain macros to users.

The reformulation "a neat way to build code, some of [which] comes in as parameters" didn't do much to restore my knowledge. I got that already from the fact that we're talking about macros.

I'll get back to the 007/Perl 6 application momentarily, but first I want to visit the Stanford Encyclopedia of Philosophy and pick up the meaning of quasiquotation in general:

Corner (Quasi) Quotation

As Quine (1940, §6) notes, the quotation ‘(μ)’ designates only the specific expression therein depicted, containing a specific Greek letter. In order to effect reference to the unspecified expression he introduces a new notation of corners (namely, ‘⌜’ and ‘⌝’). So, for example, if we take the expression ‘Quine’ as μ, then ⌜(μ)⌝ is ‘(Quine)’. The quasi-quotation is synonymous with the following verbal description: The result of writing ‘(’ and then μ and ‘)’ (Quine 1940, p. 36).

OK then. True quotation creates a metalanguage term that denotes an object language term. The metalanguage term includes the opening and closing quotation marks, and the object language term goes in the body of the quotation as a literal. (It's a little reminiscent of lambda expressions, but instead of creating a function right where you use it by showing what it does, quotation creates a term right where you use it by showing what it denotes.) Compare to Perl's single-quoting: You create a Perl term denoting a string-in-memory by using quotation marks and putting a literal between them.

Quasiquotation, as it's done in the Quine example, is a lot like Perl's double-quoting. ⌜(μ)⌝ denotes the same as ‘(Quine)’ only because we know μ is a metalanguage term that denotes the same as ‘Quine’. In Perl, "($μ)" denotes the same string as '(Quine)' just when $μ denotes the same string as 'Quine'. But Perl at least has sigils between the double-quote marks, so that we can tell which parts are Perl-level expressions and which are to be taken literally as parts of the string. This version of quasiquotation just expects us to know which is which. That is, it doesn't have unquote, and Perl double-quoted strings do.

(Hello again, lambda expressions. You're determined to haunt me today, eh? Very well. In my best judgement, you're more like the Perl double-quotes than the Quine example. In the body of a lambda, we always know what is a metalanguage term in need of substitution, because everything is, except abstraction and application and the parentheses that support them. Now, my second-best judgement says no, you're like the Quine example, because "if we take the expression ‘Quine’ as μ" is a binding, but that's irrelevant. The crucial thing for me is that I don't have to look away from the Perl expression to know which parts of it I have to substitute.)

So now let's talk about macros specifically. The foregoing makes me glad that we're planning to have an explicit unquoting operator.

It's tempting to think that source is the metalanguage and AST is the object language. You do that, you end up thinking that what we've called 'quasi' is actually unquote, and what we've called 'unquote' is implicit and doesn't really fit into the classification. Nope nope. Rather, I think compile-time code is the metalanguage, and run-time code is the object language. The macro is compile-time code that defines run-time code. The quasiquotation is a compile-time construct (one of several that can create run-time code in a macro); its body is semi-literally the run-time code you wish to produce, except with compile-time language in it (marked by unquotes); the run-time code it defines has those compile-time instructions substituted into it.

(The fact that you put source into a quasiquotation and get AST out is distracting, but remember that the same is true of everything you write for compile time. Or run time for that matter.)

The analogy with double-quoted strings suggests using a doubled something for quasiquotes. The closure/hygiene factor suggests something like braces. Among the Unicode matching brackets, left and right white curly brackets ⦃⦄ stand out. (In source code, ⦃⦄ if that looks any different.) Texas versions might be {{ }}, {[ ]}, or {| |}.

The same analogy suggests making the unquote mnemonically similar to interpolating expressions. Alas, Perl 6's qq interpolation is way more complicated than Perl 5's, and I don't know my way around it yet, so I can't offer much of a good proposal there. It appears to still be mostly oriented around sigils. (Here insert strained analogy between sigils in strings and Q::* types?) It also has the "interpolate arbitrary expressions with curlies" device, but since curlies suggest scope and closure that might be a bad idea.

eritain commented 8 years ago

And if quasis are interpolating double-quotes, vanviegen's JS

Q.codeToNode({ my ($x, $y) = $y, $x; }).subst( '$x' => $a, '$y' => $b, );

is sprintf.

masak commented 8 years ago

Thinking aloud about the concepts that macro users need to understand, since the notations for quasi and unquote ought to be mnemonic for those concepts (hard), not just less ugly than {{{}}} (easy).

Hi @eritain, and welcome to the conversation!

First off, ugh: Wikipedia's definition of quasiquotation has negative utility for me. I knew roughly what Perl 6 quasis and unquotes were before I read it, and I did not know it afterward -- to say nothing of quasiquotation in logic. We cannot use that, or anything like it, to explain macros to users.

I hear ya. This was @raiph's point also, to a large extent. I happen to disagree on specifics (not listed here), but I think I can also see areas where we overlap and agree. For example, I'm not going to require that our explanations and documentation use the concept of quasiquotation heavily. We can find other metaphors and analogies that carry people along.

The reformulation "a neat way to build code, some of [which] comes in as parameters" didn't do much to restore my knowledge. I got that already from the fact that we're talking about macros.

Let me try again, then. I was thinking about this after I read what you wrote here.

A programmer builds code in order to create some semantics. The semantics is the desired product, not the code. The code is the vehicle; the semantics is the desired destination. (To a first approximation. When we start growing on various axes — project length, new/changing requirements, group size — the code becomes important in its own right, and its consistency, quality, and maintainability start to matter.)

Even a general-purpose language has a very limited set of language constructs for putting together the desired semantics. As we grow more familiar with the problem domain, the language, and programming techniques, we are able to better clothe our semantics in purposeful code. But — and this is maybe the main point — in an immutable language with a closed set of constructs, we'll reach an upper limit where we're left with a number of "wouldn't it be nice if I could..." desires, and no language mutability to pick up the slack.

This manifests as various "problems" with the code itself. Useless repetition. Scattered code (the same concern distributed over several unconnected/unrelated code locations). Tangled code (several unrelated concerns competing for the same code location). A backwards or disjointed narrative due to imposed language constraints.

In a mutable language, where the set of constructs can be extended, tweaked, and modified, these problems can be addressed.

Let's get concrete. A compiler is a (frontend) parser plus a (backend) code generator.

[code] ---parse---> [AST] ---codegen---> [code]

There's more to it than that, but this definition will work for us for now.

When we say that the language is mutable and the set of constructs is open, we're basically saying that the end-user programmer (who is writing a normal 007 program, in this case) has access to the parser. The parser is the open thing.

Macros, at their simplest, do AST transformations: they accept an AST fragment (called "Qtree" in 007) and spit one out.

[code] ---parse---> [AST] ---macro-expand---> [AST] ---codegen---> [code]

(The macro expansion often interleaves with the parsing. It's more helpful to think of the "parse" and "macro-expand" steps as coroutines running together, like shell pipes.)
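The three-stage pipeline can be tried out in Python, whose standard `ast` module happens to expose the same stages. This is only an analogy, not how 007 does it: the `SwapAddForMul` transformer below is an invented stand-in for a macro, and it runs as a separate pass rather than interleaved with parsing.

```python
import ast

class SwapAddForMul(ast.NodeTransformer):
    """Stand-in for macro expansion: rewrite every `a + b` into `a * b`."""

    def visit_BinOp(self, node):
        self.generic_visit(node)           # transform nested expressions first
        if isinstance(node.op, ast.Add):
            node.op = ast.Mult()
        return node

source = "result = 2 + 3"
tree = ast.parse(source)                   # [code] ---parse---> [AST]
tree = SwapAddForMul().visit(tree)         # [AST] ---macro-expand---> [AST]
ast.fix_missing_locations(tree)            # fill in positions for new nodes
code = compile(tree, "<sketch>", "exec")   # [AST] ---codegen---> [code]

namespace = {}
exec(code, namespace)
print(namespace["result"])
```

The program text says `2 + 3`, but what runs is `2 * 3`: the transformation happened on the tree, between parsing and code generation.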

The macro expansion picks up the kind of slack that only wants to change semantics. When syntactic slack needs picking up, we also extend or modify the "parse" step, by adding is parsed traits to our new constructs:

[code] ---parse/`is parsed`---> [AST] ---macro-expand---> [AST] ---codegen---> [code]

The regular parser is like canon, and the is parsed is like fanfic. Together they form the current narrative that is reading your program. When macros and slangs all interact together, they thread through the program like a braid, with strands of parsing joining and leaving as needed. (See #176 for a recent example.)

We can identify two roles in this activity. There's the programmer, the end-user of the language. This role does not necessarily go near macros and slangs, though they appreciate the added syntax and semantics that come from these, maybe conscious of them being macros and slangs, maybe not. An adroit end-user programmer may declare the occasional custom operator, as a subroutine.

The other role is the macro or slang author. This role has seen a need, a way that the language currently falls short, and wants to bridge the gap with a language extension of some sort. This role is, or wants to be, much more able than the end-user programmer. The responsibilities of this role are also greater. There is terminology to learn, and techniques to master. It's possible to make whole new classes of mistakes that don't happen during ordinary programming. In many ways, this role is partway towards language designer, because when you're extending the language, you're designing a new part of it.

(It's a little reminiscent of lambda expressions, but instead of creating a function right where you use it by showing what it does, quotation creates a term right where you use it by showing what it denotes.)

Yes, though on a different level. A lambda is something that you apply, passing it argument(s). A quasi is something that you interpolate, passing it ASTs. Runtime act vs compile-time act. The analogy is there, however. It also extends to many kinds of templates, which can be seen either as "here are the constant parts, here are the parametric parts", or as functions/lambdas that sit around waiting for the parametric parts in order to spit out a constructed whole.

Quasiquotation, as it's done in the Quine example, is a lot like Perl's double-quoting.

Yep. Which, again, are a small templating language.

The crucial thing for me is that I don't have to look away from the Perl expression to know which parts of it I have to substitute.

I.e. the ideal here is a kind of local information, where (say) moving an expression to another place in the code shouldn't drastically change what it does.

It's tempting to think that source is the metalanguage and AST is the object language. You do that, you end up thinking that what we've called 'quasi' is actually unquote, and what we've called 'unquote' is implicit and doesn't really fit into the classification. Nope nope. Rather, I think compile-time code is the metalanguage, and run-time code is the object language. The macro is compile-time code that defines run-time code. The quasiquotation is a compile-time construct (one of several that can create run-time code in a macro); its body is semi-literally the run-time code you wish to produce, except with compile-time language in it (marked by unquotes); the run-time code it defines has those compile-time instructions substituted into it.

On first reading this morning, I didn't understand this so well. When reading it now, I think we're agreeing via my parsing/codegen diagrams above.

The analogy with double-quoted strings suggests using a doubled something for quasiquotes.

This is the first "structured" approach to finding a strangely consistent syntax that I've seen (or at least recognized as such). Kudos.

We have to be careful with how heavily we want to put our weight on the "qq string" analogy, though. It's an analogy that is able to carry us in some cases, but it will break in other cases.

The closure/hygiene factor suggests something like braces. Among the Unicode matching brackets, left and right white curly brackets ⦃⦄ stand out. (In source code, ⦃⦄ if that looks any different.) Texas versions might be {{ }}, {[ ]}, or {| |}.

I was reading this on my phone this morning. An isolated data point is that neither ⦃⦄ nor ⦃⦄ rendered on the phone. (Year-old HTC One with up-to-date OS.) Besides the emotional/irrational component, this is my biggest objection to using Unicode braces for anything macro-y. Another big factor is that no matter how much we educate people, there will still be a large percentage who will not be able to produce these with their key mappings/editor. We're leaving them with the unhappy choice of either copy-pasting or going to Texas.

Re {{ }} and {[ ]}, both of those are much too close to code that could actually be written for various normal, non-macro reasons. (More so in Perl 6 than in 007.) {| |} might fly. Don't get me wrong, but it gets more of a shrug from me than a relieved smile.

At present, quasi { } remains in position for quasiquotation. It's uncomplicated, it's ASCII-only, it doesn't need digraphs, and it evokes block/scope-like thoughts in the user, which is more-right-than-wrong. (Even when we don't mean "block", we do mean "scope", thanks to hygiene.) I know I'm stubborn, but I also want to try to keep open-minded. The debate remains open and everything is on the table.

The same analogy suggests making the unquote mnemonically similar to interpolating expressions.

This is where I think Angular/React/Ember template placeholders make for a better analogy than qq strings. React uses { }, Angular/Ember use {{ }}. We use {{{ }}}. All of these mean "parametric hole here".

Alas, Perl 6's qq interpolation is way more complicated than Perl 5's, and I don't know my way around it yet, so I can't offer much of a good proposal there.

Though note that (even though you're right about "mostly oriented around sigils") Perl 6's qq interpolation has elected to use { } to mean "insert parametric expression here". So has Python's .format, in a way.

It appears to still be mostly oriented around sigils. (Here insert strained analogy between sigils in strings and Q::* types?) It also has the "interpolate arbitrary expressions with curlies" device, but since curlies suggest scope and closure that might be a bad idea.

The idea to use sigils for the unquote mechanism and/or for Qtree values themselves has been up for discussion. Championed by TimToady, even. His proposal was to use ¤ (00A4 ¤ CURRENCY SIGN), which is on my Swedish keyboard, and likely reachable on many other keyboard layouts too.

I used to strongly object, but I might've warmed up to it a bit more after seeing lots of other quasi/unquote syntax suggestions that were not better. :smile: The biggest problem right now I guess is that (a) it's not a fully-fleshed-out proposal, and (b) it's not clear how it helps solve the current actual problems that the quasi { } and {{{ }}} syntaxes have, notably how to indicate grammatical category, which is the big one.

masak commented 8 years ago

And if quasis are interpolating double-quotes, vanviegen's JS

Q.codeToNode({
    my ($x, $y) = $y, $x;
}).subst(
    '$x' => $a, 
    '$y' => $b,
);

is sprintf.

Just for completeness, I nowadays believe that the codeToNode/subst idea is a dead end. See the last two points in this feedback post. The thinking behind that subst is that code is text, not AST. I think there isn't enough unhygiene-by-default in Perl 6 to nicely pull it off.

masak commented 8 years ago
  • The thing that looks like a block isn't actually a block. It's a quote that happens to use the same delimiters as blocks. This is very misleading, imo.

I wanted to come back to this, because in bf44fc7b4d1189ed9657bfbc996e7e505cf31da5 I fixed a bug that occurred because the { ... } delimiters were not block-like enough.

Here was the use case that triggered the bug:

my q1 = quasi @ Q::Statement { my x; };
my q2 = quasi @ Q::Statement { my x; };

Of course those two lines are fine and don't collide at all. We see there are two my x; statements, but (a) they're not at all on the same level, they're each couched within a block, and (b) they're both quoted. Both of these factors make it feel like there shouldn't be a redeclaration error in this case. The bug caused a redeclaration error.

For the exact same reasons, we wouldn't expect this to work:

my q = quasi @ Q::Statement { my x = "Bond"; };
say(x);

Because again, the my x; has happened in quoted code, and quoted code shouldn't declare things in the mainline. The bug made this code work (and print None, not Bond).

Anyway, I feel I now have a real example on which to firmly disagree with the assertion that "The thing that looks like a block isn't actually a block."

It is a block, even in cases where it encapsulates something that is syntactically a smaller unit than a block, such as a statement (or a statement list). Our expectations on scoping, blocks, hygiene and all that good stuff guide us to expect no leakage of the sort the bug caused. The lack of leakage is due to it being an actual block — on the declaration side of things, that is, in the quasi term.

eritain commented 8 years ago

For concreteness, could you give me a couple of use cases that clash with the qq string metaphor?

Is there a minimal example of the quasiquotation grammatical category problem?

The textual underpinnings of the subst approach are grotty, no doubt about it. But the spirit of it might translate to a more disciplined, ASTy approach. Start a quasiquote with code that's literal except for marked insertion sites: stubs that are grammatically a term, a prefix operator, a statement, or what have you. They have no content, but they have a label (or maybe a serial number). And then elaborate afterward on what (compile-time-generated) content goes into each marked spot.

(Note to self: Nose around for templating languages that assemble tree structures.)

masak commented 8 years ago

For concreteness, could you give me a couple of use cases that clash with the qq string metaphor?

qq strings are cool because they offload the mental burden of string concatenation. When you see a qq string like this:

"Cat number $n is $description."

...the brain doesn't attempt to spot the concatenation of the five parts ("Cat number ", $n, " is ", $description, ".") — it sees it as a single undivided string. I hesitate to say "constant", because I think most developer brains wouldn't really slumber that hard at the wheel. But the useful lie here is that mechanism is hidden, replaced by declarative intent. Those interpolated variables are there in the string. But we don't care all that much about the concatenation itself.
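The same hidden mechanism exists in other languages with string interpolation. As a small Python illustration (the variable values are arbitrary), the interpolated form and the explicit five-part concatenation produce the same string:

```python
n = 3
description = "asleep"

# Declarative intent: one string with two holes in it.
interpolated = f"Cat number {n} is {description}."

# The mechanism that the interpolation hides: five parts joined together.
concatenated = "Cat number " + str(n) + " is " + description + "."
```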

As a parenthetical remark, I tend towards using this style in Perl 6 nowadays, because I can:

"Cat number {$n} is {$description}."

Partly because it makes the interpolations stand out a bit, which is nice. Partly because it actually insulates against some semi-nasty thinkos, especially in HTML/XML where the character right after the interpolation is likely to be <.

To summarize, the nice thing about qq strings and interpolation is that they allow the writer to express (and the reader to read) a single coherent intent, uninterrupted by noise in the form of mechanism.

Up to this point, the analogy holds up, and is pretty great. That's exactly what we want to do with quasi blocks as well: present code as code, as coherently as possible, uninterrupted by noise in the form of splicing/tree transformation mechanism:

if !{{{condition}}} {
    {{{block}}};
}

If you squint, this is "the same" type of interpolation as qq interpolation. We need a meaner way to signal "make a hole here", mostly because code is varied enough to lay reasonable claim to simpler syntaxes. But the intent is clearly similar to qq strings.

Another thing that's similar is, if you hit the same interpolation site several times — in a loop, say, or by a sub/macro being called multiple times — the values of the interpolated variables might change, and so the value of the resulting string/quasi might end up different each time. Again, we're fine with this, especially when reminded of it. By hiding mechanism, the interpolation syntax effectively hides the discrepancy between static program text and dynamic runtime behavior. But it's right there under the surface if we but scratch a little.

So much for similarities. Now where does the analogy break down?

In 007, code is not text. If it were, we'd be C #defines and we could all go home, job done. Instead, code is ASTs. ASTs are hierarchical. They nest in various ways. Instead of interpolating a linear stretch of string, a tree fragment gets inserted into a bigger tree fragment. Oh, and the parametric holes have shapes; they say "I only accept this type of tree fragment".
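A rough Python sketch of "holes with shapes" follows. The classes are invented for illustration and are far simpler than 007's actual Q types; the point is only that a tree fragment is spliced into a tree, and that the hole checks the type of what it is given.

```python
from dataclasses import dataclass

@dataclass
class Term:
    value: int

@dataclass
class Infix:
    symbol: str

@dataclass
class Hole:
    name: str
    accepts: type        # the "shape" of the hole

@dataclass
class BinaryExpr:
    left: object
    op: object
    right: object

def fill(node, bindings):
    """Return a copy of the tree with every hole replaced by its fragment."""
    if isinstance(node, Hole):
        fragment = bindings[node.name]
        if not isinstance(fragment, node.accepts):
            raise TypeError(
                f"hole {node.name!r} accepts {node.accepts.__name__}, "
                f"got {type(fragment).__name__}"
            )
        return fragment
    if isinstance(node, BinaryExpr):
        return BinaryExpr(
            fill(node.left, bindings),
            fill(node.op, bindings),
            fill(node.right, bindings),
        )
    return node          # leaves such as Term and Infix are kept as they are

# Roughly a tree for "2 <hole accepting an infix> 2".
template = BinaryExpr(Term(2), Hole("op", Infix), Term(2))
filled = fill(template, {"op": Infix("+")})
```

Passing `{"op": Term(5)}` instead raises a `TypeError`: a square fragment in a round hole.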

It gets worse. A normal (lexical) name lookup is used to locate the tree fragment to be inserted. But the code that gets inserted might also contain references to names (this is likely), which then need to be looked up. Note that this happens on completely different "levels" — one is at macro-call time, the other is at code-run time. There is no such level distinction in qq strings (they're all runtime).

It doesn't have to be just two levels, either. A macro can nestedly interpolate as many code fragments as it wants. All these code fragments can come from completely different parts of the program — sometimes from different compilation units — and the code fragments all contain zero or more lexical lookups of their own.

The hygienic expectation is that name lookups from these code fragments behave as if the code they're in was never interpolated. Again, no analogue exists for qq strings since there is no multi-level lexical lookup going on in those.

If we're looking for something that comes closer than qq strings to behaving the same, lexical closures might be it. Heavily used higher-order functions might give something of a similar feel to heavily used higher-order quasis. I sometimes think of the hygienic expectation as being simply an extrapolated expectation from the lexical closure case. The biggest difference between closures and quasis is that closures only "move" at runtime, eliminating many weird behaviors. quasis are not so lucky; they "move" at compile time. The hygienic expectation makes them pretend they don't.
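The closure half of that comparison is easy to show in Python (function and variable names are arbitrary). A name used inside the closure is looked up where the closure was defined, not where it ends up being called, which is the behavior the hygienic expectation extrapolates from:

```python
def make_greeter():
    greeting = "Hello"                 # visible to the closure below
    def greet(name):
        return f"{greeting}, {name}!"  # looked up in the defining scope
    return greet

def call_elsewhere(fn):
    greeting = "Goodbye"               # same name at the call site; ignored by fn
    return fn("Bond")

result = call_elsewhere(make_greeter())
print(result)
```

The closure has "moved" into `call_elsewhere`, yet it still sees the `greeting` from `make_greeter`.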

Is there a minimal example of the quasiquotation grammatical category problem?

Sure.

2 {{{op}}} 2

Now I meant for op here to be an AST fragment for an infix operator. How will the parser know this? It can guess, sure. But how will it guess? Might guessing become a lot harder in the presence of is parsed that can arbitrarily mutate the language? (There's discussion about this earlier in this thread, culminating in https://github.com/masak/007/issues/30#issuecomment-239567652 -- but in summary, the parser is pretty adamant in wanting to know what's going on.)

So to make things simple, we say "if you didn't specify what op was, it's a term of some kind" — "term" being the fewest-assumptions thing and the most common thing to insert. And to write what you wanted, you'd need this:

2 {{{op @ Q::Infix}}} 2

(Syntax negotiable, and negotiated. Again, it's remarkable how well the Q type hierarchy — including inheritance/subtyping — fits to describe grammatical categories. That's certainly not something to be taken for granted.)

Let's say we solved things with a ¤ sigil instead of {{{ }}} syntax.

2 ¤op 2

Well, it's shorter. But it doesn't indicate category, which we have to do. So maybe this:

2 ¤infix:op 2

The colon there is the least disrespectful thing I can think of to salvage the ¤ sigil and make it aware of grammatical categories. But... note how we lost the Q hierarchy, which I'd argue we'll want. We lost an obvious end delimiter like }}}. Sure, you could throw in an extra ¤ at the end, but I don't see how that's an improvement on matching curly braces. Braces nest, ¤ delimiters don't. All in all, even with the grammatical-categories patch, it seems like a different syntax with strictly fewer advantages.

The textual underpinnings of the subst approach are grotty, no doubt about it. But the spirit of it might translate to a more disciplined, ASTy approach. Start a quasiquote with code that's literal except for marked insertion sites: stubs that are grammatically a term, a prefix operator, a statement, or what have you. They have no content, but they have a label (or maybe a serial number). And then elaborate afterward on what (compile-time-generated) content goes into each marked spot.

I could see that working, yes. In fact, I'd very much like to see examples of it. In spirit it reminds me a little bit of Python's .format, in which the "marked insertion sites" are the { }-delimited substrings with formatting instructions.

But I also think in actual practice the {{{ }}} interpolation form is good at emulating even the subst/.format style. You just need to do AST "computations" earlier in your macro, store the resulting AST in a nicely named variable, and then interpolate it into your quasi. Presto: separation.

(Note to self: Nose around for templating languages that assemble tree structures.)

Let me know what you find out!

raiph commented 7 years ago

Hopefully most of the following makes enough sense that you can fairly easily point out how it doesn't. :)

Q1. Am I right that things are currently headed toward, or you've arrived at, concretely speaking, something close to one of the two forms below (abstractly speaking, unquote signal, open delimiter, qtype, deal with TTIAR, code-to-be-parsed-as-qtype, close delimiter)?

{{{qtype @ expr}}}
unquote(qtype @ expr)

Q2a. Am I right that the compiler will check to see that expr evaluates to an AST that corresponds to qtype? Does that checking currently entail type checking or matching the name of an action class method or object matching via a signature or ...?

Q2b. If expr was always of a type known at compile time (so by the time the unquote is reached), then the qtype could, presumably, be optional? (Even if the loss of annotation would, perhaps, be such a strong net negative that it's worth requiring it.)

Q3a. Is there a qtype type that's equivalent to a "noun-clause" (eg an ast corresponding to a prefix-term-postfix sequence would be OK as would anything else that could be an "expression" in the main language)?

Q3b. Do some/all qtypes map to selected ast generating action methods? If not, does it make any sense for the macros effort to take on pushing any abstraction of the ast back in to the actions, i.e. making sure there is this correspondence with action methods?

Q4. Is there a parallel with the default type constraint for an ordinary code "untyped" variable being Mu, and the default value being Any? For example, a default type constraint for an expr (Q?) and a default actual value like Q::NounClause or whatever you're using as a generic that approximates (or equates) to 'term' or perhaps 'term-position'.

Q5. Let's say I have a thousand lines of code and I want to write a macro that takes that code as input and inserts some instrumentation after any use of =, .=, or :=. What would that look like, roughly speaking?

masak commented 7 years ago

Hopefully most of the following makes enough sense that you can fairly easily point out how it doesn't. :)

Let me lead by saying how much I appreciate your probing questions and comments. They mean a lot, and they do push me in the right direction; thank you.

Q1. Am I right that things are currently headed toward, or you've arrived at, concretely speaking, something close to one of the two forms below (abstractly speaking, unquote signal, open delimiter, qtype, deal with TTIAR, code-to-be-parsed-as-qtype, close delimiter)?

{{{qtype @ expr}}}
unquote(qtype @ expr)

You are right in so assuming.

Clarification one: 007 is still stuck on the {{{expr @ qtype}}} form. Only tuits stand in the way of reversing the order to the new consensus. (Update: now tracked as #220.) (Updated update: Now merged.)

Clarification two: I/we still haven't decided between the {{{ }}} and unquote() forms above. We can try them both out if we want. We can switch on the unquote() form with a pragma. (Update: now tracked as #230.)

Clarification three: the technical term for "deal with TTIAR" in the Perl 6 grammar is "stopper". ("Terminator" is also used, but that's more for statements than for expressions IIUC.) If something like stoppers didn't exist, then something like for @values -> $value { ... } would parse a minus sign out of the ->, and then parsefail. (Update: See #331.)

Q2a. Am I right that the compiler will check to see that expr evaluates to an AST that corresponds to qtype? Does that checking currently entail type checking or matching the name of an action class method or object matching via a signature or ...?

Type checking. The semantics is that of ~~, or more exactly subtype matching. That is, you can pass (e.g.) a Q::Term::Array to a {{{Q::Term @ ...}}}, but not vice versa.
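To make the direction of the subtype check concrete, here is a minimal sketch in Python rather than 007. The classes `Q`, `QTerm` and `QTermArray` and the function `check_unquote` are hypothetical stand-ins, not 007's actual implementation; only the "subtype fits, supertype doesn't" rule is taken from the answer above.

```python
# Hypothetical stand-ins for Q, Q::Term and Q::Term::Array.
class Q:
    pass

class QTerm(Q):
    pass

class QTermArray(QTerm):
    pass

def check_unquote(declared: type, value: object) -> object:
    """Accept `value` in an unquote declared as `declared`, or raise TypeError."""
    if not isinstance(value, declared):
        raise TypeError(
            f"expected {declared.__name__}, got {type(value).__name__}"
        )
    return value

# An array term is a term, so it fits a term-typed unquote.
check_unquote(QTerm, QTermArray())

# The reverse direction is rejected: a plain term is not an array term.
try:
    check_unquote(QTermArray, QTerm())
except TypeError as err:
    print("rejected:", err)
```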

There are some exceptions/"cheats" that basically aim to improve the user experience:

Q2b. If expr was always of a type known at compile time (so by the time the unquote is reached), then the qtype could, presumably, be optional? (Even if the loss of annotation would, perhaps, be such a strong net negative that it's worth requiring it.)

There's a fairly evolved type checking story waiting in the wings for 007. It's laid out at https://github.com/masak/007/issues/33 — note, however, the "orthogonally to everything else" in the title of that issue. This implies that (in some sense) if the program would compile-fail with type checking off, then it must also compile-fail with type checking on.

I thought I had given a comprehensive answer to this earlier in the thread, but all I find is bits and pieces of a comprehensive answer here and there. :smile:

Anyway, what you're proposing is of course attractive if pulled off well, but it's also "technically challenging" in the worst possible sense of the phrase. Parsing usually completely precedes type checking; the type checker needs a completely parsed compilation unit to do its job. With what you propose, we would need type information right in the middle of the parse, which is too early.

I'm not saying there is no solution, I'm saying the problem is PhD-worthy at least. I might not have time to wait for a PhD to come along and solve it for us. (Update: More importantly, I'm not willing to risk that extra complexity being built into 007, as that will make it all that much harder to contribute a solution back to Perl 6.)

Q3a. Is there a qtype type that's equivalent to a "noun-clause" (eg an ast corresponding to a prefix-term-postfix sequence would be OK as would anything else that could be an "expression" in the main language)?

Yes, Q::Expr.

At this point, I'm very happy to direct you to the (not yet published) https://github.com/masak/007/blob/06a12d48affa92e37785c560c3cb443c12c69aef/lib/_007/Q.pm wherein all the Q types have been documented. I see Q::Expr is documented simply as "An expression; something that can be evaluated to a value." With your question above as feedback, I can probably document it some more in a way that would have helped you if you had chanced upon that documentation. (Also, all other feedback on that documentation is very welcome.)

I'm still mulling over how best to present the whole Val and Q hierarchy in a way that is "meaty" and informative for the reader, rather than dry and reference-ish. Probably many concrete examples is the way to go.

Q3b. Do some/all qtypes map to selected ast generating action methods? If not, does it make any sense for the macros effort to take on pushing any abstraction of the ast back in to the actions, i.e. making sure there is this correspondence with action methods?

Action methods, as much as I like them, do not yet have an obvious place in the macro story. The best way to appreciate this is perhaps to go to any 007 macro-ideas issue and check out the implementation, which is usually an is parsed regex plus the macro itself.

As far as I can tell, macros replace action methods. At least for things that were not in the grammar in the first place. This kind of makes sense, though. The two purposes of an action method are to do some parser state bookkeeping, and to generate QAST (in the old world) or Qnodes (in the new order). Well, that's two purposes of a macro, too.

I want to stress again that I have nothing against action methods. :smile: Also, that this story is very much still evolving. But, hm, macros as defined in the synopses precede our modern understanding of grammars, and sometimes the two stories don't fit naturally.

Q4. Is there a parallel with the default type constraint for an ordinary code "untyped" variable being Mu, and the default value being Any? For example, a default type constraint for an expr (Q?) and a default actual value like Q::NounClause or whatever you're using as a generic that approximates (or equates) to 'term' or perhaps 'term-position'.

As far as I understand this question: no, there isn't.

007 has exactly one undefined/empty value (None), unlike Perl 6 which has like five of them. None is found both in runtime/value space and in compile-time/Q space (it shares this peculiarity with Array).

Parenthetically: If I could, I would get rid of None and do Maybe<T> types instead. (For why, google "null billion dollar mistake".) But this goes against the dictum that #33 is supposed to be orthogonal to everything.

It feels like you are asking something else here, but I don't really know what.

Q5. Let's say I have a thousand lines of code and I want to write a macro that takes that code as input and inserts some instrumentation after any use of =, .=, or :=. What would that look like, roughly speaking?

...and a Github issue was created for something that was discussed in a gist once and that we've been referencing sometimes in other issues:

https://github.com/masak/007/issues/217

The idea is outlined in https://gist.github.com/masak/13210c51f034f931af0c, but basically we want to be able to "watch" the creation of some given type of Qnode, and basically run a callback at that point to do some processing on our own.
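Roughly, and only as a sketch of the idea in Python (none of these names come from 007 or from the gist), "watching" Qnode creation could look like a registry of callbacks keyed by node type, consulted whenever a node is constructed:

```python
from collections import defaultdict

class Q:
    """Hypothetical base class for Qnodes."""
    # Shared registry: Qnode type -> list of callbacks.
    _watchers = defaultdict(list)

    def __init__(self) -> None:
        # Notify every watcher registered for this node's type or a supertype.
        for qtype, callbacks in list(Q._watchers.items()):
            if isinstance(self, qtype):
                for callback in callbacks:
                    callback(self)

class QInfixAssignment(Q):
    """Hypothetical node type for an assignment such as `x = 42`."""
    def __init__(self, lhs: str, rhs: str) -> None:
        self.lhs, self.rhs = lhs, rhs
        super().__init__()  # fields are set before watchers run

def watch(qtype: type, callback) -> None:
    """Register `callback` to run whenever a node of `qtype` is created."""
    Q._watchers[qtype].append(callback)

seen = []
watch(QInfixAssignment, lambda node: seen.append(node.lhs))

QInfixAssignment("x", "42")  # triggers the callback
Q()                          # some other node; the watcher ignores it
print(seen)
```

For the instrumentation in Q5, the callback would then be the place to record or wrap the assignment, instead of walking a thousand lines of code by hand.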

raiph commented 7 years ago

Let me lead by saying how much I appreciate your probing questions and comments. They mean a lot, and they do push me in the right direction; thank you.

Gracious words and you are of course welcome.

(Also, maybe write a comment here? A couple years ago I posted a link to the 007 project but I'm thinking it might work for you and for redditors if you posted a friendly, casual, intimate, macro-enthusiast-to-macro-enthusiast, fresh account of what's been on your mind over the last month or two (and then repeat this casual commenting on /r/programminglanguages every month or two in future).)

Some further comments and loose ends:

{{{qtype @ expr}}} Q2b. If expr was always of a type known at compile time

There's a fairly evolved type checking story waiting in the wings for 007. It's laid out at #33

Duly read.

... if the program would compile-fail with type checking off, then it must also compile-fail with type checking on.

That makes sense out of context but I'm not understanding why you're mentioning it in this context.

What I was getting at was that if, say, expr was one of the macro's arguments, then you could know that its type is as declared in the macro's signature. And if not, you could, presumably, "know" it's at least a qtype. (If it turns out it isn't at run-time the fault lies with the macro code.)

Q3b. Do some/all qtypes map to selected ast generating action methods?

As far as I can tell, macros replace action methods.

Yeah. I'd gotten confused. Now I've (re)read the Three Types of Macro I'm less confused. :)

Parenthetically: If I could, I would get rid of None and do Maybe<T> types instead.

(Tangentially, have you read this wikipedia write up about Perl 6 types?)


I hope to get time in the next week or so to (re)read and give feedback on the q types doc and the three types of macro in their respective comments areas.

masak commented 7 years ago

(Also, maybe write a comment here? A couple years ago I posted a link to the 007 project but I'm thinking it might work for you and for redditors if you posted a friendly, casual, intimate, macro-enthusiast-to-macro-enthusiast, fresh account of what's been on your mind over the last month or two (and then repeat this casual commenting on /r/programminglanguages every month or two in future).)

Sounds like a great idea.

I'll try to find time for it. Generally, I find that taking the time to explain 007, including finding examples with good traction, benefits everybody. The problem, as usual, is finding the time.

... if the program would compile-fail with type checking off, then it must also compile-fail with type checking on.

That makes sense out of context but I'm not understanding why you're mentioning it in this context.

What I was getting at was that if, say, expr was one of the macro's arguments, then you could know that its type is as declared in the macro's signature. And if not, you could, presumably, "know" it's at least a qtype. (If it turns out it isn't at run-time the fault lies with the macro code.)

The issue is with making the parser suddenly succeed some previously failed programs based on type information. That's what I was getting at. It's not that it can't be done, it's just that that wouldn't make the type annotations "orthogonal to everything else".

(And that's before considering the (much more serious) timing issues I mentioned between parsing and type inference.)

Parenthetically: If I could, I would get rid of None and do Maybe<T> types instead.

(Tangentially, have you read this wikipedia write up about Perl 6 types?)

I hadn't. I'm surprised at the claim that Perl 6's type objects would be in some way equal to the Option type. To me type objects are much closer to a null reference, with all of the associated problems. I can see how there's a superficial resemblance, though.

The piece I feel is missing from Perl 6 to qualify — and it's missing because Perl 6 simply isn't that kind of language — is forcing you to consider undefined values during compile time. That is, if your type allows type objects (as most do), then you simply cannot handle the value without also taking the type object case into account. That's not Perl 6's approach at all; it gives the programmer the freedom to ignore the null case and/or the responsibility to remember it when it matters.

raiph commented 7 years ago

Generally, I find that taking the time to explain 007, including finding examples with good traction, benefits everybody. The problem, as usual, is finding the time.

I hear that. I think you've misunderstood what I was suggesting and/or its possible consequences. This is well off topic so I'll follow up about this elsewhere than this github issue.

The issue is with making the parser suddenly succeed some previously failed programs based on type information.

OK. My "duly read" of #33 didn't mean I understood any of what I was reading. :) I've posted a question there in an attempt to catch up.

I'm surprised at the claim that Perl 6's type objects would be in some way equal to the Option type.

I saw hints that you wouldn't agree with it (which is why I mentioned it) but I'm surprised you're surprised. But this is way off topic so I'll follow up elsewhere.

masak commented 7 years ago

I recently switched on typechecking of all properties initialized in the objects we're creating in 007. It makes everything run a lot slower, but it's led to some interesting insights along the way.

I went through all of our types, including all our Qtypes, and wrote the allowed types for all their fields. The result is here. Most of these are extremely straightforward; the type of Q::Dict's propertylist is Q::PropertyList, for example. Some of these are optional, which in effect means that the value None is allowed in that field. I solved that by annotating these with (fake) union types; for example, the name field in Q::Term::Sub is typed as Q::Identifier | NoneType, because a sub term doesn't have to have a name. (I might come back and change that to being an :optional property on the field object instead, but the end result will be the same. We'll keep using None as "the absence of some actual value here"; that's what it's for.)

The most complicated one we have is the else property in Q::Statement::If, which is Q::Block | Q::Statement::If | NoneType. The three types represent, respectively, an else block, an elsif, and neither of those.
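As an illustration only, here is how such a union-typed field might be checked at construction time, modeled in Python. The class names mirror the Qtypes above, but the dataclass layout, the `else_` spelling (needed because `else` is a Python keyword) and the check itself are assumptions of this sketch, not 007's code.

```python
from dataclasses import dataclass, field
from typing import Union

class Q:
    """Hypothetical base class for Qnodes."""

@dataclass
class QBlock(Q):
    statements: list = field(default_factory=list)

@dataclass
class QStatementIf(Q):
    condition: str
    block: QBlock
    # Models Q::Block | Q::Statement::If | NoneType:
    # an else block, an elsif, or neither.
    else_: Union[QBlock, "QStatementIf", None] = None

    def __post_init__(self) -> None:
        allowed = (QBlock, QStatementIf, type(None))
        if not isinstance(self.else_, allowed):
            raise TypeError(
                f"else_ must be a block, an if statement or None, "
                f"got {type(self.else_).__name__}"
            )

plain = QStatementIf("a", QBlock())                      # neither
with_else = QStatementIf("a", QBlock(), QBlock())        # else block
chained = QStatementIf("a", QBlock(),
                       QStatementIf("b", QBlock()))      # elsif
```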

I got two test failures from doing this. One was from format.007, bless its heart. (It's to date our most realistic use of macros in examples/.) It made me have to put in Q::Unquote in the type union on this line.

That Q::Unquote is needed because of this quasi in format.007.

It immediately felt wrong to have to put a Q::Unquote in that type union. The "steady state" of starting to do that would be basically to put Q::Unquote almost everywhere in Qtype fields. There's something wasteful and repetitive about the whole idea. It indicated to me that I was missing something.

On the other hand, putting those Q::Unquote in was what I had to do, in the short term, to make creating those objects work again.

This was a couple of days ago. I've spent those days (and a long-haul flight) pondering over what to do about this. I've found out the answer, and it feels like it also answers a bunch of unrelated questions we've had about quasis and unquotes. ("The Answer Will Shock You"™)

To be continued.

masak commented 7 years ago

It's worth spending some time on what I thought was a right solution, but which turned out to be a wrong one. There's value in postmorteming things that don't pan out.

A fairly natural response to the fact that unquotes don't "fit" in the Q::Expr-shaped hole in operand in Q::Postfix is "ok, then we make it fit". Maybe Q::Unquote needs to subtype Q::Expr?

That doesn't sound exactly right — Q::Unquote would need to subtype most Qtypes in order to fit everywhere. It would need to be some kind of bottom-ish type.

A more promising idea might be to think of Q::Unquote as being generic in the Qtype it eventually spits out. (Cf #182.) In other words, make it a Q::Unquote<T extends Q>.

But... um. If the goal is still to make a Q::Unquote-shaped peg fit in a Q::Expr (or whatever)-shaped hole, then Q::Unquote would need to extend its generic type parameter:

class Q::Unquote<T> extends T {
    # ...
}

Ew. I mean, it'd be sort of awesome if that worked, but... I'm not aware of a language that does that, at least not this side of the FP fence. I'm 100% sure Java disallows it, 'cus type erasure. I'm reasonably certain C# wouldn't like it either. When I try in TypeScript, it informs me I'm trying to use a type as a value, which sounds about right. (Edit: These days it just says "Cannot find name 'T'".)

Ultimately, both the generic-types lead and the preceding extend-everything lead are wrong, by this simple demonstration: is a Q::Unquote a kind of Q::Literal::Int? By assumption, yes. Well, what characterizes a literal int? It has a value property with type Int, denoting the literal value of the literal int. What's the value of a Q::Unquote<Q::Literal::Int>? There can't be any, because an unquote is not a literal int yet. Barbara Liskov's ghost appears and gives us a stern look, and we realize our assumption is wrong — Q::Unquote is not a type of Q::Literal::Int, even when it ultimately results in one.
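The substitutability problem can be shown in a few lines. This is a deliberately wrong design, modeled in Python with hypothetical class names, to show where it breaks: code written against the literal-int type expects a `value`, and the unquote has none to give.

```python
class QLiteralInt:
    """Hypothetical stand-in for Q::Literal::Int."""
    def __init__(self, value: int) -> None:
        self.value = value

class QUnquote(QLiteralInt):
    """Deliberately wrong: pretends an unquote *is a* literal int."""
    def __init__(self, expr: str) -> None:
        # There is no literal value yet, only an expression that will
        # eventually produce a Q::Literal::Int.
        self.expr = expr

def double(lit: QLiteralInt) -> int:
    """Code written against QLiteralInt may rely on `value` existing."""
    return 2 * lit.value

print(double(QLiteralInt(21)))

try:
    double(QUnquote("n_ast"))
except AttributeError as err:
    # The subtype cannot stand in for its supertype.
    print("substitution failed:", err)
```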

I had other ideas along the way, like maybe I could have a separate Qtype hierarchy for Qtypes that can appear in quasis. Not so appealing either.

To be continued.

masak commented 7 years ago

This summer I was writing code for a client related to HTML templating. The templating engine was Handlebars, which is a fairly reasonable one, allowing the template author to put directives around elements for for loops and if statements and the like.

By some kind of comfortable default, we were also making use of a WYSIWYG HTML editor. Actually, it wasn't just HTML, but XHTML, that useless extra strictness that people think they need but never really do. The editor was helpfully "fixing" illegal HTML in the templates, essentially creating a conflict of interest between the WYSIWYG editor and the Handlebars template. In particular, a fairly common thing to want to do in Handlebars is to {{#each}} some <tr> table rows. But the editor (correctly) noticed that there was some textual content inside a <tbody> but outside of a <tr>, and moved that text into its own <p> outside of the table. The template broke.

It took me a few days to distill what was wrong down to the surprising essence: Handlebars is not HTML. Handlebars is one level of indirection away from being HTML. It's unexpected because Handlebars kind of looks like HTML... just like an IKEA instruction for how to assemble a sofa kind of looks a little like a sofa — but it's not a sofa.

(We ended up disabling the WYSIWYG editor. It only works for HTML, after all.)

Generalizing, a templating language for some language X is not X. What it is, is one level of indirection away from X. It's X but minus a function application. Now, let's apply that to quasis.

To be continued.

masak commented 7 years ago

Skipping right to the conclusion:

A quasi contains a template for 007 code, which is not 007 code. There's no (good) way to encode 007 code as a Qtree, because there are freaking holes in it. The unquotes are not made of Qtree, they are made of hole, which does not encode well as Qtree.

How should we encode the quasi? Going back to the IKEA instruction, a quasi is an assembly instruction. It's a functional mapping from zero or more ASTs, to an AST. It's... an IIFE. A sub declaration and a call to that sub. The parameters and the statements of the sub are what we want to encode.

The way this solves the problem of "need to put Q::Unquote everywhere" is that the Qtrees don't actually get constructed until the unquotes are replaced by actual ASTs. All we ever get in 007 land is real, honest-to-grog Qtrees.
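A small sketch of that reading, in Python with made-up node classes: the quasi from the top of this issue, quasi { say({{{greeting_ast}}}) }, becomes a function with one parameter per unquote. No unquote node type exists anywhere in the sketch; the tree is only built once the hole has been filled.

```python
from dataclasses import dataclass

class Q:
    """Hypothetical base class for Qnodes."""

@dataclass
class QLiteralStr(Q):
    value: str

@dataclass
class QIdentifier(Q):
    name: str

@dataclass
class QPostfixCall(Q):
    operand: Q
    arguments: list

def say_quasi(greeting_ast: Q) -> Q:
    """Models quasi { say({{{greeting_ast}}}) } as a function from AST to AST."""
    return QPostfixCall(QIdentifier("say"), [greeting_ast])

# "Calling" the quasi with an actual AST yields an ordinary, hole-free tree.
tree = say_quasi(QLiteralStr("Mr Bond!"))
print(tree)
```

Under this encoding, the fields of the node types never need to admit an unquote, since only complete trees are ever constructed.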

To be continued with one last installment.

masak commented 7 years ago

The "quasis are templates are functions" interpretation is so obviously correct that I'm a bit dizzy with happiness. I'm glad I did the types thing that made me realize this. (I'm also kinda glad I lost my phone in China, because I've had some wonderful time to focus when I haven't been distracting myself with my phone. True story.)

There are some more or less immediate consequences of this:

To be clear, we're talking about this situation:

my name = "outside";
quasi {
    my name = "inside";
    {{{ new Q::Identifier { name } }}}
};

I'm saying the identifier has the name "outside", because (by the time the quasi is fully constructed into a 007 AST) the statement my name = "inside"; isn't 007 code yet. (It could be in this case, but it can't be in the general case.)

Even if we did allow the above — and I really don't see why — there would be fatal timing issues involved. The unquote gets evaluated at macro time. The quasi code gets evaluated Late. Even if we BEGIN'd the quasi code, after #216 gets fixed, it'd still run too late. So there's no reasonable way an unquote could benefit from seeing the inside of a quasi.

I'm tempted to make this one an error message, even. If we don't find a binding for the variable outside the quasi, but there's one inside, then we should emit a polite version of "what is it you think you're doing?".
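Sketched in Python, with hypothetical names and error wording, the lookup rule plus that error message might look like this: an unquote resolves variables against the scope outside the quasi only, and a name that exists solely inside the quasi gets its own, more helpful error.

```python
class UnquoteScopeError(NameError):
    """Hypothetical error for a variable that only exists inside the quasi."""

def resolve_unquote_variable(name, outer_scope, quasi_declarations):
    """Look up `name` for an unquote; only the outer scope is visible."""
    if name in outer_scope:
        return outer_scope[name]
    if name in quasi_declarations:
        raise UnquoteScopeError(
            f"'{name}' is declared inside the quasi, but an unquote can "
            "only see variables declared outside it"
        )
    raise NameError(f"'{name}' is not declared")

outer = {"name": "outside"}   # my name = "outside";
inside = {"name"}             # names declared within the quasi block

# The inner declaration is ignored; the unquote sees the outer binding.
print(resolve_unquote_variable("name", outer, inside))

try:
    resolve_unquote_variable("name", {}, inside)
except UnquoteScopeError as err:
    print(err)
```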

Ok, I'm done now.