We are hitting the limits of the Perl 6 grammar parser

masak / alma

ALgoloid with MAcros -- a language with Algol-family syntax where macros take center stage

Artistic License 2.0

We are hitting the limits of the Perl 6 grammar parser #293

Open masak opened 6 years ago

masak commented 6 years ago

I think we should (experimentally, in a branch) rewrite the parser from scratch, with two main goals:

- Excellent error messages (fixing #10)
- Parser extensibility; the ability to register a new statement type just as easily as a new operator

We don't have to make the new parser feature-complete at first. Just enough of a proof of concept for the above two ideas.

masak commented 6 years ago

Thinking about this a bit further, the parser would need to have these features:

- An internal DSL for regex matching. Should conform closely to the regexes in 007.
- A way to clone a parser (when entering blocks).
- A way to add a new rule alternative to a category/protoregex.
- A way for a used-up parser to hand control back to its outer parser (when exiting blocks).

The first feature would subsume our current grammar. The remaining three would replace OpScope, and would also enable features we don't have yet, such as #177 and statement macros.
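To make the clone/extend/hand-back trio concrete, here is a minimal sketch in Python. It is not the 007 implementation; the `Parser` class and the names `add_alternative`, `clone`, `hand_back` and `match` are all hypothetical, and plain `re` patterns stand in for the regex DSL.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Parser:
    """Hypothetical parser: a table of categories, each a list of alternatives."""
    # category name -> list of (rule name, compiled regex)
    categories: dict = field(default_factory=dict)
    outer: "Parser | None" = None

    def add_alternative(self, category, name, pattern):
        # Register a new rule alternative in a category (protoregex-like).
        self.categories.setdefault(category, []).append((name, re.compile(pattern)))

    def clone(self):
        # Entering a block: copy every alternative list so that rules added
        # inside the block do not leak into the outer parser.
        copied = {cat: list(alts) for cat, alts in self.categories.items()}
        return Parser(categories=copied, outer=self)

    def hand_back(self):
        # Exiting a block: this parser is used up; resume with the outer one.
        return self.outer

    def match(self, category, text):
        # Return the name of the first alternative that matches, or None.
        for name, regex in self.categories.get(category, []):
            if regex.match(text):
                return name
        return None


outer = Parser()
outer.add_alternative("infix", "plus", r"\+")

inner = outer.clone()                            # entering a block
inner.add_alternative("infix", "power", r"\*\*")  # block-local operator
```

In this sketch `inner` recognizes `**` while `outer` does not, and `inner.hand_back()` returns `outer`, which is roughly the scoping behavior OpScope provides for operators today, extended to any category.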

masak commented 6 years ago

Oh! And I forgot another biggie:

A way to clone a parser but into a template parser, where for each grammatical category the parser will also accept a {{{ ... }}} unquote of the right type.

The above feature is something I've envisioned several times in the past. This would be the time to try mapping it out.

Note that a template parser is not a static thing; it's derived from the language the quasi is in. As far as I know, that makes quasis a bit of a weird slang, maybe a "higher-order slang" or something.
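A rough Python sketch of what "derived from the language the quasi is in" could mean. Everything here is an assumption for illustration: the category table, `derive_template`, and the simplified unquote pattern (which ignores any type annotation inside the unquote and just tags the match with the category it was found in).

```python
import re

# Simplified, assumed shape of an unquote: anything between {{{ and }}}.
UNQUOTE = r"\{\{\{.*?\}\}\}"


def derive_template(categories):
    """Return a new category table that also accepts an unquote in every category.

    The input table is left untouched; the template is built from whatever
    alternatives the language has at the moment the quasi is entered.
    """
    return {
        cat: alts + [(f"unquote:{cat}", re.compile(UNQUOTE))]
        for cat, alts in categories.items()
    }


def match(categories, category, text):
    # Return the name of the first matching alternative, or None.
    for name, regex in categories[category]:
        if regex.match(text):
            return name
    return None


# Hypothetical current language at the point where a quasi starts.
language = {
    "term": [("int", re.compile(r"\d+"))],
    "infix": [("plus", re.compile(r"\+"))],
}
template = derive_template(language)
```

Because `derive_template` is called on the current table, a quasi that appears after a new operator has been added gets a template parser that knows about that operator too.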

masak commented 6 years ago

Even though it's not a core feature in itself, it wouldn't hurt if we also took a holistic approach to error messages from the parser, and essentially solved #10 in the process.

masak commented 6 years ago

One thing I've realized while thinking about this is that the old parser/actions did a bit too much. Some of that work can be moved out into post-processing steps that run after a simpler parse.

Examples of this so far:

- Parsing operators: the parser itself can just parse an expression "flat", and then a later step can create the expression tree out of the precedence/associativity information. (Extra relevant because quasi blocks will need to be parsed "flat", but we don't necessarily have all the operators yet and so we can't do the later step.)
- ~~Annotations on statements.~~
- Something like #167.
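The "parse flat, build the tree later" idea from the first item could look roughly like this in Python. This is a generic precedence-climbing sketch, not code from 007; the `OPS` table and `build_tree` are made up for the example, and the flat input is assumed to alternate term, operator, term.

```python
# Hypothetical operator table: operator -> (precedence, associativity).
OPS = {"+": (1, "left"), "*": (2, "left"), "**": (3, "right")}


def build_tree(flat, ops=OPS):
    """Turn a flat [term, op, term, op, term, ...] list into nested tuples."""
    pos = 0

    def climb(min_prec):
        nonlocal pos
        left = flat[pos]
        pos += 1
        while pos < len(flat):
            op = flat[pos]
            prec, assoc = ops[op]
            if prec < min_prec:
                break  # this operator belongs to an enclosing call
            pos += 1
            # Left-associative operators must not absorb another operator of
            # the same precedence on their right; right-associative ones may.
            next_min = prec + 1 if assoc == "left" else prec
            right = climb(next_min)
            left = (op, left, right)
        return left

    return climb(0)


print(build_tree([1, "+", 2, "*", 3]))     # ('+', 1, ('*', 2, 3))
print(build_tree([2, "**", 3, "**", 2]))   # ('**', 2, ('**', 3, 2))
```

The point of the split is that the flat list can be produced without knowing any precedences; only `build_tree` needs the operator table.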

masak commented 6 years ago

Should also be able to displace rules, as in #421.

masak commented 6 years ago

> Parsing operators: the parser itself can just parse an expression "flat", and then a later step can create the expression tree out of the precedence/associativity information. (Extra relevant because quasi blocks will need to be parsed "flat", but we don't necessarily have all the operators yet and so we can't do the later step.)

We've decided not to go down this path. Instead, unquoted operators will always have the loosest possible precedence by default. (We might also think up a syntax for setting the precedence.)

> Annotations on statements.

Similarly, here, I don't think this is the responsibility of a post-processing step. We should find some way to do this within the parse.

masak commented 6 years ago

I think this one urgently needs a spike/prototype. Just something where it can be proved we can add a rule during the parse would feel like a win. (Update: Started with #485 and is-parsed-spike.)
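As a toy illustration of "add a rule during the parse" (unrelated to what #485 actually contains), here is a Python sketch where the input itself extends the set of statement keywords partway through. The `register` directive and `parse_program` are invented for the example.

```python
def parse_program(lines):
    """Parse one statement per line; 'register NAME' adds a new statement type."""
    statements = {"say"}  # statement keywords known when parsing starts
    parsed = []
    for lineno, line in enumerate(lines, start=1):
        keyword, _, rest = line.partition(" ")
        if keyword == "register":  # hypothetical directive that extends the parser
            statements.add(rest.strip())
            continue
        if keyword not in statements:
            raise SyntaxError(
                f"line {lineno}: unknown statement {keyword!r}; "
                f"expected one of {sorted(statements)}"
            )
        parsed.append((keyword, rest))
    return parsed


program = ["say 1", "register unless", "unless x"]
result = parse_program(program)
```

The third line only parses because the second line registered `unless` first; the same `unless x` on line 1 would raise a `SyntaxError` listing the statements known at that point.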