xsawyerx / guacamole

Guacamole is a parser toolkit for Standard Perl. It provides fully static BNF-based parsing capability to a reasonable subset of Perl.
https://metacpan.org/pod/Guacamole

block and hash disambiguation #2

Closed vickenty closed 4 years ago

vickenty commented 8 years ago

This happens when a pair of curly braces is used as a stand-alone statement in a code block (sub, eval, etc), or as the first argument to one of the operators below:

(This is not related to prototypes: these operators are always parsed using special rules, even if parentheses are used around the arguments. In expressions, and after the return keyword, curlies are always treated as a hash literal.)

While a sufficiently powerful parser can probably handle this, the rules used for this disambiguation are rather unique and would complicate the parser too much. The disambiguation rules are also not documented in full. In brief, perl checks if there's a comma right after the first thing inside the braces (full disclosure below).

I'd like curlies to always be interpreted as a block:

Several possible solutions:

  1. Require a semicolon after opening brace in ambiguous situations.

    map { $_ => 0 } @a; # not ok
    map {; $_ => 0 } @a; # ok
    sub foo { { 1 => 2 } } # not ok
    sub foo { return { 1 => 2 } } # ok
    sub foo { 1 => 2 } # ok
    { my $x; sub foo { $x } } # not ok
    print { $fh } "hi"; # not ok
  2. Require parens around expressions with comma or fat-comma inside ambiguous blocks.

    map { $_ => 0 } @a; # not ok
    map { ($_ => 0) } @a; # ok
    sub foo { { 1 => 2 } } # not ok
    sub foo { return { 1 => 2 } } # ok
    sub foo { 1 => 2 } # ok
    { my $x; sub { $x } } # ok
    print { $fh } "hi"; # ok
  3. Require parens around all expressions with comma or fat-comma, if not inside an expression. This is a global change, but in return code block syntax becomes the same everywhere (unless I missed anything).

    map { $_ => 0 } @a; # not ok
    map { ($_ => 0) } @a; # ok
    sub foo { { 1 => 2 } } # not ok
    sub foo { return { 1 => 2 } } # ok
    sub foo { 1 => 2 } # not ok
    { my $x; sub { $x } } # ok
    print { $fh } "hi"; # ok
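
For reference, the rewritten forms marked ok in the lists above are all valid in plain Perl today. The following is a minimal sketch that exercises them; the variable names are illustrative, and it shows only how perl itself behaves, not what Guacamole accepts.

```perl
use strict;
use warnings;

my @a = qw(a b);

# Option 1: a leading semicolon forces the braces to be read as a block.
my %semi = map {; $_ => 0 } @a;

# Options 2 and 3: the pair is wrapped in parens inside the block.
my %paren = map { ($_ => 0) } @a;

# An explicit return makes the inner braces an anonymous hash.
sub foo { return { 1 => 2 } }

# Block form of the filehandle argument to print, using an in-memory handle.
my $out = '';
open(my $fh, '>', \$out) or die "cannot open in-memory handle: $!";
print { $fh } "hi";
close($fh);
```

Both map calls build the same hash, foo() returns a hash reference, and the print call writes to the handle held in $fh.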

Disambiguation rules

Exact details are not really important to this issue, but I put them here for reference and entertainment.

Perl parses a pair of curly braces as a hash if one of the following is true:

Here, token means a quoted string or command ('', "", ``, q{}, qq{} and qx{}) or a sequence of word characters.

{ } # hash
{ 1 } # block
{ 1, 2 } # hash
{ fuss, 2 } # block
{ Pack, 2 } # hash
{ quiz, 2 } # hash
{ fuss => 2 } # hash
{ qq{} => 2 } # hash
{ qr{} => 2 } # block
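
The examples above can be reproduced with a small heuristic. This is only a sketch inferred from the listing and from the "comma right after the first thing" description, not perl's actual tokenizer. The guess_curlies helper is hypothetical, and it assumes q{}, qq{} and qx{} use brace delimiters with no escapes or nesting.

```perl
use strict;
use warnings;

# Hypothetical helper: given the text between a pair of curly braces,
# guess whether it reads as a 'hash' or a 'block'.
sub guess_curlies {
    my ($inside) = @_;
    $inside =~ s/^\s+//;

    # Empty braces are taken as a hash.
    return 'hash' if $inside eq '';

    my $first = substr($inside, 0, 1);
    my $rest;

    if ($inside =~ /^(?:'[^']*'|"[^"]*"|`[^`]*`)(.*)$/s) {
        $rest = $1;    # quoted string or command
    }
    elsif ($inside =~ /^(?:q|qq|qx)\{[^}]*\}(.*)$/s) {
        $rest = $1;    # q{}, qq{} or qx{} (assumption: braces only)
    }
    elsif ($inside =~ /^\w+(.*)$/s) {
        $rest = $1;    # sequence of word characters
    }
    else {
        return 'block';    # anything else, e.g. a variable
    }

    $rest =~ s/^\s+//;

    # A fat comma after the first token always means a hash.
    return 'hash' if $rest =~ /^=>/;

    # A plain comma means a hash unless the token starts with a
    # lowercase letter other than 'q'.
    return 'hash'
        if $rest =~ /^,/ && ($first eq 'q' || $first !~ /^[a-z]$/);

    return 'block';
}

for my $inside (' ', ' 1 ', ' 1, 2 ', ' fuss, 2 ', ' Pack, 2 ',
                ' quiz, 2 ', ' fuss => 2 ', ' qq{} => 2 ', ' qr{} => 2 ') {
    printf "{%s} # %s\n", $inside, guess_curlies($inside);
}
```

The qr{} case falls through to the word branch: the token is just qr, and the character that follows it is an opening brace rather than a comma, so the guess is a block.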
vickenty commented 8 years ago

Personally, I like option 3 the best. It is the biggest change to the language, but it makes all blocks parse the same.

xsawyerx commented 8 years ago

Let's embrace the solution then?

vickenty commented 8 years ago

Before we embrace it, I want to prove that this solution works by implementing it in the Marpa parser. I already have code blocks and hash literals implemented, and tests that check for ambiguity in the parser.

xsawyerx commented 4 years ago

I think we resolved this using NonBraceExpression, in #14.

The rule, for posterity, is:

For example, using print {STUFF} MORE_STUFF:

There is still the case of sub foo () { {} }, where the parser does not recognize the inner braces as a hashref and treats them as a block instead. We can handle this by requiring a return at the beginning of the last statement in a subroutine.

@vickenty did I miss anything?

vickenty commented 4 years ago

This may require some fine tuning later: print {} $b would be parsed by guac, but not by perl. We may need to additionally prohibit empty blocks in ambiguous positions, or something.

We also removed comma and fatcomma as top-level operators in blocks: sub foo { 1, 2 } is not valid syntax. This was done to avoid disambiguation rules in perl that look for comma inside the braces.
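
As a rough illustration of what that restriction costs, in plain Perl a sub body with a top-level comma and one with the list wrapped in parens return the same values. The sub names below are made up, and the idea that Guacamole accepts the parenthesised form is an assumption based on option 3 above, not something confirmed in this thread.

```perl
use strict;
use warnings;

# Top-level comma in a sub body: valid Perl, but not valid syntax
# under the rule described in the comment above.
sub bare { 1, 2 }

# Same list wrapped in parens (assumed to be the accepted rewrite).
sub parens { (1, 2) }

my @bare   = bare();
my @parens = parens();
```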