Closed: vickenty closed this 4 years ago.
Personally, I like option 3 the best. It is the biggest change to the language, but it makes all blocks parse the same.
Let's embrace the solution then?
Before we embrace it, I want to prove that this solution works by implementing it in the Marpa parser. I already have code blocks and hash literals implemented, and tests that check for ambiguity in the parser.
I think we resolved this using NonBraceExpression, in #14.
The rule, for posterity, is:
For example, using print {STUFF} MORE_STUFF:
print uses NonBraceExpression as the first argument. This means that {STUFF} will only ever be a block, never a hashref.
NonBraceExpression supports the Unary operator, so the statement print +{...} will still work correctly as a hashref.
In sub foo () { {...} }, the last {...} will always be parsed as a block.
return {} will work because return accepts any Expression, not just NonBraceExpression. (It also doesn't accept a Block as an argument, so it will not accidentally misparse it.)

There is still the case of sub foo () { {} }, where the inner {} is parsed as a block rather than as a hashref. We can handle this by requiring return at the beginning of the last statement in a subroutine.
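To make the rule easier to check, here is a toy Python model of the decisions it produces. It is not the Marpa grammar and does not parse anything: it only inspects the leading tokens of an argument. The operator tables, the function names and the result labels are hypothetical, made up for illustration.

```python
# Toy model of the decisions made by the NonBraceExpression rule.
# Hypothetical names throughout; this is NOT guac's Marpa grammar.

# Operators whose first argument is a NonBraceExpression:
# a leading "{" can only open a Block.
NON_BRACE_FIRST_ARG = {"print"}
# Operators that accept any Expression and never take a Block.
ANY_EXPRESSION_ARG = {"return"}


def classify_first_arg(operator: str, tokens: list[str]) -> str:
    """Say how the leading token(s) of the operator's first argument are read."""
    if not tokens:
        return "empty"
    if operator in NON_BRACE_FIRST_ARG:
        if tokens[0] == "{":
            # A NonBraceExpression cannot start with "{", so this is a Block.
            return "block"
        if tokens[0] == "+" and len(tokens) > 1 and tokens[1] == "{":
            # Unary "+" is allowed; its operand is an ordinary Expression,
            # where "{" starts a hash constructor.
            return "hashref"
        return "expression"
    if operator in ANY_EXPRESSION_ARG:
        # Any Expression is accepted and a Block is not, so "{" is a hashref.
        return "hashref" if tokens[0] == "{" else "expression"
    raise ValueError(f"operator not modelled: {operator}")


def classify_statement(tokens: list[str]) -> str:
    """A statement that starts with "{" (e.g. the last one in a sub) is a Block."""
    return "block" if tokens and tokens[0] == "{" else "expression"
```

classify_statement(["{", "}"]) models the remaining sub foo () { {} } case: the inner {} comes out as a block, which is why an explicit return would be needed to get a hashref there.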
@vickenty did I miss anything?
This may require some fine-tuning later: print {} $b would be parsed by guac, but not by perl. We may additionally need to prohibit empty blocks in ambiguous positions, or something similar.
We also removed the comma and fat comma as top-level operators in blocks: sub foo { 1, 2 } is not valid syntax. This was done to avoid perl's disambiguation rules, which look for a comma inside the braces.
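A minimal sketch of what "no top-level comma or fat comma in a block statement" means, assuming the statement is already split into tokens. The function names are hypothetical, and the parenthesised form being accepted follows option 3 of the original proposal rather than anything confirmed about guac.

```python
# Hypothetical check: a block statement may not contain "," or "=>"
# outside of any (), [] or {} nesting.
OPENERS = {"(", "[", "{"}
CLOSERS = {")", "]", "}"}


def has_top_level_comma(tokens: list[str]) -> bool:
    """True if a comma or fat comma appears at nesting depth 0."""
    depth = 0
    for tok in tokens:
        if tok in OPENERS:
            depth += 1
        elif tok in CLOSERS:
            depth -= 1
        elif depth == 0 and tok in {",", "=>"}:
            return True
    return False


def check_block_statement(tokens: list[str]) -> None:
    """Reject statements such as the body of sub foo { 1, 2 }."""
    if has_top_level_comma(tokens):
        raise SyntaxError("comma or fat comma at the top level of a block; "
                          "wrap the list in parentheses")
```

With this check, the body of sub foo { 1, 2 } is rejected, while sub foo { (1, 2) } and sub foo { bar(1, 2) } pass, because their commas sit inside parentheses.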
This happens when a pair of curly braces is used as a stand-alone statement in a code block (sub, eval, etc.), or as the first argument to one of the operators below:
print, printf, say
system, exec
sort, grep, map
(This is not related to prototypes: these operators are always parsed using special rules, even if parentheses are used around the arguments. In expressions, and after the return keyword, curlies are always treated as a hash literal.)
While a sufficiently powerful parser can probably handle this, the rules used for this disambiguation are rather unique and would complicate the parser too much. They are also not fully documented. In brief, perl checks whether there is a comma right after the first thing inside the braces (full details below).
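The "comma right after the first thing" check can be approximated in a few lines of Python. This is a rough sketch only: looks_like_hash is a hypothetical name, the "first thing" is simplified to a word or a plain quoted string, and perl's real lexer applies more conditions than this.

```python
import re

# Simplified "first thing": a run of word characters, or a plain
# single- or double-quoted string (no escapes, no q{}/qq{} forms).
FIRST_TOKEN = re.compile(r"""\s*(\w+|'[^']*'|"[^"]*")\s*""")


def looks_like_hash(inner: str) -> bool:
    """Guess 'hash' if a comma or fat comma follows the first token inside the braces."""
    m = FIRST_TOKEN.match(inner)
    if not m:
        return False  # e.g. starts with a variable: guess "block"
    rest = inner[m.end():]
    return rest.startswith(",") or rest.startswith("=>")
```

For instance, the inside of { foo => 1 } or { 'a', 1 } is guessed to be a hash, while { $x + 1 } and { print 1 } are guessed to be blocks. This guessing is what the proposal wants to keep out of the grammar.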
I'd like to make curlies always be interpreted as a block:
map context; do "config.pl" or sub { { foo => 1 } }).
Several possible solutions:
1. Require a semicolon after the opening brace in ambiguous situations.
2. Require parens around expressions with a comma or fat comma inside ambiguous blocks.
3. Require parens around all expressions with a comma or fat comma, if not inside an expression. This is a global change, but in return the code block syntax becomes the same everywhere (unless I missed anything).
Disambiguation rules
The exact details are not really important to this issue, but I've put them here for reference and entertainment.
Perl parses a pair of curly braces as a hash if one of the following is true:
Here "token" means a quoted string or command ('', "", ``, q{}, qq{} and qx{}) or a sequence of word characters.
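As a reference point, the token definition above can be written as a single regular expression. This is a sketch under simplifying assumptions: only the brace-delimited forms of q, qq and qx are handled, nested braces inside them are not, and backslash escapes in '', "" and `` are skipped over but not interpreted. The names TOKEN and first_token are hypothetical.

```python
import re

# A "token": quoted string or command, or a run of word characters.
# The q/qq/qx alternative comes before \w+ so that "qq{...}" is not
# cut short at "qq".
TOKEN = re.compile(r"""
      '(?:[^'\\]|\\.)*'      # single-quoted string
    | "(?:[^"\\]|\\.)*"      # double-quoted string
    | `(?:[^`\\]|\\.)*`      # backtick command
    | q[qx]?\{[^{}]*\}       # q{}, qq{}, qx{} without nested braces
    | \w+                    # sequence of word characters
""", re.VERBOSE)


def first_token(inner: str):
    """Return the token at the start of the brace contents, or None."""
    m = TOKEN.match(inner.lstrip())
    return m.group(0) if m else None
```

Something like first_token("qq{abc}, 1") yields the whole qq{abc}, while contents starting with a variable such as $x yield None, i.e. no token in this sense.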