slackhq / tree-sitter-hack

Hack grammar for tree-sitter
MIT License
33 stars 15 forks source link

Simplify embedded_braced_expression #29

Closed aosq closed 2 years ago

aosq commented 3 years ago

Summary

This PR removes custom call, subscript, and selection expressions used in $.embedded_brace_expression and replaces them with their main alternatives.

We also rename $.embedded_brace_expression to $.embedded_braced_expression to match hh_parse node name. This was my intention originally, but didn't notice the difference in tenses. This also matches the node name $.braced_expression.

Finally, we add more tests for $.embedded_braced_expression.

Issue

@frankeld found an issue with embedded braced expression (https://github.com/slackhq/tree-sitter-hack/pull/26#issuecomment-896040500) in heredocs where nested $.selection_expressions were matched on the tail of the nest instead of the head (like the non-embedded version of $.selection_expression).

Using @frankeld's example, "{$var->fun->yum}"; would approximately result in

(selection_expression (variable) (selection_expression (identifier) (identifier))

instead of what we expect in the non-embedded case

(selection_expression (selection_expression (variable) (identifier)) (identifier))

Change

Duplicate call, subscript, and selection expressions

The child expressions for $.embedded_brace_expression were originally re-written specifically for $.embedded_brace_expression so that only expressions starting with a $.variable were allowed and only expressions with no whitespace between the opening { brace and $ were matched as embedded brace expressions.

For example, func(), var_const["key"], and var_const->func() are valid $.call_expression, $.subscript_expression, and $.selection_expression respectively, but are not allowed inside $.embedded_brace_expression.

Duplicates not necessary

While looking into the ordering issue of $._embedded_brace_selection_expression I realized that because tree-sitter-hack's heredoc scanner specifically looks for the sequence /\{\$[a-zA-Z_\x80-\xff]/ to release control back to the parser, the parser will only match expressions that start with a $.variable and don't have whitespace between { and $. The parser never gets a chance to match expressions that don't start with $.variable.

If there is whitespace between { and $ like say { $var, the scanner matches { as part of $.heredoc_body and the parser doesn't get a chance to match an embedded braced expression.

In the case of an expression that starts with an identifier like {var->func(), the scanner matches {v as part of $.heredoc_body (as well as the rest of the string), so the parser again doesn't get a chance to match an embedded braced expression.

4e554c4c commented 2 years ago

Hi, do you mind rebasing this onto main? if so I'm happy to merge it.

frankeld commented 2 years ago

@4e554c4c I'll go ahead and rebase