Nicholas-Lin opened this pull request 3 years ago (status: Open).
I noticed that embedded braces do not support scoped identifiers. For example, the following test case fails:
"{$var::get()}";
I'm not sure whether this should be addressed in this PR or in a separate one, since this one already has quite a few changes.
This is good progress, but there are some extended test cases that seem to break it.
"Test $var->tester- Hello";
This test case errors, but the parser should read $var->tester as an embedded member selection expression.
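To make the expected behaviour concrete, here is a minimal JavaScript sketch of PHP-style "simple" interpolation, assuming Hack follows PHP here: a $variable may be followed by at most one ->member, and the arrow is only consumed when an identifier starts right after it. The function name splitSimpleInterpolation is hypothetical, escape sequences such as \$ are deliberately ignored, and this is not how the grammar itself implements it.

```javascript
// Hypothetical helper: splits a double-quoted string body into literal text
// and simple embedded expressions, ASSUMING PHP-style rules (one ->member at
// most, arrow consumed only when an identifier follows). Escapes are ignored.
const IDENT_START = /[A-Za-z_]/;
const IDENT_CHAR = /[A-Za-z0-9_]/;

// Reads identifier characters starting at index i; returns the index past them.
function readIdentifier(src, i) {
  while (i < src.length && IDENT_CHAR.test(src[i])) i++;
  return i;
}

function splitSimpleInterpolation(body) {
  const parts = [];
  let literal = "";
  let i = 0;
  while (i < body.length) {
    if (body[i] === "$" && IDENT_START.test(body[i + 1] ?? "")) {
      if (literal) {
        parts.push({ kind: "literal", text: literal });
        literal = "";
      }
      const start = i;
      i = readIdentifier(body, i + 1); // consume the variable name
      // Consume a single "->member" only if an identifier follows the arrow.
      if (body.startsWith("->", i) && IDENT_START.test(body[i + 2] ?? "")) {
        i = readIdentifier(body, i + 2);
        parts.push({ kind: "member_selection", text: body.slice(start, i) });
      } else {
        parts.push({ kind: "variable", text: body.slice(start, i) });
      }
    } else {
      literal += body[i++];
    }
  }
  if (literal) parts.push({ kind: "literal", text: literal });
  return parts;
}

console.log(splitSimpleInterpolation("Test $var->tester- Hello"));
```

Under these assumptions the body splits into "Test ", the member selection $var->tester, and the literal "- Hello"; a chain like $a->b->c yields $a->b followed by the literal "->c".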
Also, the way we nest expression items is inconsistent. Consider the following test cases and their output:
"{$var->fun->yum}";
$var->fun->yum;
(selection_expression [5, 1] - [5, 15]
  (variable [5, 1] - [5, 5])
  (selection_expression [5, 7] - [5, 15]
    (qualified_identifier [5, 7] - [5, 10]
      (identifier [5, 7] - [5, 10]))
    (qualified_identifier [5, 12] - [5, 15]
      (identifier [5, 12] - [5, 15]))))))
(selection_expression [7, 0] - [7, 14]
  (selection_expression [7, 0] - [7, 9]
    (variable [7, 0] - [7, 4])
    (qualified_identifier [7, 6] - [7, 9]
      (identifier [7, 6] - [7, 9])))
  (qualified_identifier [7, 11] - [7, 14]
    (identifier [7, 11] - [7, 14])))))
In the double-quoted string case, the variable sits at the level between the two selection expressions. This is incorrect: $var is not being selected against the value of fun->yum; instead, $var->fun is evaluated first and ->yum is then selected from its result. The non-embedded version is parsed correctly, with the leading variable in the deepest level of the nested selection. The same inconsistency occurs with heredoc variable substitution, which may be where we are inheriting it from.
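The expected shape is a left-associative fold: $var->fun is built first and ->yum is then selected from it, so the variable lands in the deepest node. The JavaScript sketch below (hypothetical helper names, source positions omitted) produces the same nesting as the non-embedded output above; it only illustrates the target tree shape and is not a tree-sitter rule.

```javascript
// Hypothetical helper: folds a chain like "$var->fun->yum" into a left-nested
// selection tree, so the leading variable ends up in the deepest node.
function parseSelectionChain(src) {
  const [head, ...members] = src.split("->");
  if (!/^\$[A-Za-z_][A-Za-z0-9_]*$/.test(head)) {
    throw new Error(`chain must start with a variable: ${head}`);
  }
  // reduce runs left to right: (($var -> fun) -> yum)
  return members.reduce(
    (object, name) => ({ type: "selection_expression", object, member: name }),
    { type: "variable", name: head },
  );
}

// Renders the tree in a tree-sitter-like S-expression form (no positions).
function toSexp(node) {
  if (node.type === "variable") return "(variable)";
  return `(selection_expression ${toSexp(node.object)} (qualified_identifier))`;
}

console.log(toSexp(parseSelectionChain("$var->fun->yum")));
// (selection_expression (selection_expression (variable) (qualified_identifier)) (qualified_identifier))
```

Building the tree with a left fold, rather than parsing the tail fun->yum as its own selection, is what keeps the embedded and non-embedded parses consistent.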
In case it could be helpful, I've implemented string parsing for PHP in the tree-sitter-php repository. Please use whatever is useful to you: https://github.com/tree-sitter/tree-sitter-php/pull/72
> Also, we have some inconsistency with the way we nest expression items
I started looking into this, and you're right that the inconsistency comes from $.heredoc. I originally wrote the custom embedded braced expression rules (instead of, say, reusing $.call_expression, $.subscript_expression, and $.selection_expression) because embedded braced expressions are restricted to expressions that start with a $.variable, and a valid embedded braced expression can't have a space between { and $.
Reusing existing call/subscript/selection definitions
Previously, I thought reusing the existing call/subscript/selection rules would allow invalid scenarios. Thinking about it a bit more, I realized that's not the case: https://github.com/slackhq/tree-sitter-hack/pull/29. https://github.com/slackhq/tree-sitter-hack/blob/8ac0c52d6b5747b99f512fb9847eb9fb6eaa9946/grammar.js#L154-L164
Replacing $.embedded_brace_expression with the already-defined call/subscript/selection rules fixes the issue you described for heredocs, but I think this only works because heredocs use a scanner. I don't think we could apply the same fix to $.string without a scanner.
Scanner hack
One way to make the simplified version of $.embedded_brace_expression work for both heredocs and strings, without resorting to a scanner for string content, is to create a scanner node just for the { character of the embedded braced expression. This would let us use a simplified $.embedded_brace_expression while still restricting the inner expressions to start with $.variable, as we do today for heredocs.
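A rough JavaScript sketch of the lookahead decision such a scanner token would make. Real tree-sitter external scanners are written in C against TSLexer and also consult the valid_symbols array, so this only mirrors the core rule; the token name embedded_brace_open and both helper functions are hypothetical.

```javascript
// Hypothetical token emitted by an external scanner for the opening "{".
// A real scanner (C, TSLexer) would also check valid_symbols first.
const TokenType = Object.freeze({
  EMBEDDED_BRACE_OPEN: "embedded_brace_open", // hypothetical name
  NONE: null,
});

// Returns the token at `pos` (if any) and how many characters it consumes.
// Only "{" immediately followed by "$" opens an embedded brace expression;
// "{ $var}" and "{foo}" remain ordinary string content.
function scanEmbeddedBraceOpen(text, pos) {
  if (text[pos] === "{" && text[pos + 1] === "$") {
    return { type: TokenType.EMBEDDED_BRACE_OPEN, length: 1 };
  }
  return { type: TokenType.NONE, length: 0 };
}

// Collects every offset in a string body where the token would be emitted.
function findEmbeddedBraceStarts(body) {
  const starts = [];
  for (let i = 0; i < body.length; i++) {
    if (scanEmbeddedBraceOpen(body, i).type === TokenType.EMBEDDED_BRACE_OPEN) {
      starts.push(i);
    }
  }
  return starts;
}

console.log(findEmbeddedBraceStarts("{$var::get()} { $x} {y}")); // [ 0 ]
```

Because the token consumes only the {, everything after it can be handed back to the ordinary grammar rules, with the requirement that the next node be a $.variable.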
Fixing custom call/subscript/selection definitions
I don't see a way to do this (yet) that doesn't require some gnarly copy-pasting of existing definitions and further modifying them to restrict them to the embedded braced expression case.
Summary
This PR adds support for embedded expressions and embedded braces in double-quoted strings. Note that this PR addresses an issue similar to the one in PR #25. Notably, this PR also adds support for embedded expressions, and the implementation is done entirely in grammar.json (not scanner.cc). Here are some examples of the constructs that are now supported:
I also added support for escape sequences, so the following examples should parse correctly:
Initially, the parser incorrectly interpreted instances of #, //, and /* in the string as comments, but this should no longer be a problem!
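For illustration only, a hand-written JavaScript sketch of the token split the grammar is meant to produce for a double-quoted string body: escape sequences, embedded brace expressions, and everything else (including #, // and /*) as literal text. The escape set is an assumed subset, brace matching ignores braces inside nested strings, and tokenizeStringBody is a hypothetical name; the PR itself does this in the grammar, not with code like this.

```javascript
// ASSUMED subset of escape characters, for illustration only.
const SIMPLE_ESCAPES = new Set(["n", "t", "r", "v", "e", "f", "\\", "$", '"']);

// Hypothetical helper: splits a string body into literal, escape and
// embedded_brace_expression tokens. Comment markers get no special handling,
// so "#", "//" and "/*" simply stay inside literal tokens.
function tokenizeStringBody(body) {
  const tokens = [];
  let literal = "";
  const flush = () => {
    if (literal) {
      tokens.push({ kind: "literal", text: literal });
      literal = "";
    }
  };
  let i = 0;
  while (i < body.length) {
    const c = body[i];
    if (c === "\\" && SIMPLE_ESCAPES.has(body[i + 1])) {
      flush();
      tokens.push({ kind: "escape", text: body.slice(i, i + 2) });
      i += 2;
    } else if (c === "{" && body[i + 1] === "$") {
      // Scan to the matching "}" while tracking nested braces.
      let depth = 0;
      let j = i;
      for (; j < body.length; j++) {
        if (body[j] === "{") depth++;
        else if (body[j] === "}" && --depth === 0) break;
      }
      if (j === body.length) throw new Error(`unterminated embedded expression at ${i}`);
      flush();
      tokens.push({ kind: "embedded_brace_expression", text: body.slice(i, j + 1) });
      i = j + 1;
    } else {
      literal += c;
      i++;
    }
  }
  flush();
  return tokens;
}

console.log(tokenizeStringBody("# not // a /* comment */ {$var::get()}\\n"));
```

Treating comment markers as plain characters inside the string rule, rather than letting the global comment extra match them, is the property the last paragraph describes.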