z7zmey / php-parser

PHP parser written in Go
https://php-parser.com
MIT License
941 stars 63 forks source link

No reliable way to work with parenthesis #97

Closed quasilyte closed 5 years ago

quasilyte commented 5 years ago

Proposal: add WithParens in analogy with func (l *Parser) WithFreeFloating(). That function asks parser to preserve info about parenthesis.

It can either introduce AST nodes that represent parenthesis or make them implicit, but positions info should respect them.

It's almost impossible to handle parenthesis in the source code right now. We can't reliably count a number of expression surrounding ( and ).

Imagine this binary expression:

(($x) + ((($y))))

If we use GetPosition and take a slice of src[pos.StartPos-1 : pos.EndPos], result would be:

$x) + ((($y

If we go to the left and right, we'll count 2 parens, but the binary expression itself has only 1 surrounding, while $y is surrounded by 3 pairs of ().

With proposed feature, I would expect binary expression position to capture the entire (($x) + ((($y)))). Sub-expression 1 should be ($x) and the 2 is ((($y))).

For clone expr like clone ($x) we get positions that encloses clone ($x.

quasilyte commented 5 years ago

Note for #81: you can encode parenthesis as a int field that represents how many ( ) around expr there are. It can even be a few bits from a flags field, I would say encoding up to 3 pairs (2 bits) is enough (0 - no, 1 - (), 2 - (()), 3 - ((()))).

quasilyte commented 5 years ago

One possible fix is to update '(' expr ')' definitions to set positions like this:

'(' expr ')'
  $$ = $2;
  $$.SetPosition(yylex.(*Parser).positionBuilder.NewTokensPosition($1, $3))

So we don't lose parentheses positions info.

diff --git a/php7/php7.y b/php7/php7.y
index 3ff11c0..89ef7b4 100644
--- a/php7/php7.y
+++ b/php7/php7.y
@@ -3856,6 +3856,9 @@ expr_without_variable:
             {
                 $$ = $2;

+                // save position
+                $$.SetPosition(yylex.(*Parser).positionBuilder.NewTokensPosition($1, $3))
+
                 // save comments
                 yylex.(*Parser).setFreeFloating($$, freefloating.Start, append($1.FreeFloating, append(yylex.(*Parser).GetFreeFloatingToken($1), (*$$.GetFreeFloating())[freefloating.Start]...)...))
                 yylex.(*Parser).setFreeFloating($$, freefloating.End, append((*$$.GetFreeFloating())[freefloating.End], append($3.FreeFloating, yylex.(*Parser).GetFreeFloatingToken($3)...)...))
@@ -4764,6 +4767,9 @@ dereferencable:
             {
                 $$ = $2;

+                // save position
+                $$.SetPosition(yylex.(*Parser).positionBuilder.NewTokensPosition($1, $3))
+
                 // save comments
                 yylex.(*Parser).setFreeFloating($$, freefloating.Start, append($1.FreeFloating, append(yylex.(*Parser).GetFreeFloatingToken($1), (*$$.GetFreeFloating())[freefloating.Start]...)...))
                 yylex.(*Parser).setFreeFloating($$, freefloating.End, append((*$$.GetFreeFloating())[freefloating.End], append($3.FreeFloating, yylex.(*Parser).GetFreeFloatingToken($3)...)...))
@@ -4789,6 +4795,9 @@ callable_expr:
             {
                 $$ = $2;

+                // save position
+                $$.SetPosition(yylex.(*Parser).positionBuilder.NewTokensPosition($1, $3))
+
                 // save comments
                 yylex.(*Parser).setFreeFloating($$, freefloating.Start, append($1.FreeFloating, append(yylex.(*Parser).GetFreeFloatingToken($1), (*$$.GetFreeFloating())[freefloating.Start]...)...))
                 yylex.(*Parser).setFreeFloating($$, freefloating.End, append((*$$.GetFreeFloating())[freefloating.End], append($3.FreeFloating, yylex.(*Parser).GetFreeFloatingToken($3)...)...))

But I think having explicit paren expr nodes is better even though it would hurt the performance.

quasilyte commented 5 years ago

Now I see that "free floating" might be the thing I'm looking for. :)

z7zmey commented 5 years ago

Yes, you can count ( free-floating tokens at freefloating.Start position. But be careful, I move all freefloating.Start whitespaces, comments, and tokens to the most parent node, but freefloating.End saves as is so ( and ) tokens can be coupled to different nodes.