microsoft / tolerant-php-parser

An early-stage PHP parser designed for IDE usage scenarios.
MIT License
883 stars 80 forks source link

Parser does not detect invalid cast expressions #33

Open mattacosta opened 7 years ago

mattacosta commented 7 years ago

Cast expressions may not contain anything except spaces and tabs between the parentheses.

In the following example, the parser generates a CastExpression while PHP sees it as a constant named 'int' followed by a variable.

Example: <?php (/* hello */int)$a;

Actual result: No error.

Expected result: syntax error, unexpected $a (T_VARIABLE)

mousetraps commented 7 years ago

Aha! Good catch. Now I understand why the built-in tokenizer produces ArrayCast/BoolCast/etc. Tokens.

We should be able to just remove the call to parseCastExpressionGranular (which I had initially kept around to remain backwards compatible with the old lexer, and potentially to provide some nicer error handling in the future, but now we can just remove it.) https://github.com/Microsoft/tolerant-php-parser/blob/master/src/Parser.php#L1459

Also, this would make a good first PR

CyrusNajmabadi commented 7 years ago

Another option is to keep all these tokens, and parse like you're doing, but give a message stating that there can't be anything between the ( type and ).

Or, your parser can check the tokens as it is parsing, and only accept this production if there is no trivia between them.

--

Note C# has this for when it sees >>. Those are two separate tokens (so they can be used in List<List<int>>). But in an expression context it gets accepted as 'right shift' when there is no trivia between the tokens. i.e. we would not accept a >/**/> b.

mattacosta commented 7 years ago
<?php
define('int', 'i am a string');
echo (/**/int);  // prints "i am a string"

There can't be a message because the snippet above is valid code. So if the code were modified to add:

$a = ' really';
echo (/**/int)$a;

...then the error should be an unexpected variable, not an invalid cast error.

Since the parser uses PHP's own tokenizer, the second case would already return a cast token as well, so it's forced into the correct production.