squizlabs / PHP_CodeSniffer

PHP_CodeSniffer tokenizes PHP files and detects violations of a defined set of coding standards.
BSD 3-Clause "New" or "Revised" License
10.66k stars 1.48k forks source link

PHP 8.1: Resolve `readonly` earlier to fix some bugs #3584

Closed kukulich closed 2 years ago

kukulich commented 2 years ago

Fixes some bugs for code like this:

class Foo {
   readonly string|null $nullableString;
}

readonly has to be resolved earlier to resolve T_TYPE_UNION right.

It's based on https://github.com/squizlabs/PHP_CodeSniffer/pull/3515 to minimize conflicts.

jrfnl commented 2 years ago

Heads-up: I just applied a minor fix to #3515 as for some silly reason it turned out it had gotten \r\n line endings.

jrfnl commented 2 years ago

Just confirmed: the test in NamedFunctionCallArgumentsTest is actually needed, including a fix.

/* testReservedKeywordReadonly1 */
foobar(readonly: $value, /* testReservedKeywordReadonly2 */ readonly: $value);

The above would currently incorrectly tokenize to T_READONLY instead of T_PARAM_NAME (based on this patch). I suspect moving the readonly backfill to after the param name check should suffice.

kukulich commented 2 years ago

@jrfnl

Heads-up: I just applied a minor fix to https://github.com/squizlabs/PHP_CodeSniffer/pull/3515 as for some silly reason it turned out it had gotten \r\n line endings.

Rebased.

If I look at the new implementation for the backfill, however, I see redundancy between the last two conditions of the wrapper if and the if/else inside.

Thanks, I forgot to remove some lines after refactoring.

Also note: readonly can only ever be the readonly keyword when not followed by an open parenthesis. This includes function calls. I'd suggest adding the following extra test which should highlight the problem:

Test added but I think the token should be tokenized as T_READONLY: https://3v4l.org/XO1KN

Regarding the BitwiseOrTest, just to be on the safe side:

All three tests added.

The above would currently incorrectly tokenize to T_READONLY instead of T_PARAM_NAME (based on this patch). I suspect moving the readonly backfill to after the param name check should suffice.

Test added and fixed.

jrfnl commented 2 years ago

Thanks for the updates! Looks good, though I haven't rerun my tests (yet).

Also note: readonly can only ever be the readonly keyword when not followed by an open parenthesis. This includes function calls. I'd suggest adding the following extra test which should highlight the problem:

Test added but I think the token should be tokenized as T_READONLY: https://3v4l.org/XO1KN

Could you explain why you think it should be tokenized as T_READONLY ? I'm interested to hear perspective.

As for my argumentation: even as a parse error, it is definitely not the readonly keyword in the right context. We re-tokenize class names, function declaration names etc to T_STRING, even when they are a reserved keyword, so why would this case be any different ?

kukulich commented 2 years ago

Could you explain why you think it should be tokenized as T_READONLY ? I'm interested to hear perspective.

We would have to change the token also for PHP 8.1+: https://3v4l.org/3AfHQ I think it's useless work when the code is invalid anyway.

jrfnl commented 2 years ago

Could you explain why you think it should be tokenized as T_READONLY ? I'm interested to hear perspective.

We would have to change the token also for PHP 8.1+: https://3v4l.org/3AfHQ I think it's useless work when the code is invalid anyway.

Except we probably should anyway: see https://3v4l.org/rFcHe (without the comment) and https://3v4l.org/Wb3mA

kukulich commented 2 years ago

Except we probably should anyway: see https://3v4l.org/rFcHe (without the comment) and https://3v4l.org/Wb3mA

I don’t understand. I think this case is already solved and tested.

gsherwood commented 2 years ago

Just catching up on the discussion around T_READONLY vs T_STRING in various cases. As best I can tell, this is the current state:

With this change the following code tokenizes the readonly strings as T_STRING:

function readonly() {}
$var = readonly();

and the following code tokenizes the readonly strings as T_READONLY:

function readonly /*comment*/ () {}
$var = readonly /*comment*/ ();

This is also how PHP 8.1+ tokenizes these strings because the second set of cases are both parse errors, so we can say that the backport is consistent.

Time could be spent to tokenize the second set of readonly strings as T_STRING on all PHP versions, but this isn't something I'd consider a blocker to getting this merged given we're only looking at parse errors.

Please let me know if I'm missing something, otherwise I'll merge this in.

kukulich commented 2 years ago

This is also how PHP 8.1+ tokenizes these strings because the second set of cases are both parse errors, so we can say that the backport is consistent.

Yes, my intention was to make the backport compatible with PHP 8.1+.

Time could be spent to tokenize the second set of readonly strings as T_STRING on all PHP versions

I think it's not neccessary so we don't need to spend our time on it. I would also need more code (to modify the behaviour on PHP 8.1+).

gsherwood commented 2 years ago

I've merged this in now - thanks a lot for the work on this. Any changes to the token structure across all PHP versions can be handled elsewhere, but I thought it was important to get this fix in.