squizlabs / PHP_CodeSniffer

PHP_CodeSniffer tokenizes PHP files and detects violations of a defined set of coding standards.
BSD 3-Clause "New" or "Revised" License
10.66k stars 1.48k forks source link

PHP 7 return type tokenize #1527

Closed michalbundyra closed 6 years ago

michalbundyra commented 7 years ago

In PHP 7 we can declare return type for method:

function m() : int {
}

In this case "int" is tokenized as T_RETURN_TYPE and it's fine.

We can also declare there complex type:

function n() : \MyNamespace\MyClass {
}

and in the second case I think it is wrong tokenized, because we have there:

so we have 4 tokens, but I would expect only one: T_RETURN_TYPE with content \MyNamespace\MyClass. What do you think?

gsherwood commented 7 years ago

If I tokenized it as T_RETURN_TYPE, sniffs testing namespaces wouldn't be able to find it.

If I gave it a new special type of token, sniffs could listen for it as well, but would have to separate out the namespace themselves when finding it. This would also be a BC break.

The main reason T_RETURN_TYPE was added in the first place was to ensure sniffs listening for T_ARRAY wouldn't pick up the token. I had to give it a new token type, so I picked that one.

If a sniff wants to check the placement of the return type, or the fact it exists, they should listen for T_FUNCTION and look for the T_COLON before the return type.

michalbundyra commented 7 years ago

@gsherwood yesterday I've played with it a bite more and finally I think it's fine (at least for and my cases). I've developed following sniff: https://github.com/zendframework/zend-coding-standard/blob/feature/sniffs/src/ZendCodingStandard/Sniffs/Formatting/ReturnTypeSniff.php (checks return type formatting). Probably we can do it better with some configuration options (like how many spaces before/after colon and question mark (nullable operator)). This sniff also checks for lowercase simple types (array, int, string, ...). Maybe for it we should change LowerCaseKeyword sniff to check return types.

If you'd like I can create PR to PHP_CodeSniffer if you want include this sniff into this library. What do you think? Thanks!

donatj commented 6 years ago

Spent about an hour trying to figure out what was wrong with a test. Was expecting T_RETURN_TYPE to be the full type string.

A decent solution I think would be if getMethodProperties gave me the full return type as a string.

gsherwood commented 6 years ago

A decent solution I think would be if getMethodProperties gave me the full return type as a string.

Just want to be clear that this is the feature I'll be looking to implement in 3.3.0 and not changing how the return types are tokenized.

gsherwood commented 6 years ago

Just want to be clear that this is the feature I'll be looking to implement in 3.3.0 and not changing how the return types are tokenized.

I've changed my mind. Given that this is valid code:

class Foo {
    function myFunction(): \ MyNamespace \ 
       /*comment*/MyClass \
       Bar {}
}

I think developers are going to need the extra help from the tokenizer. I'll still include the return type in getMethodProperties() but I'll include the cleaned up version in there (minus whitespace and comments) so it saves having to do that in sniffs.

gsherwood commented 6 years ago

I've now committed the change that will tokenize the entire namespaced return type as a single T_RETURN_TYPE token and the changed to File::getMethodProperties() to include the cleaned up return type in the array it returns.

If the return type is nullable, the ? will remain a T_NULLABLE token but the return_type index from File::getMethodProperties() will include the leading ?. Another new return value array index nullable_return_type can be used to see if the return type was nullable.

As this change manipulates the token array at a very low level, I had to move the code that detects return types from the second pass to the first pass in the tokenizer, which changes the way detection happens. All tests are passing (including new ones) but it could use more testing from anyone who is able to.

michalbundyra commented 6 years ago

@gsherwood

tokenize the entire namespaced return type as a single T_RETURN_TYPE

I've checked it on the following example:

function hello() : \MyNamespace // hey
        \Something # hello
        \Here /** hola */ \MyInterface {}

and I'm getting:

10: T_RETURN_TYPE => \MyNamespace // hey\n
11: T_RETURN TYPE =>         \Something # hello\n
12: T_RETURN_TYPE =>         \Here /** hola */ \MyInterface

so as you can see it's not a single T_RETURN_TYPE for entire namespaced return type.

gsherwood commented 6 years ago

so as you can see it's not a single T_RETURN_TYPE for entire namespaced return type.

That's correct. PHPCS splits all multi-line tokens like that. The File::getMethodProperties() method takes that into account and combines all those tokens together.

If you want to ensure developers are not splitting namespaced return values over multiple lines, you can just check to ensure there is a single T_RETURN_TYPE token. If you don't care, or prefer that, you can use File::getMethodProperties() to combine them all together if you don't want to do it yourself.

jrfnl commented 6 years ago

I'm a bit apprehensive about this change as - based on the finding from @webimpress - you will no longer be able to detect (and forbid) comments being used within return type declarations as the comments are now tokenized as part of the return type which feels very counter-intuïtive to me.

If I'm honest, I would have expected just the change to the getMethodProperties() method to fix this issue and the tokenizer to have been left alone. That way, the individual tokens of class based return declarations can still be examined if needs be, but if you just need to examine the return type as a string, you'd use the getMethodProperties() method to get access to it.

jrfnl commented 6 years ago

In addition to the above: Note: I haven't had the chance to run any tests yet, but - again - based on what @webimpress wrote - I'm a bit worried about the following with this tokenizer change:

Also, - again based on what I've read here -, I suspect that:

michalbundyra commented 6 years ago

I'm agree with @jrfnl. Just tried to adjust my sniff to work with current behaviour and it's so complicated, because we have comments in T_RETURN_TYPE and multiple T_RETURN_TYPE for one function. The second is fine, we can have multiple T_RETURN_TYPE for one function, but I don't like including comments into the content. I think we should either:

Changes to getMethodProperties are fine, but tokenizing not, imho.

gsherwood commented 6 years ago

Changes to getMethodProperties are fine, but tokenizing not, imho.

I thought this is exactly what you asked for.

So what about T_WHITESPACE, T_COMMENT etc all remain as those tokens, but all other parts of the return type become T_RETURN_TYPE.

So if you do what I imagine the vast vast majority of developers will do (no whitepsace, no comments) then you end up with one T_RETURN_TYPE token. If you go crazy and fill the return type with rubbish, you get a mix of whitespace/comments/return type tokens. The File:: getMethodProperties() method keeps returning the cleaned up version.

So the major change is that T_STRING and T_NS_SEPARATOR tokens will now be combined into T_RETURN_TYPE tokens instead of being left as they are.

Thoughts?

michalbundyra commented 6 years ago

@gsherwood When I created that issue I haven't considered comments and whitespaces in return type. I thought it is not allowed and PHP parser will raise an exception. So I would have there separate tokens, but type - you have decide what you think will be better. I can see some advantages of having just ONE T_RETURN_TYPE for a function and other parts will be T_STRING, T_NS_SEPARATOR or one of Tokens::$emptyTokens (so as it was) or just changing T_STRING to T_RETURN_TYPE. I'm not sure what will be the best solution.

As I said in https://github.com/squizlabs/PHP_CodeSniffer/issues/1527#issuecomment-310639697 I've managed to write a sniff working with T_STRING, T_NS_SEPARATOR, T_RETURN_TYPE and Tokens::$emptyTokens and it's fine. I was not using getMethodProperties to check return type, but your changes there seems reasonable.

michalbundyra commented 6 years ago

@gsherwood I've created PR #1994 with reverted changes to the tokenizer, all tests are fine. Function getMethodProperties return cleared return type, but now we don't have there any tricky regular expression to clear it, just we append valid tokens (T_NS_SEPARATOR, T_STRING, T_RETURN_TYPE).

gsherwood commented 6 years ago

I'm not going to revert this change.

I'll probably just change it to how I described things working in my last comment because I don't like the idea of the last bit of a namespace being tokenised as T_RETURN_TYPE and the rest of it as T_STRING. I also don't like the idea of a class name without a namespace being tokenized as T_RETURN_TYPE, making it different to a namespaced one. The testing I did made me realise it is not consistent, so it needs to change.

gsherwood commented 6 years ago

I've made the change I was talking about. So in that sample code:

<?php

function hello() : \MyNamespace // hey
        \Something # hello
        \Here /** hola */ \MyInterface {}

Instead of the return type being tokenised like this:

10: T_RETURN_TYPE => \MyNamespace // hey\n
11: T_RETURN TYPE =>         \Something # hello\n
12: T_RETURN_TYPE =>         \Here /** hola */ \MyInterface

It is now tokenized like this:

10: T_RETURN_TYPE => \MyNamespace
11: T_WHITESPACE => ·
12: T_COMMENT => //·hey\n
13: T_WHITESPACE => ········
14: T_RETURN_TYPE => \Something
15: T_WHITESPACE => ·
16: T_COMMENT => #·hello\n
17: T_WHITESPACE => ········
18: T_RETURN_TYPE => \Here

Which is still a mess, but a mess because someone decided to put comments and whitespace inside a namespaced return type. I don't even know why that is valid code, but I think it's pretty easy to detect now given you can just look for T_RETURN_TYPE, and if you find another T_RETURN_TYPE before the end of the definition, it's a dirty value.

If you want comments inside your return type strings, you can do that now and they will be checked properly. If you want newlines inside your return type string, you can do that now and the newline char will be checked. If you want namespaced return types to be split up and indented, you can write a sniff to check that now.

Anything thoughts from anyone on this change?

gsherwood commented 6 years ago

Forgot to mention that the (hopefully) more normal way of writing the return type:

<?php

function hello() : \MyNamespace\Something\Here\MyInterface {}

Will now get tokenized like this:

10: T_RETURN_TYPE => \MyNamespace\Something\Here\MyInterface

Which was the original request, and replaces the existing behaviour that tokenized it like this:

10: T_NS_SEPARATOR => \
11: T_STRING => MyNamespace
12: T_NS_SEPARATOR => \
13: T_STRING => Something
14: T_NS_SEPARATOR => \
15: T_STRING => Here
16: T_NS_SEPARATOR => \
17: T_RETURN_TYPE => MyInterface
jrfnl commented 6 years ago

I'm definitely a lot happier with the new version. Thank you @gsherwood.

AFAICS, there still one of my previously raised concerns remaining unaddressed:

  • A standard will no longer easily be able to forbid whitespace around the namespace separator in a return type without using complicated regexes.

This is about code like:

function functionName() : \ NSName \ NSSubName \ ClassName {}
gsherwood commented 6 years ago

A standard will no longer easily be able to forbid whitespace around the namespace separator in a return type without using complicated regexes.

This is true, and was my original concern with doing this. But I don't think it is overly hard to check now that all whitespace and comments are tokenized individually. So that same code will be tokenized like this:

10: T_RETURN_TYPE => \
11: T_WHITESPACE => ·
12: T_RETURN_TYPE => NSName
13: T_WHITESPACE => ·
14: T_RETURN_TYPE => \
15: T_WHITESPACE => ·
16: T_RETURN_TYPE => NSSubName
17: T_WHITESPACE => ·
18: T_RETURN_TYPE => \
19: T_WHITESPACE => ·
20: T_RETURN_TYPE => ClassName

So it's a matter of also looking for T_RETURN_TYPE and seeing if it contains a \. If it does, it is either a namespaced class name or part of one. You can then look either side for whitespace if you want. Sniffs like this may have already needed to know about T_RETURN_TYPE due to the last part of the namespace being tokenized like that instead of T_STRING when used as a return type.

It's the same sort of thing when using namespace separators inside strings:

$method = '\ NSName \ NSSubName \ ClassName :: foo';
call_user_func($method);

Although that would be a far harder sniff to write.

jrfnl commented 6 years ago

@gsherwood Thanks for your response. I guess that's doable, though it still requires to sniff for an additional - non-intuitive - token.

michalbundyra commented 6 years ago

@gsherwood Now it works for me. Thanks!

However I'm not sure if my original request was valid. I've changes my sniffs to work with multiple T_RETURN_TYPE tokens and I can do everything with it, it is just not obvious that we have more tokens included in one token (I mean T_NS_SEPARATOR and T_STRING). Not sure if we should have T_RETURN_TYPE tag at all. Maybe sniffs checking return type should operate on "function" and return when there is no return type declared? For me it works now, and if others are happy with it I would close that issue.

Maybe it should be revisited in v4. Thoughts?

gsherwood commented 6 years ago

I can remove both T_ARRAY_HINT and T_RETURN_TYPE, which were both there to stop "array" strings from being tokenized as T_ARRAY. That also means that T_SELF/T_PARENT etc used for return types will revert back to those tokens from the T_RETURN_TYPE tokens that they currently are.

gsherwood commented 6 years ago

I've removed T_ARRAY_HINT because it didn't make sense for it to have its own token but all other type hints remained as normal tokens.

So with that done, it probably doesn't make sense to have special tokens for T_RETURN_TYPE either. The one thing that needs doing in there is to ensure "array" is converted to T_STRING so it is not found by array sniffs.

gsherwood commented 6 years ago

I've removed T_RETURN_TYPE completely and ensured that "array" strings in there are now T_STRING. You'll need to figure out the return type start position and string manually from now on, or use getMethodProperties if you just want the cleaned up version of the return type.

I've kept all the conversion code in the first pass of the tokenizer because I have a suspicion that I may need to do more mucking around with this area in the future.

Anyone have have thoughts about this change? I don't think I'll change my mind again, but open to attempts.

jrfnl commented 6 years ago

@gsherwood These changes make total sense to me.

While it will be a pain to adjust existing sniffs to deal with PHPCS cross-version compat for this (for external standards which support more than just the latest & greatest PHPCS), I think this makes the whole function declaration tokenizing more consistent and stable for the future.

A test run on WPCS shows no breaks so far, which is not surprising as WP still supports PHP 5.2, so not many - if any - sniffs take type declarations into account.

However, as could be expected, a test run on PHPCompatibility shows quite some breakage, at least five sniffs will need adjusting. I'll try and find some time this weekend to dig into the needed adjustments and will let you know how I get on.

gsherwood commented 6 years ago

I'll try and find some time this weekend to dig into the needed adjustments and will let you know how I get on.

Thanks a lot.

michalbundyra commented 6 years ago

@gsherwood I had a look on my sniffs and all seems to be fine, I need to do couple small changes, but in general it works, and I'm happy with that solution.

So just to note it here function/closure return type declaration can contain: T_STRING, T_NS_SEPARATOR, T_CALLABLE, T_SELF, T_PARENT, and if there is a mess any of Tokens::$emptyToken.

Thanks a lot for all of these changes, @gsherwood !

gsherwood commented 6 years ago

@webimpress Thanks. We got there in the end.

I'll leave this open until @jrfnl has had a chance to check the impact on PHPCompatibility and see if any more changes are suggested.

jrfnl commented 6 years ago

FYI: I've been down & out with the flu for the last three days (and still not well). Once my brain is working again, this is on my list.

gsherwood commented 6 years ago

I've been down & out with the flu for the last three days (and still not well)

:( Hope you feel better soon.

jrfnl commented 6 years ago

In the mean time, I've had a look at this for PHPCompatibility & got things working again for that standard - PR is open: https://github.com/wimg/PHPCompatibility/pull/642

I haven't verified WPCS yet, but expect no problems.

gsherwood commented 6 years ago

Thanks @jrfnl. I'll close this off now.