moodlehq / moodle-cs

Moodle Coding Style
https://github.com/moodlehq/moodle-cs
GNU General Public License v3.0
18 stars 16 forks source link

Add sniff to cover correct use of @var #121

Closed andrewnicols closed 8 months ago

andrewnicols commented 8 months ago

Based on Squiz\Commenting\VariableComment, but:

Note: Based on #119 as they share some new util methods.

codecov[bot] commented 8 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 97.81%. Comparing base (7c207d5) to head (f4c4202).

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #121 +/- ## ============================================ + Coverage 97.69% 97.81% +0.11% - Complexity 778 827 +49 ============================================ Files 34 35 +1 Lines 2302 2426 +124 ============================================ + Hits 2249 2373 +124 Misses 53 53 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

stronk7 commented 8 months ago

(a rebase here would be welcome, just in case there is something in the latest #119 version making a difference)

james-cnz commented 8 months ago

Hi @andrewnicols,

I notice there's some overlap between my pull request #123 and this one. It looks like yours does a bunch of checks on variable comments, whereas mine mainly just checks types, but I think my type checking is probably more comprehensive overall (e.g. mine handles DNF types).

How do you feel about me taking out the few checks from mine that aren't specifcally related to type checking (checking there's a comment, that it has a var tag, and not more than one), and you taking out the checks from yours that are specifically related to type checking?

It looks like there are some things I would need to add to mine, though. Mine doesn't handle attributes. It looks like this is a PHP 8 feature. I've only tested against Moodle 4.1, so I guess I'll need to check if there's anything else in PHP 8.x I need to support.

Mine also doesn't check type short forms are used, and in lower case. I think I could probably add this, if it's important. Automatically fixing these could be tricky, though. My checker supports multi-line types. It extracts the type info from multiple T_DOC_COMMENT_STRING tags, and concats them together before doing type checking, so information about which tags the type is from, and where in the tag it is, is lost.

I notice your checker supports arrays in the form array(type1 => type2). This isn't a standard PHPDoc form, AFAIK. It doesn't seem to be supported by either PHPStan or Psalm, which I think are the two most commonly used PHP static checkers, and PHPStan in particular is pretty comprehensive, I think. I'd guess this is a Moodle idiom. I could probably add this to my checker, but I'm not sure it's a good idea to encourage it. I doubt there's much tooling support for it, so I think using it would probably really limit what the PHPDoc annotations are useful for.

andrewnicols commented 8 months ago

Hi James.

I notice there's some overlap between my pull request https://github.com/moodlehq/moodle-cs/pull/123 and this one. It looks like yours does a bunch of checks on variable comments, whereas mine mainly just checks types, but I think my type checking is probably more comprehensive overall (e.g. mine handles DNF types).

This sniff is intended to ensure:

How do you feel about me taking out the few checks from mine that aren't specifcally related to type checking (checking there's a comment, that it has a var tag, and not more than one), and you taking out the checks from yours that are specifically related to type checking?

I'm fine with this in principle but I think it's the wrong approach in this instance.

I think we have two possible approaches here:

  1. Perform the type checking in your new Sniff and remove it from this one (your proposal)
  2. Perform the type checking in the var Sniff, but use the utilities from your patch to do so when that patch lands

The former has the benefit that all type checking is in one place, but the detractors that one place has to be aware of a variety of formatting options for a range of different tag types (@var, @param, @param-*, @return, etc.), and that all type checking is in one place and cannot be applied on a granular level.

The latter has the benefit that each usage check is with similar checks. That is to say that the Variable Sniff will check that a docblock is present for each variable, that it is correctly formatted, that there is only one mention, etc. whilst a ConstSniff will do the same for all const calls, and possibly define calls. The detractor is that in some cases there may be code duplication within the sniff or duplicated effort.

I'm more inclined to go for the latter approach where each variable check is with similar checks, but uses a library to check the value as this will be easier to manage and maintain, and easier to land these changes in a more progressive approach (not just big bang).

I've already tried to build my approach to thing along those lines with the intent of refactoring when the next part of the puzzle needs it. That is from:

protected static function  suggestTypes(string $type): string;

To a utility method:

public static function checkType(File $phpcsFile, int $stackPtr, string $type): string;

In this case what I would propose is that we initially introduce those checks in this issue and, since your change is significantly larger and likely to take longer to review and finalise and therefore likely to land after this one, we consume yours as a library instead of having it be a do-everything Sniff.

Doing it this way allows us to get the check moved over and to remove it from moodlecheck and possibly to perform a release so that these checks are more widely available earlier. It also allows people to filter and test different checks more easily.

It looks like there are some things I would need to add to mine, though. Mine doesn't handle attributes. It looks like this is a PHP 8 feature. I've only tested against Moodle 4.1, so I guess I'll need to check if there's anything else in PHP 8.x I need to support.

Yes - Moodle 4.3 onwards makes use of Attributes. and we are already planning to drop support for PHP 7.4 soon so your changes will need to support attributes.

Mine also doesn't check type short forms are used, and in lower case. I think I could probably add this, if it's important.

I feel these are fairly important as they are part of the coding style. They're also very easy with phpcs.

In this case, the basis for these came from an upstream Sniff from Squiz which was mostly suitable for our needs, but not quite.

Automatically fixing these could be tricky, though.

Ah - we should really aim to have anything which is auto-fixable do so. This is one of the many great benefits of phpcs over moodlechecker.

My checker supports multi-line types. It extracts the type info from multiple T_DOC_COMMENT_STRING tags, and concats them together before doing type checking, so information about which tags the type is from, and where in the tag it is, is lost.

I'm not sure that I follow there. Perhaps you can give an example. The phpcs parser already does much of this work for us and if you find the T_DOC_COMMENT_OPEN_TAG tag, you'll find that phpcs includes in its structure an array of tags named comment_tags whose value is the ptr for each of the @ tags in the docblock. For example:

<?php

/**
 * @covers \some_class
 */
class some_test extends \advanced_testcase {

If I fetch the classblock's open tag here, and get the tokens:

$tokens = $phpcsFile->getTokens();
while ($docPtr = $phpcsFile->findNext(T_DOC_COMMENT_OPEN_TAG, $stackPtr)) {
    $docblock = $tokens[$docPtr];
    foreach ($docblock['comment_tags'] as $tagPtr) {
        // The tag name is here:
        $tagName = $tokens[$tagPtr]['content'];

Unfortunately we still have to process the actual vars, but there should be no need to concat the docs together or anything like that.

I notice your checker supports arrays in the form array(type1 => type2). This isn't a standard PHPDoc form, AFAIK. It doesn't seem to be supported by either PHPStan or Psalm, which I think are the two most commonly used PHP static checkers, and PHPStan in particular is pretty comprehensive, I think. I'd guess this is a Moodle idiom.

No, it's actually something I copied over from the original phpcs check. I'm not a fan and I had already contemplated removing it, but now I definitely will.

Since phpdoc now supports the same syntax as psalm, I'll change it:

// From
/** @var array(int) */
// To
/** @var array<array-key, int> */

// And From:
/** @var array(int => some_object) */
// To
/** @var array<int, some_object> */

This is compliant with both psalm and phpdoc. I've added an auto-fix rule for this too.

james-cnz commented 8 months ago

Hi Andrew,

Regarding doing type checking as part of specific construct checks (variable var, function param & return, not sure what you mean by param-* though) vs considered its own thing, I really think type checking should be considered it's own thing.

From an implementation perspective, I'm gathering namespace and use alias info for the type checking, and doing a preliminary pass over the file to gather class hierarchies. So although either approach would likely involve some duplication of code, I think having type checking spanning sniffs (at least in the case of doing the checking I am) would probably involve more time overhead.

From a usability perspective, I can only really think of two reasons why someone might want to disable type checking. For one, they don't want it at all, either because they're using some in-house style, or they're using types informally. In this case, I can imagine people wanting to enable or disable type checks on a file or directory basis, for different parts of a project, but not on a per-construct basis. I can't imagine anyone wanting, e.g. the standard style for function param and return types, but a custom in-house style for variable var types.

For the other, possibly they do want type checking eventually, but to add it progressively. Again, in this case, I can imagine people wanting to enable or disable type checks on a file or directory basis, for different parts of a project, but not on a per-construct basis. Ideally, code is separated into files because it's a cohesive unit. To figure out what the types should be, I think someone's likely to be focusing primarily on the code in a file at a time, to figure out what parameters it can accept, what it could return, and what it's going to store in and expect to retrieve from variables. I don't think they're likely to want to, e.g., look at a file and figure out the param and return types, then go away and look at param and return types in other files, then come back again later for var types.

If people want type checking in a particular file or directory, I think they're going to want it for all tags, and having the type checking spanning two sniffs would just mean they're going to have to enable or disable two sniffs to do what they want. I can imagine, though, people wanting to do some var checks, but not the type check.

Regarding comment tags, I'm being pedantic, and checking that the tags start on a new line. Technically, I think param, return, and var tags within a line don't count, so I'm not sure the list would help me. But the issue is with multi-line types like this:

    /** @var non-empty-array<\stdClass&object{type: string, namespace: string, uses: string[], templates: string[],
     *                                  classname: ?string, parentname: ?string, opened: bool, closer: ?int}>
     * file scope: classish, function, etc.  We only need a closer if we might be in a switch statement. */
    protected array $scopes;

I think (although don't quite me) the type starts with a T_DOC_COMMENT_STRING, then there's a T_DOC_COMMENT_WHITESPACE for the line break, a T_DOC_COMMENT_WHITESPACE at the start of the line, a T_DOC_COMMENT_STAR, another T_DOC_COMMENT_WHITESPACE after that, then the type continues in another T_DOC_COMMENT_STRING. I concat the T_DOC_COMMENT_STRINGs together with some whitespace between, and pass it to the type parser. I think it would take a bit of work to keep track of where the text came from, and figure out how to put the replacements back in the appropriate tags.

andrewnicols commented 8 months ago

Regarding doing type checking as part of specific construct checks (variable var, function param & return, not sure what you mean by param-* though) vs considered its own thing, I really think type checking should be considered it's own thing.

I'm inclined to disagree because each tag can be used differently and can have different rules applied. For example:

The type checking should not be tightly linked to the specific usage, just the checking on a case-by-case basis.

From an implementation perspective, I'm gathering namespace and use alias info for the type checking, and doing a preliminary pass over the file to gather class hierarchies. So although either approach would likely involve some duplication of code, I think having type checking spanning sniffs (at least in the case of doing the checking I am) would probably involve more time overhead.

I agree, it's a balance we need to find.

From a usability perspective, I can only really think of two reasons why someone might want to disable type checking. For one, they don't want it at all, either because they're using some in-house style, or they're using types informally. In this case, I can imagine people wanting to enable or disable type checks on a file or directory basis, for different parts of a project, but not on a per-construct basis. I can't imagine anyone wanting, e.g. the standard style for function param and return types, but a custom in-house style for variable var types.

The third is that they want to fix things gradually. It's challenging to review large groups of automatic fixes.

Regarding comment tags, I'm being pedantic, and checking that the tags start on a new line. Technically, I think param, return, and var tags within a line don't count, so I'm not sure the list would help me. But the issue is with multi-line types like this:

Sorry, what is this in relation to? I'm not sure of the context of your reply.

    /** @var non-empty-array<\stdClass&object{type: string, namespace: string, uses: string[], templates: string[],
     *                                  classname: ?string, parentname: ?string, opened: bool, closer: ?int}>
     * file scope: classish, function, etc.  We only need a closer if we might be in a switch statement. */
    protected array $scopes;

I think (although don't quite me) the type starts with a T_DOC_COMMENT_STRING, then there's a T_DOC_COMMENT_WHITESPACE for the line break, a T_DOC_COMMENT_WHITESPACE at the start of the line, a T_DOC_COMMENT_STAR, another T_DOC_COMMENT_WHITESPACE after that, then the type continues in another T_DOC_COMMENT_STRING. I concat the T_DOC_COMMENT_STRINGs together with some whitespace between, and pass it to the type parser. I think it would take a bit of work to keep track of where the text came from, and figure out how to put the replacements back in the appropriate tags.

The above has the following AST, noting that the line nos are some other parts will be context-specific.

(
    [0] => Array
        (
            [content] => @var
            [code] => PHPCS_T_DOC_COMMENT_TAG
            [type] => T_DOC_COMMENT_TAG
            [line] => 45
            [column] => 9
            [length] => 4
            [level] => 1
            [conditions] => Array
                (
                    [18] => 333
                )

        )

    [1] => Array
        (
            [content] =>  
            [code] => PHPCS_T_DOC_COMMENT_WHITESPACE
            [type] => T_DOC_COMMENT_WHITESPACE
            [line] => 45
            [column] => 13
            [length] => 1
            [level] => 1
            [conditions] => Array
                (
                    [18] => 333
                )

        )

    [2] => Array
        (
            [content] => non-empty-array<\stdClass&object{type: string, namespace: string, uses: string[], templates: string[],
            [code] => PHPCS_T_DOC_COMMENT_STRING
            [type] => T_DOC_COMMENT_STRING
            [line] => 45
            [column] => 14
            [length] => 102
            [level] => 1
            [conditions] => Array
                (
                    [18] => 333
                )

        )

    [3] => Array
        (
            [content] => 

            [code] => PHPCS_T_DOC_COMMENT_WHITESPACE
            [type] => T_DOC_COMMENT_WHITESPACE
            [line] => 45
            [column] => 116
            [length] => 0
            [level] => 1
            [conditions] => Array
                (
                    [18] => 333
                )

        )

    [4] => Array
        (
            [content] =>      
            [code] => PHPCS_T_DOC_COMMENT_WHITESPACE
            [type] => T_DOC_COMMENT_WHITESPACE
            [line] => 46
            [column] => 1
            [length] => 5
            [level] => 1
            [conditions] => Array
                (
                    [18] => 333
                )

        )

    [5] => Array
        (
            [content] => *
            [code] => PHPCS_T_DOC_COMMENT_STAR
            [type] => T_DOC_COMMENT_STAR
            [line] => 46
            [column] => 6
            [length] => 1
            [level] => 1
            [conditions] => Array
                (
                    [18] => 333
                )

        )

    [6] => Array
        (
            [content] =>                                   
            [code] => PHPCS_T_DOC_COMMENT_WHITESPACE
            [type] => T_DOC_COMMENT_WHITESPACE
            [line] => 46
            [column] => 7
            [length] => 34
            [level] => 1
            [conditions] => Array
                (
                    [18] => 333
                )

        )

    [7] => Array
        (
            [content] => classname: ?string, parentname: ?string, opened: bool, closer: ?int}>
            [code] => PHPCS_T_DOC_COMMENT_STRING
            [type] => T_DOC_COMMENT_STRING
            [line] => 46
            [column] => 41
            [length] => 69
            [level] => 1
            [conditions] => Array
                (
                    [18] => 333
                )

        )

    [8] => Array
        (
            [content] => 

            [code] => PHPCS_T_DOC_COMMENT_WHITESPACE
            [type] => T_DOC_COMMENT_WHITESPACE
            [line] => 46
            [column] => 110
            [length] => 0
            [level] => 1
            [conditions] => Array
                (
                    [18] => 333
                )

        )

    [9] => Array
        (
            [content] =>      
            [code] => PHPCS_T_DOC_COMMENT_WHITESPACE
            [type] => T_DOC_COMMENT_WHITESPACE
            [line] => 47
            [column] => 1
            [length] => 5
            [level] => 1
            [conditions] => Array
                (
                    [18] => 333
                )

        )

    [10] => Array
        (
            [content] => *
            [code] => PHPCS_T_DOC_COMMENT_STAR
            [type] => T_DOC_COMMENT_STAR
            [line] => 47
            [column] => 6
            [length] => 1
            [level] => 1
            [conditions] => Array
                (
                    [18] => 333
                )

        )

    [11] => Array
        (
            [content] =>  
            [code] => PHPCS_T_DOC_COMMENT_WHITESPACE
            [type] => T_DOC_COMMENT_WHITESPACE
            [line] => 47
            [column] => 7
            [length] => 1
            [level] => 1
            [conditions] => Array
                (
                    [18] => 333
                )

        )

    [12] => Array
        (
            [content] => file scope: classish, function, etc.  We only need a closer if we might be in a switch statement. 
            [code] => PHPCS_T_DOC_COMMENT_STRING
            [type] => T_DOC_COMMENT_STRING
            [line] => 47
            [column] => 8
            [length] => 98
            [level] => 1
            [conditions] => Array
                (
                    [18] => 333
                )

        )

    [13] => Array
        (
            [content] => */
            [code] => PHPCS_T_DOC_COMMENT_CLOSE_TAG
            [type] => T_DOC_COMMENT_CLOSE_TAG
            [comment_opener] => 248
            [line] => 47
            [column] => 106
            [length] => 2
            [level] => 1
            [conditions] => Array
                (
                    [18] => 333
                )

        )

    [14] => Array
        (
            [type] => T_WHITESPACE
            [code] => 392
            [content] => 

            [line] => 47
            [column] => 108
            [length] => 0
            [level] => 1
            [conditions] => Array
                (
                    [18] => 333
                )

        )

The general approach to this would be:

That said, we want to strongly discourage complex objects unless they are a class in their own right. They may be syntactically valid, but they're not a good pattern.

In fact, the above is not recognised by phpDocumentor in any case. See the attached screenshot.

complexObjectDoc

In any case, I feel that the above should be discussed in your issue, not here.

james-cnz commented 8 months ago

I'm inclined to disagree because each tag can be used differently and can have different rules applied. For example:

From a type checking perspective, these are all basically of the form: tag, type, maybe var, and little else of interest, with a couple of wrinkles. The type checker needn't care whether there's comments before or after, or anything. After I've extracted the contents of the comment, I do the rest in one function parseTypeAndVar with a parameter $getwhat indicating what needs to be fetched. It's 95 lines including blank lines and comments. If it were seperate functions for each variation, I think it would probably be quite a bit longer, and the copy-paste detector would complain.

Regarding param-read & param-write, I guess you mean property, property-read, and property-write. Bother, I'd completely forgotten about those.

The third is that they want to fix things gradually. It's challenging to review large groups of automatic fixes.

As I said, again, in this case, I can imagine people wanting to enable or disable type checks on a file or directory basis, for different parts of a project, but not on a per-construct basis. Ideally, code is separated into files because it's a cohesive unit. To figure out what the types should be, I think someone's likely to be focusing primarily on the code in a file at a time, to figure out what parameters it can accept, what it could return, and what it's going to store in and expect to retrieve from variables. I don't think they're likely to want to, e.g., look at a file and figure out the param and return types, then go away and look at param and return types in other files, then come back again later for var types.

stronk7 commented 8 months ago

This is compliant with both psalm and phpdoc.

Aside from everything else... since when are those generics compliant with PHPDoc? I remember something similar was in the PHPDoc (like 8 years ago), but it's not, anymore, in current version.

Ciao :-)

andrewnicols commented 8 months ago

Aside from everything else... since when are those generics compliant with PHPDoc? I remember something similar was in the PHPDoc (like 8 years ago), but it's not, anymore, in current version.

Damnit... I could have sworn I'd found it in the phpdoc documentation, but now I can't find it.

I did check that it worked though, for example the following content:

    public function __construct(
        /** @var array<string, string> Examples of how it is done */
        public readonly array $examples,

        /** @var array<string, int> */
        public readonly array $names,

        /** @var array<int, Example> */
        public readonly array $foods,
    ) {}

Generates the following docs:

Screenshot 2024-03-19 at 21 34 00

It looks like it happened in this commit where they switched to phpstan/phpdoc-parser for type parsing, which brought with it the same support for generics as phpstan.

andrewnicols commented 8 months ago

Okay, I've removed the generics support and I think I've addressed all other comments.

andrewnicols commented 8 months ago

Rebased.

stronk7 commented 8 months ago

I like current approach in this PR, avoiding to make any type / matching check and, instead, just verify that the the @var tags are well-formed, and provide the fixing of some long => short types.

I think that we can implement the type checks apart, maybe looking to @james-cnz stuff (I've not yet), let's see. But better keep this Sniff away from any type check between the phpdoc and the php code.

The only two points that, maybe, could be missing in this PR, that I can imagine are:

  1. Maybe it would make sense to also have null in the types, so they can be lowercased and things like that? It's valid phpdoc type, so far.
  2. The other case that I think that is not covered and it looks like it could be easy to implement, are the fixes when the type is a phpdoc union, I mean, things like boolean|INTEGER => boo|int and similar ones.
  3. This is a question, does phpdoc allow intersection (&) nomenclature too? If so, the above should apply also to them.

In any case, as said, I'd propose to create another issue for those cases, they are not critical for this PR that perfectly implements what we are trying to replace (from moodle-local_moodlecheck) and more.

So, all right, IMO.

Ciao :-)

stronk7 commented 8 months ago

Answering to myself about point 3. above, it seems that phpdoc (the current proposal) does support intersection types:

https://github.com/php-fig/fig-standards/blob/master/proposed/phpdoc.md#details

So, all the comments above about union types do also apply to intersection ones (maybe also to (A|B)&C combos, I'd infer.

Re-ciao :-)