phpstan / phpdoc-parser

Next-gen phpDoc parser with support for intersection types and generics
MIT License
1.31k stars 62 forks source link

Newlines stops consuming tag descriptions #241

Closed jaapio closed 6 days ago

jaapio commented 4 months ago

I got an issue report on my project https://github.com/phpDocumentor/ReflectionDocBlock/issues/365 where we found an issue in this parser. Summarized the users are facing an issue with tag descriptions containing empty lines.

/**
 * @param string $myParam this is a description of the param
 *
 *                     And this line also belongs to $myParam.
 * @param array $other this is a new param.
 */

Newlines in tag descriptions are possible, but when the newline is detected the parser stops consuming new tokens. While if should only stop when a new tag starts. Other reported issues are more or less out of scope of this issue and I can workaround them. But the situation with the newlines cannot be patched from the outside. To prove the solution I made an ugly hack in our code to see if we can overcome this issue by just adjusting a single method.

https://github.com/phpDocumentor/ReflectionDocBlock/pull/369

The POC shows that the method getSkippedHorizontalWhiteSpaceIfAny is not working when the detected token is processed as a newline token. As a result the leading whitespace between param and And in my example is removed by the parser.

I would be able to create a PR to solve this issue. But before I invest more time I would like to see if other solutions are already available.

ondrejmirtes commented 4 months ago

Have you tried setting textBetweenTagsBelongsToDescription in PhpDocParser to true?

These options are similar to PHPStan's bleeding edge. Right now they're set to false for BC reasons but they will be removed in 2.0 and code will behave the new modern way (as if these options were set to true).

Also check out options in TypeParser and ConstExprParser.

jaapio commented 4 months ago

Yes I did check those settings, those resulted in the same behavior. I didn't check the other parsers but I assume they have nothing Todo with the issue we are facing as it is reproduced with just the description. In fact the text processing stops after an empty line in all cases I checked.

jaapio commented 3 months ago

I checked again... since I was using the parser just for a single tag this caused issues. So I do have a workaround that works for us right now. As we can simply consume all tokens from the lexer.

Which brings me to another issue, which might be one you also are facing when reconstructing docblocks. When I have a docblock line like this:

/**
 * @param string $myParam this is a description of the param
 *
 *                     And this line also belongs to $myParam.
 * @param array $other this is a new param.
 */
*/

The token on line 2 contains: \n with type Lexer::TOKEN_PHPDOC_EOL but as the parser sees this as a EOL it will remove the trailing whitespace when building the description.

We fixed this by adding some post processing on the tokens:

    private function tokenizeLine(string $tagLine): TokenIterator
    {
        $tokens = $this->lexer->tokenize($tagLine);
        $fixed = [];
        foreach ($tokens as $token) {
            if ($token[1] === Lexer::TOKEN_PHPDOC_EOL) {
                if (rtrim($token[0], " \t") !== $token[0]) {
                    $fixed[] = [rtrim($token[Lexer::VALUE_OFFSET], " \t"), Lexer::TOKEN_PHPDOC_EOL, $token[2] ?? null];
                    $fixed[] = [ltrim($token[Lexer::VALUE_OFFSET], "\n\r"), Lexer::TOKEN_HORIZONTAL_WS, ($token[2] ?? null) + 1];
                    continue;
                }
            }

            $fixed[] = $token;
        }
        return new TokenIterator($fixed);;
    }

But I think this is a bug in the way phpstan parses the docblocks. This expression: self::TOKEN_PHPDOC_EOL => '\\r?+\\n[\\x09\\x20]*+(?:\\*(?!/)\\x20?+)?', seems to be to greedy.

Do you think it's worth fixing this?

ondrejmirtes commented 6 days ago

I think this works as expected. We even have tests for:

 * @param string $myParam this is a description of the param
 *
 *                     And this line also belongs to $myParam.
 * @param array $other this is a new param.
 *

And everything up up until @param array $other is consumed as description of the first @param.