phpstan / phpdoc-parser

Next-gen phpDoc parser with support for intersection types and generics
MIT License
1.31k stars 62 forks source link

Support for escaped quotes in Doctrine Annotation strings #205

Closed colinodell closed 1 year ago

colinodell commented 1 year ago

This library does not seem to properly support escaped quotes within Doctrine Annotation string values. It seems this might be a known issue, as I found this incomplete and commented-out test:

https://github.com/phpstan/phpdoc-parser/blob/4a1ab8e11e9957f9cc9f89f87a7c912489f08119/tests/PHPStan/Parser/PhpDocParserTest.php#L6240-L6244

Running this example should produce:

new PhpDocNode([
    new PhpDocTagNode(
        '@Doctrine\Tests\Common\Annotations\Name',
            new DoctrineTagValueNode(
                new DoctrineAnnotation('@Doctrine\Tests\Common\Annotations\Name', [
                    new DoctrineArgument(new IdentifierTypeNode('foo'), new ConstExprStringNode('"bar"')),
                ]),
            ''
        )
    ),
])

However, instead of a DoctrineTagValueNode we actually get an InvalidTagValueNode. It seems this happens because this library doesn't know how to handle the "" syntax that Doctrine Annotations uses to escape quote characters in strings.

This issue is preventing downstream projects like https://github.com/slevomat/coding-standard from accurately understanding those annotations. For example, given an annotation like this:

    /**
     * @Assert\Choice(
     *   choices=Foo::ALLOWED_CHOICES,
     *   message="Allowed choices are ""bar"" or ""baz""."
     * )
     */
    private ?string $foo = null;

phpdoc-parser fails to parse this annotation, and so the coding standard doesn't realize that Foo is a class usage. (This results in false positives in the UnusedUsesSniff.)

The only workarounds I'm aware of are:

  1. Remove all escaped double-quote characters from my annotations to avoid this bug
  2. Keep phpstan/phpdoc-parser and slevomat/coding-standard locked to older versions which don't have this issue

However, if this bug can be fixed here, slevomat/coding-standard should once again be able to see those references, thus eliminating the false positives.

ondrejmirtes commented 1 year ago

The way Doctrine annotations escape strings is weird. What phpdoc-parser does for its own purposes is that it escapes/unescapes strings the same way PHP does.

I'm not against supporting this here, I was just waiting if someone actually uses escaped strings in Doctrine annotations in practice, because the implementation might get a bit complicated.

In order to implement this what I'd really like is if someone extracted the algorithm for encoding and decoding such escaped strings. That would help me a lot.

colinodell commented 1 year ago

The way Doctrine annotations escape strings is weird.

Indeed! It's the only library I'm aware of that works this way 🙃

In order to implement this what I'd really like is if someone extracted the algorithm for encoding and decoding such escaped strings. That would help me a lot.

This is the regular expression their lexer seems to use to match those strings: "(?:""|[^"])*+". It then uses this simple algorithm where inner sequences of "" are replaced with ".

ondrejmirtes commented 1 year ago

One more thing please - given a string value that might include ", how to escape it so that it can be printed as part of Doctrine annotation?

colinodell commented 1 year ago

I think you simply replace any inner " characters with "", and then add the usual starting and ending " around the printed string contents. For example:

String variable inner contents Encoded & Printed
foo "foo"
Allowed choices are "bar" or "baz". "Allowed choices are ""bar"" or ""baz""."
In PHP, "" is an empty string "In PHP, """" is an empty string"
"May the Force be with you," he said. """May the Force be with you,"" he said."
ondrejmirtes commented 1 year ago

Implemented: https://github.com/phpstan/phpdoc-parser/commit/57457750d3f5c20834c1695e776bc21c909af10f

I'm not gonna release it right away - I will come up with more tricky inputs that will break my tokenizer/parser.

colinodell commented 1 year ago

Sounds good to me. Thank you so much for implementing this!

ondrejmirtes commented 1 year ago

I couldn't come up with something that would break the logic so I released it as 1.23.1 :)

github-actions[bot] commented 1 year ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.