phpstan / phpdoc-parser

Next-gen phpDoc parser with support for intersection types and generics
MIT License
1.31k stars 62 forks source link

support comments in array shape #213

Open shmax opened 12 months ago

shmax commented 12 months ago

This is my first experiment in trying to support comments in array shape:

array{
    // a is for apple
    a: int,
}

array{
    // a is for apple
    // a is also for aardvark
    a: int,
}

Note: as pointed out by @ondrejmirtes , multi-line comments within a block comment aren't valid PHP, so single line comments like this are the best we can do.

shmax commented 12 months ago

I'm having some trouble negotiating with your CI build. I'm getting this error:

Time: 00:03.819, Memory: 80.00 MB

There was 1 PHPUnit test runner deprecation:

1) Your XML configuration validates against a deprecated schema. Migrate your XML configuration using "--migrate-configuration"!

--

There was 1 failure:

1) SlevomatCodingStandard\Sniffs\Commenting\ForbiddenAnnotationsSniffTest::testForbiddenAnnotations
Failed asserting that 9 is identical to 10.

/home/runner/work/phpdoc-parser/phpdoc-parser/slevomat-cs/tests/Sniffs/Commenting/ForbiddenAnnotationsSniffTest.php:24

FAILURES!
Tests: 718, Assertions: 2648, Failures: 1, Deprecations: 1.

So the tests are failing, but the reported error is some kind of slevomat thing? 😕

ondrejmirtes commented 12 months ago

This is the workflow that fails in your PR: https://github.com/phpstan/phpdoc-parser/blob/1.23.x/.github/workflows/test-slevomat-coding-standard.yml

Slevomat CS is a heavy user of phpdoc-parser so we're testing every phpdoc-parser commit against Slevomat CS, to see that we didn't break anything.

The test in question:

shmax commented 12 months ago

Slevomat CS is a heavy user of phpdoc-parser so we're testing every phpdoc-parser commit against Slevomat CS, to see that we didn't break anything.

Oh, that's clever, so you copy the current source of phpdoc-parser into slevomat's own vendor/phpstan folder, then you see what happens, eh? And because I'm now supporting comments within doc blocks, some test it runs related to "forbidden annotations" is getting throw off?

Any idea how to proceed, here?

shmax commented 12 months ago

An, yes, I had the tests commented out while I was experimenting with the node-based comment approach. Restored, thank you.

ondrejmirtes commented 12 months ago

Any idea how to proceed, here?

You should probably replicate what the workflow does locally, which will allow you to debug what's going wrong in the test. It might actually be valid, and we can just update the assertion in Slevomat CS after merging and releasing this PR. But it needs to be investigated.

shmax commented 12 months ago

You should probably replicate what the workflow does locally, which will allow you to debug what's going wrong in the test.

Great idea, thanks.

shmax commented 12 months ago

Heh, well hurray for debugging. It turns out that the comment-parsing I was doing was getting thrown off by this kind of thing:

* @see https://www.slevomat.cz

I've modified my regex to include a negative lookbehind on ":", and that seemed to do the trick. Thanks much for the insights.

I guess that just leaves the format-preserving stuff, which I will look at this evening.

ondrejmirtes commented 11 months ago

TBH I'd first make single line comments work perfectly,and only if there is a real demand add multiline comments. We're already inside a comment, we don't need our comments to support multiline comments 😀

shmax commented 11 months ago

Fair point, but it was actually fairly easy to get working as all the hard work happens in Comment::getReformattedText, which I was able to lift unchanged from nickic. We'll see, though...

shmax commented 11 months ago

With my latest changes, multiline comments seem to work with format-preserving printer:

/**
 * @return array{
 *  c: string,
 *  /* This is a
 *     multiline
 *     comment
 *   */
 *  d: string,
 *  /* Another
 *   * multiline
 *   * comment with leading asterisks
 *   */
 *  e: string
 * }
 */

I also have the DiffElem::TYPE_ADD branch of Printer::printArrayFormatPreserving working.

I think that just leaves the $delayedAdd processing (which I haven't gotten as far as understanding, yet), and possibly removal of nodes, but I didn't see any removal stuff on array shapes already happening in the printer tests, so let me know if you want it.

shmax commented 11 months ago

Oh, I also reverted the trailing comment style:

/** 
 @return array{
    a: int // comment
  }

Nikic parser doesn't seem to support it, and I suppose it could arguably be considered ambiguous (does it go with the item to the left, or the item on the following line?), so I figure it's best to leave it for another day.

shmax commented 11 months ago

Took a shot at implementing delayed add. At this point I think I'm ready for a round of reviews. I'm sure there are going to be a lot of use cases I missed, so let me know if you have any suggestions on how to improve the tests.

ondrejmirtes commented 11 months ago

I've got an idea - every $tokens->tryConsumeTokenType(Lexer::TOKEN_PHPDOC_EOL); in TypeParser should be replaced by a special method on TokenIterator that would return an array of comments which we could use in enrichWithAttributes. Because comments only make sense at the end of the line, it makes sense to pair it with consuming EOL.

shmax commented 11 months ago

I've got an idea - every $tokens->tryConsumeTokenType(Lexer::TOKEN_PHPDOC_EOL); in TypeParser should be replaced by a special method on TokenIterator that would return an array of comments which we could use in enrichWithAttributes. Because comments only make sense at the end of the line, it makes sense to pair it with consuming EOL.

Sure, can try something like that, but that's one step removed from just doing whatever you're going to do directly in enrichWithAttributes, isn't it? And I'm not sure that a comment is always at the end of the line, for example:

/*****************************
 * @param /* TODO: remove */  array $data
 */
shmax commented 11 months ago

Moving this back to draft, as I'm discovering general problems around format-preserving of single line vs inline comments. Hopefully won't take too long to sort out...

shmax commented 11 months ago

general problems around format-preserving of single line

False alarm. I confirmed behavior against Nikic parser, and the behavior with comments in this branch seems to be correct. I had the idea that inline comments should stay inline when format-preserving printing, but apparently not; all comment types force a newline after a comment, and then another after the item it is paired with.

ondrejmirtes commented 11 months ago

About @param /* TODO: remove */ array $data - I don't think we want to support this because the comment is not part of the type, it precedes it. And from this it's also obvious why I want to start by supporting end-line // comment only :)

shmax commented 11 months ago

About @param / TODO: remove / array $data - I don't think we want to support this because the comment is not part of the type, it precedes it. And from this it's also obvious why I want to start by supporting end-line // comment only :)

I'll do whatever you want to move things forward, but the sample we're discussing here doesn't really have an impact on the array-based stuff I've been working on (I've confirmed over and over that those are working perfectly in alignment with nikic parser, for both single-line and block comments).

I'm not sure what you expect to happen for /* TODO: remove */ $array sample, but from my testing it does indeed get bound to the following item, in this case the identifier. The comment doesn't break anything, but the format-preserving printer doesn't currently know how to handle these kind of comments. I can continue to investigate that if you like...

edit: to clarify, such a comment does appear in the printer output, but not if you "edit" the identifier node in this use case to include a new comment. I'm having trouble finding an equivalent sample in the nikic parser tests to analyze, but its format-preserving printer has a few branches that yours doesn't (eg. pStmts).

ondrejmirtes commented 11 months ago

/* ... */ inside /** PHPDoc is not even valid PHP: https://3v4l.org/R7jBd

So please remove that from the parser along with its complexity and leave only the trailing // comment. Thank you.

shmax commented 11 months ago

Well, that's certainly a fair point! I don't think it will reduce complexity at all, but obviously if it breaks PHP it will have to go. Let me see what I can do...

shmax commented 11 months ago

Here you go: bfa599138b9a097073663319cb17a31d4e7f402e

ondrejmirtes commented 11 months ago

What about all of this?

Screenshot 2023-10-05 at 15 53 42

shmax commented 11 months ago

Sure: 3b6ff9a

ondrejmirtes commented 11 months ago

I'm sorry that you feel this way and that you're not going to finish this PR. But I need to do what I feel is right for this project and its users, and also to keep the support burden low and maintainability high.

shmax commented 11 months ago

Sigh. I forgot I can't just fork the parser, because you have everything all wrapped up in a phar.

ethernidee commented 10 months ago

@shmax, multiline comments would be really great feature. We are grateful to you for all the efforts.

shmax commented 10 months ago

@ethernidee thanks, I appreciate the words of encouragement. Let me summarize the situation:

  1. I opened a PR with a very simple solution, which was to ignore array shape comments
  2. @ondrejmirtes decided that if we are going to have comments, then we must do them the same way that nikic parser does them, and that comments must work everywhere, not just in array shape.
  3. I got that working, and now @ondrejmirtes has declared that they should NOT work as one would expect when used in lines with tag names, because that would not be "backwards compatible"

I can't spent the rest of my life trying to hit a moving target, so I decided to just document my own code the old-fashioned way (mkdocs), but I may return to this at some point if I get bored. In the meantime, feel free to take this code and open your own PR.

ethernidee commented 10 months ago

To my mind, multiline comments are necessary everywhere. Array shape, object shape, regular tags. It's normal practice to be able to describe something using ascii-graphics, a few sentences or some kind of a list. The same way, as regular comments work.

shmax commented 10 months ago

multiline comments are necessary everywhere

Unfortunately, we've discovered that multiline comments aren't legal inside a doc block comment (PHP can't parse them).