Flexible parsing of param/typeParam

rbuckton commented 4 years ago

Improves parsing for @param and @typeParam blocks to handle the following cases:

 /**
   * // optional hyphens
   * // [tsdoc-param-tag-missing-hyphen]
   * @param foo This is fine.
   *
   * // JSDoc-style type annotations in various positions (with or without hyphens)
   * // [tsdoc-param-tag-with-invalid-type]
   * @param {boolean} foo This is fine.
   * @param foo {boolean} This is fine.
   * @param foo {boolean} - This is fine.
   * @param foo - {boolean} This is fine.
   *
   * // JSDoc-style optional parameter names
   * // [tsdoc-param-tag-with-invalid-optional-name]
   * @param [foo] - This is fine.
   * @param [foo=default] - This is fine.
   */

For each of the above cases, a warning is issued (and can be ignored via configuration). In the case of the missing hyphen, this fixes the case where the parameter name was not included in the parameter description.

When parsing out a JSDoc-style type or value (in the case of a parameter default), the parser handles balanced brackets/braces, quoted strings, and escaped quote-marks.

Fixes: #128

rbuckton commented 4 years ago

@octogonz, @iclanton: Just a friendly ping. It's been over 2 months since this PR was opened.

octogonz commented 4 years ago

@rbuckton Sorry for the slow response. Looking at it now...

octogonz commented 4 years ago

The optional name rule seems to be losing tokens. Consider this input:

/**
 * Example 1
 *
 * @param [n - this is
 * the description
 * 
 * @public
 */

It correctly reports tsdoc-param-tag-with-invalid-optional-name, but the resulting AST is missing the text this is the description. And the @public tag doesn't make it into the AST at all.

The TSDoc parser generally aims for every input character to be associated with some AST node somewhere in the parse tree, even if the input character caused an error to be reported.

octogonz commented 4 years ago

Also this input produces two errors for the same character, based on different interpretations of {:

/**
 * Example 2
 *
 * @param { test
 */

(4,11): The @param block should not include a JSDoc-style '{type}'
(4,11): Expecting a TSDoc tag starting with "{@"

Generally the parser tries to avoid this by consuming the token after reporting an error. If there isn't a natural AST node to associate it with, we can create a DocExcerptText node.

octogonz commented 4 years ago

I will see if I can fix these issues...

rbuckton commented 4 years ago

I've almost finished fixes for them.

rbuckton commented 4 years ago

The problem with the second case is that we parse { through the end of the comment as an invalid type, and then we get to the point where we error on having an invalid name and then backtrack to the start of the comment and it gets re-parsed again later.

octogonz commented 4 years ago

The issue seems to be that the JSDoc junk is not being registered as excerpts of DocParamBlock. The original anatomy of DocParamBlock looks like this:

/**
 * @param someName - The description
 */

Parsed as:

A     |B|D       |F|G|H|I
@param| |someName| |-| |The description

A - (inherited DocBlock.blockTag which is a DocBlockTag)
B - DocParamBlock._spacingBeforeParameterNameExcerpt
D - DocParamBlock._parameterNameExcerpt
F - DocParamBlock._spacingAfterParameterNameExcerpt
G - DocParamBlock._hyphenExcerpt
H - DocParamBlock._spacingAfterHyphenExcerpt
I - (inherited DocBlock.content which is a DocSection)

The JSDoc expressions introduce new characters that need to be represented somehow. Your PR starts us down the road of "lax" parsing for legacy notations. It's slightly awkard because junk characters can appear anywhere, whereas the current AST expects tokens to fit into a well-formed structure.

Also note that excerpts determine syntax highlighting, so we do need to add these junk tokens to the AST tree somehow.

I'm proposing we introduce a term "cruft" for naming excerpts that are ignored junk. We can give them all the same color (e.g. gray), ignoring any legacy meanings.

So we'll add two new "cruft" excerpts to DocParamBlock like this:

/**
 * @param {string[]} [someName=John Doe] - The description
 */

A     |B|C           |D       |E         |F|G|H|I
@param| |{string[]} [|someName|=John Doe]| |-| |The description

  A - (inherited DocBlock.blockTag which is a DocBlockTag)
  B - DocParamBlock._spacingBeforeParameterNameExcerpt
* C - DocParamBlock._cruftBeforeParameterNameExcerpt
  D - DocParamBlock._parameterNameExcerpt
* E - DocParamBlock._cruftAfterParameterNameExcerpt
  F - DocParamBlock._spacingAfterParameterNameExcerpt
  G - DocParamBlock._hyphenExcerpt
  H - DocParamBlock._spacingAfterHyphenExcerpt
  I - (inherited DocBlock.content which is a DocSection)

This feels not ideal. The right solution is the AST redesign that we discussed before. But until then, we should probably try to work within the existing parser design.

rbuckton commented 4 years ago

I just pushed an update that should pretty much do that.

octogonz commented 4 years ago

@rbuckton Thanks very much for implementing this feature! It's really useful!

microsoft / tsdoc

Flexible parsing of param/typeParam #206