syavorsky / comment-parser

Generic JSDoc-like comment parser.
MIT License

Provide a Babel AST mapper #99

Closed syavorsky closed 3 years ago

syavorsky commented 3 years ago

Provide a way to map the parser output into Babel AST as suggested in #93.

syavorsky commented 3 years ago

@jaydenseric can you point me to the exact spec, so I know for sure what you are asking for? I couldn't spot it from a quick look at the Babel spec doc.

jaydenseric commented 3 years ago

The loc field in a Babel AST Node that has a SourceLocation type:

https://github.com/babel/babel/blob/main/packages/babel-parser/ast/spec.md#node-objects
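
For reference, the spec linked above describes loc as a SourceLocation with start and end positions, where line is 1-indexed and column is 0-indexed. A minimal sketch of that shape follows; the makeLoc helper is hypothetical and exists only for illustration.

```javascript
// Hypothetical helper: builds an object shaped like a SourceLocation.
// Per the Babel AST spec, `line` is 1-indexed and `column` is 0-indexed.
function makeLoc(startLine, startColumn, endLine, endColumn) {
  if (startLine < 1 || endLine < 1) throw new RangeError("line must be >= 1");
  if (startColumn < 0 || endColumn < 0) {
    throw new RangeError("column must be >= 0");
  }
  return {
    source: null, // no source file name attached in this sketch
    start: { line: startLine, column: startColumn },
    end: { line: endLine, column: endColumn },
  };
}

// A span on line 2 that runs from column 11 up to column 19.
const loc = makeLoc(2, 11, 2, 19);
console.log(JSON.stringify(loc));
```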

syavorsky commented 3 years ago

@jaydenseric is this example doing what you are looking for? I am still not sure if this should go into comment-parser itself, but hopefully the parsed data provides all you need, just in a slightly different format.

jaydenseric commented 3 years ago

Thanks for looking into it.

I find that example pretty confusing…

As a side note, it seems comment-parser can't handle multiline type content?

/**
 * @param {{
 *   foo: true
 * }} foo Lorum ipsum.
 */

I'm not 100% sure, but I think it's valid and VS Code can syntax highlight it ok:

[Screenshot, 2020-12-27: VS Code syntax highlighting]

syavorsky commented 3 years ago

@jaydenseric let me rephrase my last comment. Is .source[].tokens sufficient for constructing a Babel AST?

syavorsky commented 3 years ago

@jaydenseric this is all about the 1.0 branch. Check out its README; it should answer some of your questions above.

With .source[].tokens you should be able to build an AST for tag, name, type, etc.

The structure is

parse(source, opts) => Block[]{
  ...
  tags: Spec[]{
    ...
    source: Line[] // arr of refs to `Block.source[]`'s items
  }
  source: Line[]{
    ...
    source: string
    tokens: Tokens {...}
  }
}

All result types live in primitives.ts. The playground uses the most recent 1.0 source, and will eventually use master. I will try to improve the UI to make it less confusing.
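
As a rough sketch of how .source[].tokens could be turned into column ranges: the tokens on a line concatenate back into the line's full text, so a token's start column is the combined length of the tokens before it. The token order and the sample line below are assumptions written by hand for illustration (check them against primitives.ts), and tokenLocation is a hypothetical helper, not part of comment-parser. Line.number is used as-is, so an offset may be needed if 1-indexed lines are wanted.

```javascript
// Assumed left-to-right order of the tokens within one source line.
const TOKEN_ORDER = [
  "start", "delimiter", "postDelimiter", "tag", "postTag",
  "type", "postType", "name", "postName", "description", "end",
];

// Hypothetical helper: column range of one token on one line.
function tokenLocation(line, tokenName) {
  let column = 0;
  for (const key of TOKEN_ORDER) {
    const value = line.tokens[key] || "";
    if (key === tokenName) {
      return {
        start: { line: line.number, column },
        end: { line: line.number, column: column + value.length },
      };
    }
    column += value.length; // every earlier token shifts the column right
  }
  throw new Error(`Unknown token: ${tokenName}`);
}

// Hand-written sample shaped like one item of Block.source.
const line = {
  number: 2,
  source: "  * @param {string} name The name.",
  tokens: {
    start: "  ", delimiter: "*", postDelimiter: " ",
    tag: "@param", postTag: " ",
    type: "{string}", postType: " ",
    name: "name", postName: " ",
    description: "The name.", end: "",
  },
};

console.log(tokenLocation(line, "type"));
console.log(tokenLocation(line, "name"));
```

For a multiline type or description, the same idea would apply per line: take the start from the first line carrying that token and the end from the last.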

syavorsky commented 3 years ago

1.0 has been merged into master with all the updates above. Let's return to this conversation if you find that AST conversion is not feasible.

jaydenseric commented 3 years ago

As promised, here is the getJsdocBlockTagSpanCodeLocation utility function I came up with in jsdoc-md v9.0.0:

https://github.com/jaydenseric/jsdoc-md/blob/v9.0.0/private/getJsdocBlockTagSpanCodeLocation.js

A JSDoc block tag "span" is a chunk of syntax that holds actual (sometimes multiline) content, as opposed to whitespace separators. The tag name, type, name group, and description are spans.

jaydenseric commented 3 years ago

Here is the end result:

[Two screenshots, 2021-01-23]

jaydenseric commented 3 years ago

@syavorsky how can I figure out a start and end code location for the main JSDoc comment description? Is it possible just by looking at the comment source array, given that the description tokens for all the following block tags are also in there?

This is needed to solve https://github.com/jaydenseric/jsdoc-md/issues/19.

jaydenseric commented 3 years ago

Can we please add a descriptionSource array to the parse result?

syavorsky commented 3 years ago

Take a look at the Block.source items up to the first tag line. You can find where the tags start by matching Block.tags[0].source[0].number.
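
A minimal sketch of that suggestion, assuming each Line carries a number and that a tag's source array starts at the line where the tag begins. The descriptionSource helper and the sample block are hypothetical and hand-written; they are not part of comment-parser.

```javascript
// Hypothetical helper: the Block.source lines that precede the first tag.
function descriptionSource(block) {
  if (block.tags.length === 0) return block.source;
  const firstTagLine = block.tags[0].source[0].number;
  return block.source.filter((line) => line.number < firstTagLine);
}

// Hand-written sample shaped like one parsed Block (tokens omitted).
const source = [
  { number: 0, source: "/**" },
  { number: 1, source: " * Main description" },
  { number: 2, source: " * continues here." },
  { number: 3, source: " * @param {string} a First." },
  { number: 4, source: " */" },
];
const block = {
  source,
  tags: [{ tag: "param", source: [source[3], source[4]] }],
};

console.log(descriptionSource(block).map((line) => line.number));
```

Note that the result still includes the opening /** line, so a caller would need to skip lines whose description token is empty.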

jaydenseric commented 3 years ago

Ok, so this took about a week of work to get right in jsdoc-md v9.1.1:

[Screenshot, 2021-02-02]

One of several gotchas is that comment-parser includes leading newlines in a .description value, but my getJsdocSourceTokenCodeLocation would skip those source lines with the description token value of '' and therefore get the start code location too late. Instead of trying to solve this problem (how could you?), I trimmed newlines from the .description value so the code location matches:

https://github.com/jaydenseric/jsdoc-md/blob/02208056f158a2d28be4769d4070a0255b18ff7c/private/jsdocCommentToMember.js#L566-L582

This is not too bad for my use case because markdown content looks better with pointless newlines trimmed anyway. But if you needed a description to be exact for some reason (including start and end newlines), you're in trouble.
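
The workaround described above can be sketched roughly as follows. This is not the jsdoc-md code; the helper names and the sample data are hypothetical, and the sample simply assumes a description that arrives with leading and trailing newlines.

```javascript
// Strip newlines from both ends so the text starts where the location starts.
const trimNewlines = (text) => text.replace(/^\n+|\n+$/g, "");

// First source line that actually carries description content.
function firstDescriptionLine(lines) {
  return lines.find((line) => line.tokens.description !== "") || null;
}

// Hand-written sample: the first two lines have an empty description token.
const lines = [
  { number: 0, tokens: { description: "" } }, // "/**"
  { number: 1, tokens: { description: "" } }, // " *"
  { number: 2, tokens: { description: "Hello" } },
  { number: 3, tokens: { description: "world" } },
];
const description = "\n\nHello\nworld\n";

console.log(JSON.stringify(trimNewlines(description)));
console.log(firstDescriptionLine(lines).number);
```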

Overall, honestly speaking, it's been a nightmare trying to figure out line and column source code locations for the things comment-parser parsed. I really, really wish comment-parser:

  1. Had a source array for main block descriptions, like each tag has (see https://github.com/jaydenseric/jsdoc-md/blob/v9.1.1/private/jsdocCommentToMember.js#L575).
  2. Had line and column, start and end code location data for every detail it parses.
  3. Exposed, for things such as .description, the raw JSDoc content spanning its code location, including the * fence in the string. This is important, for example, for regex searching for inline @link tags and figuring out their line and column code locations from the regex match index and the start line and column of the description they are in (see https://github.com/jaydenseric/jsdoc-md/blob/v9.1.1/private/jsdocDataMdToMdAst.js#L54-L58).
  4. Could parse @example <caption> and content (see https://github.com/jaydenseric/jsdoc-md/blob/v9.1.1/private/jsdocCommentToMember.js#L387-L491).
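
To illustrate why point 3 needs the raw content, here is a minimal sketch of mapping a regex match index to a line and column. It only works if the string keeps the * fence and indentation of every continuation line, since those characters count toward the column. The offsetToPosition helper, the regex, and the sample text are assumptions for illustration, not jsdoc-md or comment-parser code.

```javascript
// Converts a character offset within `text` into a line/column position,
// given the position at which `text` starts in the file.
function offsetToPosition(text, index, start) {
  const linesBefore = text.slice(0, index).split("\n");
  const lineDelta = linesBefore.length - 1;
  const lastLine = linesBefore[linesBefore.length - 1];
  return {
    line: start.line + lineDelta,
    // Only the first line is shifted by the starting column.
    column: lineDelta === 0 ? start.column + lastLine.length : lastLine.length,
  };
}

// Raw description text, including the fence on the second line.
const raw = "See {@link foo}\n   * and {@link bar}.";
const start = { line: 3, column: 4 };

const links = [...raw.matchAll(/\{@link\s+([^}\s]+)\}/g)].map((match) => ({
  target: match[1],
  position: offsetToPosition(raw, match.index, start),
}));

console.log(links);
```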

jaydenseric commented 3 years ago

Added a few more details to the last comment in edits.

brettz9 commented 3 years ago

Although it looks like an ESLint-friendly AST would be the same as Babel's in the case of adding custom types, it would also be nice to support exporting VisitorKeys so that esquery (as used in ESLint rules, e.g., to require or prohibit certain comment structures) or estraverse could be used with comment-parser.

(Ideally this would also allow optional specification of a jsdoc type parser, like catharsis, jsdoctypeparser, or jsdoc-type-pratt-parser, so that in addition to the raw types, one could also get parsed types, with its VisitorKeys being reexported.)

brettz9 commented 3 years ago

As discussed in #117, I've released https://github.com/es-joy/jsdoccomment, which converts comments (with comment types) alone to Babel AST (I neglected to mention it here), and https://github.com/es-joy/jsdoc-eslint-parser, which (however inefficiently) iterates all nodes to add a jsdoc property to them containing the relevant detected comment AST.

Note that the detection of the comment for a given structure is not a trivial matter. For example, with:

/* A */
const /* B */ aFunc = /* C */ function () {};

... for the function expression, we might look for the JSDoc Block at point C first, but then if not present, look for it at point A. My parser uses such an algorithm, and this may currently result in the same jsdoc being repeated on two different nodes, e.g., if looking at the node for the aFunc Identifier, it might add a JSDoc Block at point A as well if one is found there.
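
A rough sketch of that lookup order, to make the duplication concrete. This is not the actual jsdoc-eslint-parser implementation: the nodes are hand-built, the parent links and the rule for when to stop climbing are assumptions, and findJsdoc is a hypothetical name.

```javascript
// Hand-built nodes loosely shaped like parser output: comments attached to a
// node are listed in `leadingComments`, and `value` excludes `/*` and `*/`.
const isJsdoc = (comment) =>
  comment.type === "CommentBlock" && comment.value.startsWith("*");

// Assumption: stop climbing once the enclosing statement has been checked.
const STOP_TYPES = new Set(["VariableDeclaration", "ExpressionStatement"]);

function findJsdoc(node) {
  for (let current = node; current; current = current.parent) {
    const comments = current.leadingComments || [];
    // Prefer the JSDoc block nearest the node, i.e. the last one listed.
    const found = [...comments].reverse().find(isJsdoc);
    if (found) return found;
    if (STOP_TYPES.has(current.type)) break;
  }
  return null;
}

// Models: /** A */ const aFunc = function () {};
const declaration = {
  type: "VariableDeclaration",
  leadingComments: [{ type: "CommentBlock", value: "* A " }],
};
const declarator = { type: "VariableDeclarator", parent: declaration };
const id = { type: "Identifier", name: "aFunc", parent: declarator };
const init = { type: "FunctionExpression", parent: declarator };

// With nothing at point C, both nodes resolve to the block at point A.
console.log(findJsdoc(init) === findJsdoc(id));

// Once a JSDoc block exists at point C, the function expression prefers it.
init.leadingComments = [{ type: "CommentBlock", value: "* C " }];
console.log(findJsdoc(init).value);
```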

(I've added this explanation to the parser README and also updated @es-joy/jsdoccomment to indicate the AST comment (and type) structure.)

FWIW, here is the basic structure:

  1. {type: 'anyESNodeType...', jsdoc: {type: 'JSDocBlock', ...}}
  2. {type: 'JSDocBlock', tags: [{type: 'JSDocTag', ...}], descriptionLines: [{type: 'JSDocDescriptionLine', ...}], lastDescriptionLine: aNumber, /* Then these unmodified comment-parser ones */ description, delimiter, postDelimiter, end}
  3. {type: 'JSDocTag', parsedType: {type: 'oneOfTheJSDocTypeParserTypes--seebelow'}, descriptionLines: [{type: 'JSDocDescriptionLine', ...}], typeLines: [{type: 'JSDocTypeLine', ...}], /* 'type' from 'comment-parser' renamed to avoid conflict: */ rawType, along with other comment-parser types besides end}
  4. {type: 'JSDocDescriptionLine', delimiter, postDelimiter, start, description}
  5. {type: 'JSDocTypeLine', /* Renamed to avoid conflict*/ rawType, delimiter, postDelimiter, start}
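
To show how such a structure could be walked with visitor keys (as esquery and estraverse expect), here is a minimal sketch. The VISITOR_KEYS map is inferred from the child properties listed above and is an assumption, not the keys actually exported by @es-joy/jsdoccomment; the sample AST is hand-written.

```javascript
// Assumed child properties per node type, inferred from the structure above.
const VISITOR_KEYS = {
  JSDocBlock: ["descriptionLines", "tags"],
  JSDocTag: ["parsedType", "typeLines", "descriptionLines"],
  JSDocDescriptionLine: [],
  JSDocTypeLine: [],
};

// Depth-first walk; node types without known keys are treated as leaves.
function traverse(node, visit) {
  visit(node);
  for (const key of VISITOR_KEYS[node.type] || []) {
    const value = node[key];
    const children = Array.isArray(value) ? value : [value];
    for (const child of children) {
      if (child && typeof child.type === "string") traverse(child, visit);
    }
  }
}

// Hand-written sample AST.
const ast = {
  type: "JSDocBlock",
  descriptionLines: [{ type: "JSDocDescriptionLine", description: "Adds." }],
  tags: [
    {
      type: "JSDocTag",
      tag: "param",
      parsedType: { type: "JSDocTypeName" },
      typeLines: [],
      descriptionLines: [{ type: "JSDocDescriptionLine", description: "A." }],
    },
  ],
};

const visited = [];
traverse(ast, (node) => visited.push(node.type));
console.log(visited.join(" > "));
```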

And the jsdoctypeparser types behave as in jsdoctypeparser, but I have renamed their node type so that all are now prefixed with JSDocType and camel-cased, so, e.g., INSTANCE_MEMBER becomes JSDocTypeInstanceMember.
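
The renaming rule can be sketched as below. The toJsdocTypeNodeName function is a hypothetical illustration of the rule as stated, not the code used in @es-joy/jsdoccomment.

```javascript
// Hypothetical: SCREAMING_SNAKE_CASE type name -> prefixed, capitalized name.
function toJsdocTypeNodeName(parserType) {
  const words = parserType.toLowerCase().split("_");
  const joined = words
    .map((word) => word.charAt(0).toUpperCase() + word.slice(1))
    .join("");
  return `JSDocType${joined}`;
}

console.log(toJsdocTypeNodeName("INSTANCE_MEMBER"));
```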

Note that this is all fairly experimental, and may change. We may also need some pointing to where the jsdoc block actually is present, and the user's indent is not currently preserved.

But I thought the AST does at least get us started in allowing any possible precise targeting that might be desired, from individual tag lines to even multi-line types as well as descriptions--preserving all the detail comment-parser thankfully exposes. Feedback is welcome (probably best on the relevant project than cluttering discussions here).

brettz9 commented 3 years ago

Btw, I'm also thinking of removing the @ at the beginning of tag in the transformed AST. I think tag may be referenced too frequently to have to strip that off within selectors.