oxc-project / oxc

⚓ A collection of JavaScript tools written in Rust.
https://oxc.rs
MIT License
12.5k stars 458 forks source link

feat(ast,parser): parse jsdoc #168

Closed Boshen closed 7 months ago

Boshen commented 1 year ago

I see there are some demand for reading jsdoc information, let's do this in two steps:

  1. parse jsdoc into structured data, this should be lazy.
  2. save them into trivias and have them accessible. I haven't figured out the best way to do this, but see https://github.com/Boshen/oxc/discussions/6 for disucssion.

Note: This is for parsing jsdoc comments to structured data. This is not about wiring up all the jsdoc comments into a document tree (which is the main purpose for jsdoc).

To parse jsdoc comments, we need to:

  1. add a new routine to parse jsdoc in oxc_parser, this should be a separate routine so we can parse jsdoc lazily
  2. add the jsdoc AST to oxc_ast
  3. setup jsdoc tests (I'm unsure on how to do this properly yet).

References:

Boshen commented 1 year ago

Prioritizing because @thepassle need this. See https://twitter.com/passle_/status/1637446645104164865

Boshen commented 1 year ago

Update:

The original intent was to come up with a generic comment attachment algorithm, where we attach comments to some AST node driven by a set of rules. After some research and reading about it in the ESLint and Babel codebases, I conclude that it is incomprehensible for me right now.

So instead, let's reduce the problem space down to "attach jsdoc comments to specific AST node". This will be much more approachable.

We will build the jsdoc data structure inside the semantic analyzer. When a jsdoc targeting AST node such as a Statement or Declaration is visited, we will ask Trivias to get us a leading comment. The jsdoc data structure will save the comment span and mark the Statement (SemanticNode) to indicate it has a jsdoc. A getter will be provided to parse the jsdoc lazily in a OnceCell.

ematipico commented 1 year ago

I want to help somehow. What do you need or what can I do? :)

Boshen commented 1 year ago

I want to help somehow. What do you need or what can I do? :)

Hi Ema! I've forgotten everything we wrote about jsdocs, but would you be interested in getting a https://github.com/gajus/eslint-plugin-jsdoc rule to work? I can guide you on the missing pieces in the discord channel. I think we had some of the infra working, targeting a lint rule would be the easiest to fill in the gaps.

leaysgur commented 10 months ago

Summary as a pointer to someone in the future...

Prerequisite - JavaScript comments do not appear in AST nodes - They are collected as Trivia(= trival things) in Semantic and lazily built JSDoc comment is identified when - It is multiline block comment - = Starts with `/*` and ends with `*/` - And its contents starts with `*` - = `/**` instead of `/*`

There are 2 main parts to do to implement #1170 .

Current status

Just a few of examples failing to attach... ```js /** x */ // <- 2) But this is targeted and it fails because non-whitespaces are between them /** y */ // <- 1) This should be attached, let y; /** y */ // <- This should be attached, but it fails because non-whitespaces are between them // foo let y; /** x */ let x; // <- This is attached correctly with `/** x */` /** y */ let y; // <- This also tries to target `/** x */` and fails /** noop */ ; let x; // <-/** noop */ is attached but it shouldn't ``` Maybe more and more.

Misc materials

lukeed commented 9 months ago

I'd love to be able to just see & access comment values instead of skipping them outright.

I'd definitely consider it a bonus step for oxc to parse any JSDocs and translate its meaning(s) to the AST, but there are lots of instances where users either invent directives or just use a standard comment to transfer additional information

Quick examples:

/** @table "users" */
type User = { ... }

import(/* webpackChunkName: "my-chunk" */ "foobar");

// comptime
const DAY = ms(1, "day")
leaysgur commented 9 months ago

We talked a bit about this in #2437 ...,

So, I spent these days reading through the code.

(There are 3 articles on my blog if you're interested. Sorry it's in Japanese.)

As a result, although IMO, I'm not sure we should aim for 100% compatibility for this.

For a number of reasons,

@Boshen Sorry for the long lines, then I'd like to confirm,

What do you think? 👀

Boshen commented 9 months ago

@leaysgur Oh wow I didn't expect 3 blog posts on this topic, I thought jsdoc is a solved problem 😰

So in summary, it seems like the hardest part about jsdoc is comment attachment to AST nodes.

We can leave this part out and focus our task on just jsdoc content rules, which is quiet easy as all we need to do is finish the jsdoc parser and run rules against these parsed jsdocs.

then I'd like to confirm

The intention was to pass all the tests, but we don't really need to do it if it's not a fun task.


And if you're sick of jsdoc after looking at it for 3 days ... you may also join me on the eslint-plugin-import task.

leaysgur commented 9 months ago

For future reference, let me elaborate.

the hardest part about jsdoc is comment attachment to AST nodes.

This wasn't particularly difficult, at least if we trust the logic of eslint-plugin-jsdoc, or rather, jsdoccomment.

https://github.com/es-joy/jsdoccomment/blob/6aae5ea306015096e3d58cd22257e5222c54e3b4/src/jsdoccomment.js#L283

To put it simply, it was just this:

// findJSDocComment(node, sourceCode): Comment | null;
const beforeTokens = sourceCode.getTokensBefore(node, { includeComments: true });
while (let token = beforeTokens.pop()) {
  if (token.type === 'Block' && token.value.startsWith('*')) return token;
}

return null;

And I believe this behavior was already implemented in #2437 .


The part I found to be the hardest was that some(about half of) rules can:

if (
  esquery.matches(
    astNode,
    // from rule config
    `MethodDefinition:not([accessibility="public"]):has(JsdocBlock)`
  )
) {
  const jsdocComment = findJSDocComment(astNode, sourceCode);
}
const jsdocAstNode = toESTreeLikeAST(jsdocComment);
if (
  esquery.matches(
    jsdocAstNode,
    // from rule config
    `JsdocBlock[postDelimiter=""]:has(JsdocTypeUnion > JsdocTypeName[value="Bar"]:nth-child(1))`
  )
) {
  ruleHandler({ astNode, jsdocAstNode });
}

https://github.com/gajus/eslint-plugin-jsdoc/blob/main/docs/advanced.md

(Actually, they seemed to have a bit more complicated logic...)


About rest half of the rules seem to simply check "only the text of all JSDoc comments in the source".

However, their implementation was like

for (const { astNode, jsdocAstNode } of jsdocNodesWithAttachedNode)
  // Why astNode required...??
  ruleHandler({ astNode, jsdocAstNode });
// ...
for (const { jsdocAstNode } of jsdocNodesWithoutAttachedNode)
  ruleHandler({ astNode: null, jsdocAstNode });

they are called differently for some reason.

https://github.com/gajus/eslint-plugin-jsdoc/blob/e948bee821e964a92fbabc01574eca226e9e1252/src/iterateJsdoc.js#L2279-L2328

And when I tried to replace astNode of the former to null, the tests started to FAIL...! 😇

💡 After additional research, I finally solved this mystery.

It seems that these 3 rules perform extra linting if node exists. Other all rule's tests pass without node!


you may also join me on the eslint-plugin-import task.

That sounds interesting as well.

Either way, I'll think a bit more about how to conclude #2437 .

Boshen commented 7 months ago

@leaysgur is carrying this task 👍 . Future issues can be created separately now.