syavorsky / comment-parser

Generic JSDoc-like comment parser.
MIT License

Example suggestion: How to configure tokenizers #115

Open jaydenseric opened 3 years ago

jaydenseric commented 3 years ago

I'm trying to migrate jsdoc-md to the comment-parser v1.x API, but I can't find any documentation on how to configure tokenizers for parsing standard JSDoc tags. Here is the old code:

https://github.com/jaydenseric/jsdoc-md/blob/98effb0b4d45af041e8ce91d6659512b53cbdfbb/private/jsdocCommentToMember.js#L3

Ideally the example will use deep require paths to just the functions needed (vs getting things from index files), for a minimal memory footprint and bundle size.

syavorsky commented 3 years ago

I believe you have already found this example. It is not clear how to get the tokenizers though, as you pointed out.

CommonJS option

const {default: tag} = require('comment-parser/lib/parser/tokenizers/tag')
const {default: type} = require('comment-parser/lib/parser/tokenizers/type')
const {default: name} = require('comment-parser/lib/parser/tokenizers/name')
const {default: description} = require('comment-parser/lib/parser/tokenizers/description')

Or use ES6 imports, which might work for you with the right tooling. I still have to tune the ES6 build for distribution.

import tag from "./es6/parser/tokenizers/tag"
...

Take a look at what the existing tokenizers do; I don't have a written guide on implementing one yet. See

jaydenseric commented 3 years ago

For other people migrating from v0.x to v1, here is a jsdoc-md diff to reference:

https://github.com/jaydenseric/jsdoc-md/commit/ebcdbf5867851f5251f88a7759cfee6095251f01#diff-04e9c1ef0da56feec83db032c6ee7f9db1230accf7530a0767902d21be9faf85

One thing I would like to investigate is if the comment-parser tokenizer can be configured to skip work for JSDoc tags not in an arbitrary whitelist. This will prevent it doing work to try to tokenize JSDoc tags we're not interested in, that most of the time don't fit the default behavior that expects a type, name, and description to be present. @syavorsky is there a way to do this?
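To make the allow-list idea concrete, here is a minimal sketch. It assumes a tokenizer is a plain function that takes a spec object and returns an updated one, and that an earlier tokenizer has already filled in `spec.tag`. The helper `onlyForTags`, the stub tokenizers, and the `text` field are hypothetical names used only for illustration; none of them are part of comment-parser.

```javascript
// Hypothetical helper: wrap a tokenizer so it only runs for allowed tags.
// Assumes tokenizers are functions (spec) => spec and that spec.tag is
// already set by the time the wrapped tokenizer is called.
function onlyForTags(allowedTags, tokenizer) {
  const allowed = new Set(allowedTags);
  return (spec) => (allowed.has(spec.tag) ? tokenizer(spec) : spec);
}

// Stand-in tokenizers for illustration only; these are not comment-parser's.
const stubTag = (spec) => {
  const match = /^@(\S+)/.exec(spec.text);
  if (!match) return spec;
  return {
    ...spec,
    tag: match[1],
    text: spec.text.slice(match[0].length).trimStart(),
  };
};
const stubDescription = (spec) => ({
  ...spec,
  description: spec.text,
  text: "",
});

// Run each tokenizer in order, passing the spec along.
function runTokenizers(tokenizers, text) {
  const initial = { tag: "", description: "", text };
  return tokenizers.reduce((spec, tokenize) => tokenize(spec), initial);
}

const tokenizers = [
  stubTag,
  onlyForTags(["param", "returns"], stubDescription),
];

const kept = runTokenizers(tokenizers, "@param value The value.");
const skipped = runTokenizers(tokenizers, "@kind function");
```

In this sketch `kept.description` gets filled in, while `skipped.description` stays empty because `kind` is not in the allow-list and the wrapped tokenizer never runs.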

The next jsdoc-md version is currently a work in progress, but I have a really huge amount of work locally nearly ready to push up and hopefully publish today. The CLI will display syntax highlighted ranges of problematic JSDoc code right in the terminal for errors.

As I mentioned here, I've been working the past few weeks on a brand new JSDoc comment parser package, that has source location data for every node in the JSDoc AST (relating both to just the doclet, and the whole code file). I got about 80% of the way there, but now that comment-parser v1 is out and it makes it possible (with a bit of manual work) to figure out line and column numbers for JSDoc block tag spans, I couldn't justify spending another few weeks on my own solution. Frankly it was tiring me out! @syavorsky I appreciate your work :)

Once the next major version of jsdoc-md is published I will share my utility function that extracts source line and column numbers for JSDoc block tag spans for a given span token name, i.e. tag, name, type, or description.
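Until that utility is shared, a rough sketch of the idea might look like the following. It is not the jsdoc-md implementation. It assumes each source line carries a `number` and a `tokens` object whose keys follow the order listed in `SPAN_TOKEN_ORDER`; that order, the function name `getTokenSpan`, and the sample data are assumptions made for illustration. It only reports the first line on which the token appears, so multi-line descriptions are not covered.

```javascript
// Assumed left-to-right order of tokens within one source line.
const SPAN_TOKEN_ORDER = [
  "start", "delimiter", "postDelimiter",
  "tag", "postTag",
  "type", "postType",
  "name", "postName",
  "description", "end",
];

// Hypothetical helper: find the first line holding a non-empty token and
// return its line number (as stored) with zero-based column offsets.
function getTokenSpan(sourceLines, tokenName) {
  for (const { number, tokens } of sourceLines) {
    if (!tokens[tokenName]) continue;
    let startColumn = 0;
    for (const key of SPAN_TOKEN_ORDER) {
      if (key === tokenName) break;
      startColumn += (tokens[key] || "").length;
    }
    return {
      line: number,
      startColumn,
      endColumn: startColumn + tokens[tokenName].length,
    };
  }
  return null;
}

// Hand-written sample lines, shaped like the assumed parser output.
const sampleLines = [
  { number: 0, tokens: { start: "", delimiter: "/**" } },
  {
    number: 1,
    tokens: {
      start: " ", delimiter: "*", postDelimiter: " ",
      tag: "@param", postTag: " ",
      type: "{string}", postType: " ",
      name: "name", postName: " ",
      description: "The name.", end: "",
    },
  },
];

const typeSpan = getTokenSpan(sampleLines, "type");
const nameSpan = getTokenSpan(sampleLines, "name");
```

The offsets are relative to the start of the comment line, so mapping them to positions in the whole code file would still require adding the comment's own starting line and column.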

syavorsky commented 3 years ago

Great, I am trying to make comment-parser a flexible low-level parser for tools like this.

One thing I would like to investigate is if the comment-parser tokenizer can be configured to skip work for JSDoc tags not in an arbitrary whitelist.

There is no such thing. The parser processes the entire source over a few stages. I haven't done any benchmarking, but I would be interested to find out what volume of input data would show a noticeable performance boost from the proposed optimization.

It may get tricky though if you need to stringify the data back. For that you would need to iterate over Block.tags[].source instead of Block.source, which would need minor API tweaks (UPD: created #118)
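As a rough illustration of iterating over the tags' own source lines, here is a sketch under stated assumptions. It assumes each tag carries a `source` array of lines and that each line's `tokens` can be concatenated in the order given by `LINE_TOKEN_ORDER`. The functions `stringifyLine` and `stringifyTags` and the sample block are hypothetical; this is not the API proposed in #118.

```javascript
// Assumed left-to-right order of tokens within one source line.
const LINE_TOKEN_ORDER = [
  "start", "delimiter", "postDelimiter",
  "tag", "postTag",
  "type", "postType",
  "name", "postName",
  "description", "end",
];

// Rebuild one line of text by joining its tokens in order.
function stringifyLine(line) {
  return LINE_TOKEN_ORDER.map((key) => line.tokens[key] || "").join("");
}

// Stringify only the tags accepted by keepTag, reading each tag's own
// source lines rather than the source of the whole block.
function stringifyTags(block, keepTag) {
  return block.tags
    .filter(keepTag)
    .flatMap((tag) => tag.source.map(stringifyLine))
    .join("\n");
}

// Hand-written sample, shaped like the assumed parser output.
const sampleBlock = {
  tags: [
    {
      tag: "param",
      source: [{
        number: 1,
        tokens: {
          start: " ", delimiter: "*", postDelimiter: " ",
          tag: "@param", postTag: " ",
          type: "{string}", postType: " ",
          name: "name", postName: " ",
          description: "The name.",
        },
      }],
    },
    {
      tag: "kind",
      source: [{
        number: 2,
        tokens: {
          start: " ", delimiter: "*", postDelimiter: " ",
          tag: "@kind", postTag: " ",
          name: "function",
        },
      }],
    },
  ],
};

const output = stringifyTags(sampleBlock, (tag) => tag.tag === "param");
```

With this sample, `output` contains only the `@param` line. The opening and closing comment delimiters are not restored here, which is part of what makes the round trip tricky.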