Closed — resolritter closed this PR 2 years ago
My first attempted implementation used regular expressions, but, unfortunately, the examples were not passing (https://travis-ci.org/github/tree-sitter/tree-sitter-rust/builds/753422004#L413). My regular expressions seemingly had trouble capturing the pattern `//\n?`; the third character in the sequence is what decides between a line comment and a doc comment, so I thought it made more sense to use externals for this in order to leverage `lexer->lookahead` probes.
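The lookahead decision described above can be sketched in plain Rust (a hypothetical stand-in for the C scanner's `lexer->lookahead`/`advance` loop, not the actual implementation):

```rust
// Sketch: consume "//", then probe one character of lookahead to pick a
// token kind, the way the external scanner does with lexer->lookahead.
// Note: this naive probe treats "////" as a doc comment too; Rust
// actually requires exactly three slashes (see the discussion below).
#[derive(Debug, PartialEq)]
enum Token {
    LineComment,
    OuterDoc, // ///
    InnerDoc, // //!
}

fn scan(input: &str) -> Option<Token> {
    let mut chars = input.chars().peekable();
    // consume the two leading slashes
    if chars.next() != Some('/') || chars.next() != Some('/') {
        return None;
    }
    // the third character decides the comment kind
    match chars.peek() {
        Some('/') => Some(Token::OuterDoc),
        Some('!') => Some(Token::InnerDoc),
        _ => Some(Token::LineComment),
    }
}

fn main() {
    assert_eq!(scan("/// docs"), Some(Token::OuterDoc));
    assert_eq!(scan("//! inner"), Some(Token::InnerDoc));
    assert_eq!(scan("// plain"), Some(Token::LineComment));
    assert_eq!(scan("fn f() {}"), None);
    println!("ok");
}
```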
Therefore, I've reworked the approach to use externals (commented in https://github.com/tree-sitter/tree-sitter-rust/pull/99/commits/be26083df01c81079aa706bf06ffd511825c6011).
Since doc comments are generally multi-line, I've made contiguous `///` lines be parsed as a group instead of a node per line. That should make it easier to extract the whole block.
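The grouping behavior can be illustrated in plain Rust (purely a sketch; the real grouping happens inside the external scanner):

```rust
// Sketch: collect contiguous `///` lines into blocks, mirroring how the
// external scanner emits one node per run of doc-comment lines instead
// of one node per line. Illustrative only.
fn group_doc_blocks(source: &str) -> Vec<Vec<&str>> {
    let mut blocks: Vec<Vec<&str>> = Vec::new();
    let mut current: Vec<&str> = Vec::new();
    for line in source.lines() {
        if line.trim_start().starts_with("///") {
            current.push(line);
        } else if !current.is_empty() {
            // a non-doc line closes the current run
            blocks.push(std::mem::take(&mut current));
        }
    }
    if !current.is_empty() {
        blocks.push(current);
    }
    blocks
}

fn main() {
    let src = "/// a\n/// b\nfn f() {}\n/// c\n";
    let blocks = group_doc_blocks(src);
    assert_eq!(blocks.len(), 2);
    assert_eq!(blocks[0], vec!["/// a", "/// b"]);
    assert_eq!(blocks[1], vec!["/// c"]);
    println!("ok");
}
```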
It'd be useful to have a node without the leading `///`, leaving only the inner content, so that the Markdown parser could be injected... I don't know if it's possible to do that currently.
@resolritter Is there any update on this one?
What's the current status on this? Is any help needed?
Ping?
What about `#` within doc comments' code blocks?
I do not have merge access to this repository. This is not the only PR in review limbo right now.
@dcreager is there someone maintaining tree-sitter-rust at github right now?
It would be great if the Rust grammar had more maintainers (for example, tree-sitter-haskell has been doing great since tek took the reins). This, #105, and #85 are all PRs I've been looking forward to.
@resolritter can you rebase on the latest and I'll approve the PR.
The current implementation is incomplete because Rust requires exactly three slashes for doc comments; any more than that and it turns back into a normal line comment. This is explained in the current documentation for comments: https://doc.rust-lang.org/reference/comments.html#examples.
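The rule from the Rust reference can be captured in a small classifier (an illustrative sketch, not the scanner code):

```rust
// Per the Rust reference: exactly three slashes start an outer doc
// comment, four or more fall back to a plain line comment, and `//!`
// starts an inner doc comment.
fn classify(line: &str) -> &'static str {
    match line.strip_prefix("//") {
        None => "not a comment",
        Some(rest) if rest.starts_with('!') => "inner doc comment",
        // third char is '/' and the fourth is not: exactly three slashes
        Some(rest) if rest.starts_with('/') && !rest[1..].starts_with('/') => {
            "outer doc comment"
        }
        Some(_) => "line comment",
    }
}

fn main() {
    assert_eq!(classify("/// docs"), "outer doc comment");
    assert_eq!(classify("//// not docs"), "line comment");
    assert_eq!(classify("// plain"), "line comment");
    assert_eq!(classify("//! inner"), "inner doc comment");
    println!("ok");
}
```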
Moving this to Draft while I try to fix that.
Edit: Should be fixed
Needs a rebase, because I've merged your other PR. Having those parser.c files in the repository is annoying.
Rebased, and also added support for `//!` as per https://github.com/tree-sitter/tree-sitter-rust/pull/99#discussion_r780672464
This PR seems to have introduced a bug where the scanner can get into an infinite loop. I'm seeing the Tree-sitter test suite hang after updating tree-sitter-rust to the latest master. I guess the test coverage in this repo wasn't sufficient to protect against this. I'm going to revert this for now.
It looks like the infinite loop was happening during some randomized mutation of the `Doc comments` test. Maybe an unterminated doc comment at the end of the file or something?
@resolritter Feel free to do a new PR if you can get this to avoid an infinite loop.
I also have questions about the need to do this using the external scanner. Couldn't you just do this in the grammar?
```js
grammar({
  // ...
  rules: {
    // ...
    doc_comment: $ => token(choice(
      // exactly three leading slashes
      seq('///', optional(/[^/].*/)),
      seq('//!', /.*/),
    )),
    // any number of leading slashes other than three,
    // which would produce a doc comment.
    line_comment: $ => token(seq(
      '//', optional('//'), /.*/
    )),
  }
});
```
Of course, this would not join all of the adjacent doc comments into one continuous node - you'd get one node per comment. I think this might be better though: I think it would make it easier to determine what ranges of text contain the documentation itself, because you wouldn't have to deal with leading whitespace. I also just think that it retains more information to provide a node for each comment, and it's somewhat "lossy" to group them all into one node.
I'm still open to the other approach though, if people have strong feelings that it's more useful to get a single node.
Whatever happens, I promise we won't wait a year to merge this time.
@maxbrunsfeld
I also have questions about the need to do this using the external scanner. Couldn't you just do this in the grammar? Of course, this would not join all of the adjacent doc comments into one continuous node - you'd get one node per comment
Having the whole text in a single node is why it was done this way. What would be the alternative for highlighting the code below?
````rust
/// ```
/// use foo::Foo;
/// let bar = Foo::new("foo");
/// ```
````
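Recovering the injectable Markdown from such a block could look something like this (a hand-rolled sketch assuming the grouped node's text is available as a string; `inner_markdown` is a hypothetical helper, not part of this PR):

```rust
// Sketch: strip the leading `/// ` (or bare `///`) from each line of a
// grouped doc-comment node to recover the inner Markdown for injection.
fn inner_markdown(block: &str) -> String {
    block
        .lines()
        .map(|line| {
            let line = line.trim_start();
            line.strip_prefix("/// ")
                .or_else(|| line.strip_prefix("///"))
                .unwrap_or(line)
        })
        .collect::<Vec<_>>()
        .join("\n")
}

fn main() {
    let block = "/// ```\n/// use foo::Foo;\n/// ```";
    assert_eq!(inner_markdown(block), "```\nuse foo::Foo;\n```");
    println!("ok");
}
```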
I can only infer the following steps:
Having the text in a single node gets rid of steps 1 and 2. Or do you see a more efficient way to go about that? Or do you think having to traverse the tree in order to collect the text is not a problem?
It looks like the infinite loop was happening during some randomized mutation of the Doc comments
How can I try this randomization when testing locally?
I also just think that it retains more information to provide a node for each comment, and it's somewhat "lossy" to group them all into one node.
This is a good point. It is definitely "lossier" than other nodes since it also includes the leading whitespace for contiguous lines.
What's being gained by this approach is the ease of fetching the whole content directly, at the cost of less precision for the ranges. Feel free to close #128 if you feel like it isn't a good tradeoff.
What would be the alternative for highlighting the code below?
I think there are challenges either way, but it is more straightforward if you have a separate node for each comment.
Copying the doc comments' text into a separate buffer is not an option - we need to parse the code in place so that the positions of the nodes in the nested syntax tree correspond correctly to the original file. So what we need to do is to retrieve a list of ranges from the original file that should be parsed, together, in a nested language (markdown). We can then parse the contents of those ranges using Tree-sitter's `set_included_ranges` API.
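Computing those ranges could be sketched as follows (the spans here are hypothetical `(start, end)` byte offsets standing in for real comment nodes, not the actual tree-sitter API):

```rust
// Sketch: given byte spans of doc-comment lines in the source, compute
// the ranges of their inner content (after the `/// ` prefix) that could
// then be handed to Tree-sitter's set_included_ranges. The spans are
// illustrative placeholders for node positions.
fn inner_ranges(source: &str, comment_spans: &[(usize, usize)]) -> Vec<(usize, usize)> {
    comment_spans
        .iter()
        .map(|&(start, end)| {
            let text = &source[start..end];
            // skip the comment prefix so only the Markdown is included
            let skip = if text.starts_with("/// ") {
                4
            } else if text.starts_with("///") {
                3
            } else {
                0
            };
            (start + skip, end)
        })
        .collect()
}

fn main() {
    let src = "/// hello\nfn f() {}\n";
    let ranges = inner_ranges(src, &[(0, 9)]);
    assert_eq!(ranges, vec![(4, 9)]);
    assert_eq!(&src[4..9], "hello");
    println!("ok");
}
```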
If we have a separate node for each comment, then we need to:

1. query for runs of adjacent doc comments with `((doc_comment)+ @doc)`
2. strip the leading `///` or `//!` from each captured comment

If, on the other hand, we have one giant node, then we need to:
I was not aware that `(query)+` existed. This seems to work:
```shell
> ./node_modules/.bin/tree-sitter query <(echo -e "((line_comment)+ @doc)") <(echo -e "// foo\nfoo;\n// foo\n// foo")
/dev/fd/61
  pattern: 0
    capture: 0 - doc, start: (0, 0), end: (0, 6), text: `// foo`
  pattern: 0
    capture: 0 - doc, start: (2, 0), end: (2, 6), text: `// foo`
    capture: 0 - doc, start: (3, 0), end: (3, 6), text: `// foo`
```
Since this use-case is supported by the query API, I am fine with closing #128.
What are those Tree-sitter test suites that caught the regression? Could we run them in the CI of tree-sitter-rust?
closes #88