Open Xophmeister opened 1 year ago
Note: Option 3 does not work in general and requires more subtlety. For example, if you have a node that you want followed by a softline, you can't just add a (comment)?
node:
(
(my_node) . (comment)? @append_empty_softline
)
This will match whether the comment is present or not, but the @append_empty_softline
capture name will only apply when the comment exists.
Applying the capture name to the group should solve this problem, but -- anecdotally and for reasons I've yet to understand -- it doesn't always work:
(
(my_node)
.
(comment)?
) @append_empty_softline
Presumably you wouldn't want this behaviour, anyway, because the (comment)
will have its own formatting. As such, a @do_nothing
directive would probably be the correct approach:
(
(my_node) @append_empty_softline
.
(comment)? @do_nothing
)
This comes at the potential expense of factoring out other related capture names that you don't want to have cancelled. (e.g., If you have indentation directives in this query, you'd have to move them elsewhere, etc.)
Yes, but given that we don't have a more advanced solution available to us for the time being, the (comment)? @do_nothing
pattern is the way to go.
Indeed. I'm just documenting for maintainership's sake.
I guess I was triggered by "Option 3 does not work in general". Perhaps it's better to rewrite option 3 slightly, then 🙂
Does it mean that we have to use the idiom
(
(my_node) @append_empty_softline
.
(comment)? @do_nothing
)
everywhere where a newline is inserted?
Pretty much, yes.
Although that doesn't have to be quite so excessive as it sounds, because we can have a few patterns that take care of a lot of those, such as:
(
[
(attribute_item)
(enum_item)
(extern_crate_declaration)
(function_item)
(impl_item)
(let_declaration)
(mod_item)
(struct_item)
(type_item)
(use_declaration)
] @append_hardline
.
[
(block_comment)
(line_comment)
]* @do_nothing
)
@aspiwack Potentially. However, there are times where a query can be rewritten to avoid it. There may also be times when you want the newline, regardless of a comment.
extra
nodes in OCaml).I've experimented a bit to solve this issue. Here are my observations so far:
(comment)?
everywhere). I was interested to see if this was doable programmatically, i.e. by parsing the query, then enriching it later with extra (comment)?
. I don't think it's possible: the tree-sitter API doesn't allow this kind of modifications on a query.I experimented with Option 1 (processing comments in a separate stream). This seems doable, but costly. My experiments are here: https://github.com/tweag/topiary/tree/nb/engine/split_comment_stream
The only option I've found to edit an existing CST and remove comment nodes is through the tree.edit()
and parser.parse(input, old_tree)
functions, as shown here.
Currently, the code in my branch removes all comments from the input and discards them, by repeating the following until no comment is left:
node.is_extra()
and node.kind()
tree.edit()
and removing the bytes from the inputparser.parse(edited_input, edited_old_tree)
This simple operation of removing all comments from the input already has a non-negligible cost. This is an extract of the output of cargo bench --profile release
, on main
and on my branch respectively:
branch main:
format_ocaml time: [627.69 ms 630.84 ms 633.84 ms]
change: [-3.9578% -3.4334% -2.9099%] (p = 0.00 < 0.05)
Performance has improved.
branch nb/engine/split_comment_stream:
format_ocaml time: [821.88 ms 823.24 ms 824.73 ms]
change: [+29.836% +30.499% +31.193%] (p = 0.00 < 0.05)
Performance has regressed.
The 30% performance drop is already noticeable, even without doing anything with the comments I extracted.
I would like your input before proceeding further, in particular:
I haven't checked your code, so ignore me if I'm stating the obvious! Can you edit the source backwards -- i.e., starting at the comment with the greatest byte offset -- to avoid having to reparse the tree on every edit?
I don't think it's possible: the tree-sitter API doesn't allow this kind of modifications on a query.
But in theory, it should be possible to parse the query using the tree-sitter grammar, make modifications to the syntax tree, and generate a new query file, right?
Not saying this is necessarily the approach we want to take, though.
Perhaps there is a fourth possibility:
Introduce a pre-processing step to the query files to allow for something macro-like before feeding them to tree-sitter. So rather than having to add arcane variations of (comment)? @do_nothing
everywhere, we could just add %allow_comment
wherever necessary. We may still need to allow for variations of those, unless we can make the pre-processor smart enough to always understand the correct context, so it is unclear how much it would help.
I haven't checked your code, so ignore me if I'm stating the obvious! Can you edit the source backwards -- i.e., starting at the comment with the greatest byte offset -- to avoid having to reparse the tree on every edit?
I didn't know if tree.edit()
could support multiple edits before reparsing, but apparently it does!
I applied your suggestion to apply all edits at once, and I got the benchmark down to reasonable levels.
:bulb: This is not really a bug or a feature, but a consequence that we should probably address.
e.g., #425 is a manifestation of this.
Is your feature request related to a problem? Please describe. Comments and other nodes that can appear (almost) anywhere in the syntax -- so called "extra nodes" in Tree-sitter -- break simple queries that target syntactic structures for formatting. For example, if you have a formatting query that looks like this:
Then this won't match and therefore won't have formatting applied:
:bulb: This example is illustrative, to demonstrate the problem. In this case, the query could be trivially refactored to avoid the problem. However, this is not always possible.
Describe alternatives you've considered There are a few possible solutions to this problem. Including, but not necessarily limited to:
gofmt
splits the input into two streams -- comments and code -- formats them independently, then stitches them back together.(comment)?
(or equivalent) in appropriate places. This can get messy.