Closed savetheclocktower closed 1 week ago
I should add that the new indentation functionality doesn't have any tests yet, but neither does it change how existing stuff behaves, so I expect all tests to pass. If I don't manage to write tests for this stuff in the next three weeks, I might take some small steps to hide the new functionality, or at least mark it as obviously experimental and not something that should be relied upon.
(Update: we now have a spec for @match.next
, so I consider this stuff to be stable enough to ship. At this point, any further changes that I make to how @match.next
behaves can be done in a backwards-compatible way; so all that's left to do is document it, which I plan to do soon.)
Just bumped our web-tree-sitter
to 0.23.0. This is a big bump and I'd been putting it off because of disruptive changes to the Tree-sitter ecosystem; but the short version is that everything's cool.
Immediately after I bumped web-tree-sitter
locally, I reloaded my Pulsar window and found that everything was heartbreakingly slow. Profiling had me chase down a couple of red herrings before I discovered that the problem had to do with API changes in web-tree-sitter
. This happened a while back for a good reason: the desire to eliminate differences between the web-tree-sitter
API and the node-tree-sitter
API.
The slowness had to do with the fact that every query against the tree was accidentally being executed against the entire tree rather than against the specified range. (Even this was fast on the web-tree-sitter
side; the slowness happened in our post-processing phase, the one in which we iterate through all the captures.) Once I updated our usages of Query#captures
to use its new signature, the problem went away. (One of the red herrings was still a good idea: a hot code path was allocating lots of unnecessary arrays, so I rewrote it to use a different approach.)
We can monkeypatch Query#captures
to work with both the old function signature and the new one, so that's exactly what I did. The purpose is not so much to save us from rewriting our usages (since I've done that; it's in this PR) but to save the editor from throwing an error if a community package expects the old function signature. (On that note, I've also updated tree-sitter-tools to be more defensive and work equally well with the old API and the new API.) We'll issue a deprecation warning if the old function signature is detected.
Everything went well locally, including specs, so I'll watch the CI carefully just in case there's something really obscure that regressed.
Enough last-minute changes; time to take this out of draft so that I stop clipping at the edges like it's a bonsai tree.
Just kidding! I ran into a strange WASM memory error when working on an EJS document, and I couldn't figure out why it was happening, so eventually I decided “might as well update the relevant Tree-sitter parsers” and that solved it.
Along the way I discovered another subtle bug, so that's also been dealt with.
This one starts off big — a sizable diff that probably overstates the scope of the underlying refactor.
Indentation logic is complex. It's not as complex as it seems, but it's still pretty complex, and I haven't exactly helped the issue because I haven't found an intuitive way to explain it. The first commit in this PR represents the beginning of an effort to demystify indentation queries and make them easier for others to reason about.
The main initial benefit is that indentation logic is now encapsulated in its own
IndentResolver
class. UnlikeFoldResolver
— a different instance of which exists for eachLanguageLayer
— aWASMTreeSitterLanguageMode
instance has a single instance ofIndentResolver
. That's because the logic of indentation hinting crossesLanguageLayer
s; we don't gain anything by subdividing the labor further.But the encapsulation itself is useful. It lets us write new methods to reduce repetition without increasing the API surface of
WASMTreeSitterLanguageMode
.There are two other ideas that I've added in the process:
The
@match
capture is good, but a recent bug report suggested to me that we could benefit from a similar kind of capture that works in the first phase of indentation hinting. So I created@match.next
; it's a different approach to phase-one hinting from what you get with@indent
/@dedent
/@dedent.next
captures.“Phase one” of indentation hinting is where we determine what the “baseline” indentation is for line X before its own content is considered; we look at the content of line X-1 and decide whether line X should start with the same indentation level, one level more, or one level less. I'll illustrate its usefulness below — much easier when I'm not in an unordered list.
For a long time I've wanted to make the indentation decision process more visible. It's not something anyone needs to care about… until they're writing a modern Tree-sitter grammar or trying to figure out why a grammar someone else wrote isn't making the right indentation decision. When I've found myself in this position, I've had no choice but to open dev tools and set some debugger breakpoints.
Now I'm experimenting with a different approach: an obscure API that, once an indentation decision is made, fires an event with metadata that could help a user understand the logic behind an indentation decision. The
IndentResolver::onDidSuggestIndent
method accepts a callback and provides an object with a bunch of properties. It's still a bit intimidating to consume directly, but my eventual goal is to write a community package that can interpret the data (or else add such functionality totree-sitter-tools
).About
@match.next
The impetus for
@match.next
was this example:This is a contrived example, but the idea is that a user should be able to insert whatever sort of hanging indent they want when they're spreading a statement over multiple lines. Since we have Tree-sitter, we should be able to understand that line 4 shouldn't keep the deep indentation of line 3; it should match the indentation level of line 2!
I didn't think about this until a user reported an issue with hanging indents in C. I came up with a fix that's most of the way to what I want, but it uses a
@match
capture, so operates in the second phase of indentation hinting — the phase that considers the content on the current line. For this reason, it doesn't let us correct the indentation until after the user starts typing.With
@match.next
, that fix looks a bit different:I'm still playing around with this, and I'm only 96% sure it's a good idea. But this could be expressed as follows: “the presence of
expression_statement
on row X-1 suggests that row X should start with the same level of indentation as that of the row whereexpression_statement
started.” (Also true forreturn_statement
and all other node types within the square brackets.)Hence in the example above:
expression_statement
, so it produces a@match.next
capture.expression_statement
didn't end on row 3. But theindent.matchesComparisonRow
test passes, so we proceed.suggestedIndentForBufferRow
interprets the@match.next
capture by using theindent.matchIndentOf
predicate to figure out which line's indentation should be copied. In this case, it usesstartPosition
— i.e., the starting position of theexpression_statement
node.I still sense that I'm not explaining this well enough. I'm not certain that these captures have the right names; one further advantage of the
IndentResolver
encapsulation is that it would make it a bit easier to introduce new aliases for these captures and predicates while preserving backward compatibility. (For instance,indent.matchIndentOf
is now aliased to the terserindent.match
, andindent.offsetIndent
is aliased toindent.offset
.)Another reason to introduce this new encapsulation is to give us a place to hang new indentation-specific query tests. In order to implement what I described above, I needed a way to introduce the concepts of “current row” and “comparison row” to query tests, and those are indentation-specific ideas. (The current row is the row whose suggested indentation we're determining; the comparison row is the one we're using as reference; typically it's the row just above the current row, but we tend to skip over whitespace-only rows.)
Anyway, if any of this doesn't make sense, and you want to understand it better, please ask! In the process of answering your questions I hope I find ways to make this subject less intimidating.
In other news: a fix I made for
tree-sitter-css
months ago has been accepted, so I've also bumped ourtree-sitter-css
to the latest release.Changelog
web-tree-sitter
to version 0.23.0.tree-sitter-css
to the latest version.tree-sitter-markdown
to the latest version.tree-sitter-html
andtree-sitter-embedded-template
to their latest versions.tree-sitter-javascript
to the latest version.tree-sitter-typescript
to the latest version.@match.next
capture for advanced control of how indentation should change from one line to the next.Added a new event that is emitted when Pulsar makes an indentation decision, with the ultimate goal of making indentation hinting easier to understand for grammar authors. The metadata on the event is preliminary and subject to change.(Maybe I'll mention this somewhere, but probably not in the changelog, since it's not part of a public API yet.)indent.matchesComparisonRow
andindent.matchesCurrentRow
for comparing arbitrary positions in a Tree-sitter node tree to the operative rows in an indentation suggestion query. Makes it possible to say things like “decrease the indent on line 10 if a statement ends on line 9.”indent.matchIndentOf
andindent.offsetIndent
toindent.match
andindent.offset
, respectively. The old names still work as aliases.