Indentation issues with hanging indents

Yeah, this is several things:

Adding a configurable rule for hanging indents to `language-c`

The fix I proposed here is really just adding a single new rule to the C grammar’s indents.scm that says anything that immediately succeeds an expression statement should match the indentation level of at the beginning of that statement.

(
  (expression_statement)
  .
  (_) @match
  (#set! indent.matchIndentOf previousSibling.startPosition)
)

So in the given example…

  printf("something very long %s",
            some_text);
  return 0;

…the indentation of the return line would always match that of the preceding printf line.

This seems to work, but I haven't tried all the edge cases. It works because none of the types of statements that result in the next line being indented one level fall under the expression_statement type.

The easiest fix would be to add this new line to the default indents.scm for the C grammar, but we'd definitely want to make it dependent on the user's configuration, and probably make it opt-in (disabled by default).

Possibly a better fix: a new kind of indentation capture

The fix described above has a couple of downsides:

It's aggressive, since a @match capture beats everything else.
It doesn't fix the indentation until the user starts typing.

We have one kind of indentation capture called @dedent.next whose purpose is to signify that the next line should decrease in its indentation level no matter what the content is. This is unusual because the content of the line usually matters when deciding whether it should be dedented, but it's occasionally very useful.

We could introduce an equivalent capture called @match.next — it would be to @match what @dedent.next is to @dedent. (@dedent says “this line should be dedented”; @dedent.next says “the next line should be dedented.”)

Determining indentation for a given line is a two-step process:

We determine its “baseline” indentation by analyzing the previous row (usually the previous row with actual content on it) to see if anything in that row suggests that the next row should start out indented.
We look at the content of the line itself to see if it suggests that the row itself should be dedented relative to that baseline.

An ordinary @match capture is analyzed in step 2 and can override the entire process of balancing indents and dedents that ordinarily takes place. A hypothetical @match.next capture would be analyzed in step 1 and would override step 1 only — applying a baseline indentation level.

The advantage of @match.next would be just like the advantage of @dedent.next: when it's operative, the cursor would start out at the right place as soon as you pressed Return.

I think this is a good idea, even if it reveals that I haven't made the capture names very intuitive. I'll try to get this into 1.121.

A grammar's queries aren't easily extensible

A modern Tree-sitter grammar is bundled with its own query files. Those files determine how the code gets highlighted, how it gets folded, how it gets indented, and which parts represent important symbols like functions. They're a powerful way of customizing Pulsar, but they themselves are not easy to customize from the outside. It took me about 30 minutes to figure out a simple snippet that @mibi88 could drop into their init.js to add a single query to the C grammar's indents.scm.

I tried to address this a bit in #1062; there's a new onDidLoadQueryFiles callback method that identifies when it's safe to read a grammar's raw query file text. I also made it so that setQueryForTest (a method I introduced to make testing easier, but which I've already repurposed for hacks and will probably end up renaming) automatically triggers a reload of queries, even for editors that are already open.

But there should probably be an easier way for one package to customize another package's queries. For instance, maybe there could be an API for accepting a path to a query file rather than a raw string; that would effectively say “add this extra query file to the list of files loaded for a grammar's highlightsQuery (or any other query).” If this code runs before the grammar is done loading, great; if not, then it can prompt a reload of the query files.

The hard part of query customization is that Tree-sitter parsers change, and those changes aren't always backward-compatible. A language package bundles a wasm file and a set of scm files and implicitly asserts that the query files match the parser; but customizations from third parties don't necessarily know which version of a parser is running. (We expose the “version” of a parser that we used to generate the wasm file, but only via an NPM-style “package spec” that typically refers to a GitHub repo and a SHA instead of a version number.) So if a package adds a customization that results in a Query object failing to compile, we might want to try the customization in isolation and, if it also fails to compile, report it to the user so that they can disable (or update) the offending package.

If the package wants to customize queries for a built-in package, then they can analyze the version of Pulsar itself and infer which query file would be valid. If it wants to customize queries for a community package, then they'd have to analyze the version number of the package itself, which is a bit trickier.

This is the hardest of the three, but I hope to arrive at an answer incrementally over time.

pulsar-edit / pulsar