pulsar-edit / pulsar

A Community-led Hyper-Hackable Text Editor
https://pulsar-edit.dev

Add a `TextEditor` method for retrieving comment delimiters… #970

Closed savetheclocktower closed 7 months ago

savetheclocktower commented 7 months ago

…for a given buffer position, plus added thorough delimiter data to each built-in language via the config system.

Background

There are four snippet variables in VSCode's snippets implementation that we don’t yet support. One is $UUID, which will take care of itself when we upgrade Electron and get a UUID-generating function for free in Node’s standard library.

The other three are $LINE_COMMENT, $BLOCK_COMMENT_START, and $BLOCK_COMMENT_END. We haven’t supported them because there’s never been a way to look up this information, as odd as it may seem.

When a grammar author makes a package for a given language, they can specify comment delimiters in several places; the most common is the editor.commentStart/editor.commentEnd scope-specific config properties that we inherited from TextMate. But TextMate only cared about comment delimiters to the extent that it needed that data to comment/uncomment blocks of code with Cmd+/, and that’s the only thing we’ve ever used those settings for either.

Back in the days of the original Tree-sitter migration, someone decided that comment delimiter data should live on the grammar definition file. (My guess is that they envisioned a future without a TextMate-style scope cascade, but we don’t live in that dystopia.) This isn’t a huge problem, but those definitions were no more fleshed-out than the config values in the old system. For example, you might see this in the JavaScript grammar file:

# ...

comments:
  line: '// '

…because that’s all you needed to set in order to get Editor: Toggle Line Comments to work.

So this appears to be the first use case we’ve had that requires knowing a specific kind of comment delimiter. This has been on my to-do list for ages and I’m finally getting around to fixing it.

Why is this useful for snippets?

I’ve often used the example of a snippet that can be used to type a banner comment:

'Banner comment':
  'prefix': 'banner'
  'body': '// $1\n// ${1/./=/g}'

This snippet combines an ordinary tab stop (the kind you type into) with a second tab stop that transforms your input, replacing every character in your input with an = to generate a fancy double-underline for your very important comment heading.

https://github.com/pulsar-edit/pulsar/assets/3450/7ba7077d-06fe-4391-9858-f1e75a99b86c

But it only works in C-style languages because we’ve hard-coded the //s. Now we can make it isomorphic!

'banner':
  'prefix': 'banner'
  'body': '$LINE_COMMENT $1\n$LINE_COMMENT ${1/./=/g}'

Now I can bring that same snippet over to an SCM file:

https://github.com/pulsar-edit/pulsar/assets/3450/dee0c8a9-7701-4c38-83f8-6ed3aa1946b8
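
To make the expansion concrete, here is a rough sketch of what the snippet body produces once the variable and the transformation are resolved. The `expandBanner` helper is hypothetical and is not how the snippets package works internally; the `;` delimiter for an SCM file is likewise an assumption about what the lookup would report.

```javascript
// Hypothetical helper, not part of the snippets package. It imitates the
// result of the body '$LINE_COMMENT $1\n$LINE_COMMENT ${1/./=/g}'.
function expandBanner(lineComment, heading) {
  // ${1/./=/g} replaces every character of the first tab stop with '='.
  const underline = heading.replace(/./g, '=');
  return `${lineComment} ${heading}\n${lineComment} ${underline}`;
}

// In a C-style language the delimiter resolves to '//'.
console.log(expandBanner('//', 'Setup'));

// Assumed delimiter for an SCM file.
console.log(expandBanner(';', 'Setup'));
```

The snippet itself stays the same in both cases; only the value substituted for $LINE_COMMENT changes.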

Is this a huge, life-changing feature? No. Clearly I’m one of eight people on earth who uses snippets. But it’s certainly worth doing, and until now it was only held up by the fact that fixing it would be an annoying chore.

Description of the Change

I’ve added a new method to TextEditor called getCommentDelimitersForBufferPosition. At a given position, it will look up the full set of comment delimiters, not just the incomplete set that exists to support Editor: Toggle Line Comments. If a language doesn’t have the right setting present, Pulsar will do its best to return useful (if incomplete) data.

The returned data structure would look like this if you were looking up JavaScript comment delimiters:

{
  line: '//',
  block: ['/*', '*/']
}

Or this if you’re looking up Python comment delimiters:

{
  line: '#',
  block: undefined
}

Or this if you’re looking up CSS comment delimiters:

{
  line: undefined,
  block: ['/*', '*/']
}
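
Since either property can be undefined, a consumer has to handle all three shapes. The following is a minimal sketch of one way to do that. `wrapInComment` is a hypothetical helper, and preferring line comments over block comments is just a choice made for this example.

```javascript
// Hypothetical consumer of the { line, block } shape shown above.
// Inside Pulsar, the object would come from something like:
//   editor.getCommentDelimitersForBufferPosition(editor.getCursorBufferPosition())
function wrapInComment(delimiters, text) {
  const { line, block } = delimiters;
  // Prefer a line comment when the language has one.
  if (line) return `${line} ${text}`;
  // Otherwise fall back to the block delimiter pair.
  if (block) {
    const [start, end] = block;
    return `${start} ${text} ${end}`;
  }
  // No delimiter data at all: return the text unchanged.
  return text;
}

console.log(wrapInComment({ line: '//', block: ['/*', '*/'] }, 'TODO'));
console.log(wrapInComment({ line: '#', block: undefined }, 'TODO'));
console.log(wrapInComment({ line: undefined, block: ['/*', '*/'] }, 'TODO'));
```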

The implementation details of how this is done vary per language mode:

This will hit the same method on the language mode, commentStringsForPosition, that was already being used to figure out which delimiter to use for Editor: Toggle Line Comments. That behavior has not changed; there’s simply a new property on the returned object now.

I preferred to expand the scope of a non-public-facing language mode method instead of defining one new method on each of these three language mode classes. Each caller of commentStringsForPosition knows which data it cares about; the code path that's hit for Editor: Toggle Line Comments will consult the existing properties on the returned object, whereas the code path that's hit for getCommentDelimitersForBufferPosition will call the same method and read only the new returned property.

This makes sense to me because even the alternative (implementing two different methods for these two code paths) would've involved a lot of shared code. I've talked about how we use editor.commentStart and editor.commentEnd to fill in delimiter data when the more comprehensive editor.commentDelimiters property isn't present, but we can also go the other way. If a newer language defines editor.commentDelimiters but forgets to define editor.commentStart and editor.commentEnd, we can use our brand-new data structure to fill in the properties that Editor: Toggle Line Comments expects.
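
The two-way fallback could be sketched roughly as follows. This is not the actual code in commentStringsForPosition. The function name, the whitespace trimming and padding, and the rule that a commentStart without a commentEnd means a line comment are all assumptions made for illustration.

```javascript
// Hypothetical sketch of filling in whichever settings are missing.
function fillInCommentSettings({ commentStart, commentEnd, commentDelimiters }) {
  let delimiters = commentDelimiters;

  if (!delimiters && commentStart) {
    // Only the old-style settings exist. Assumption: a start with no end
    // is a line comment; a start with an end is a block comment.
    const start = commentStart.trim();
    delimiters = commentEnd
      ? { line: undefined, block: [start, commentEnd.trim()] }
      : { line: start, block: undefined };
  }

  if (!commentStart && delimiters) {
    // Only the new-style setting exists. Derive the values that the
    // toggle command reads, padding with a space (an assumption).
    if (delimiters.line) {
      commentStart = `${delimiters.line} `;
    } else if (delimiters.block) {
      commentStart = `${delimiters.block[0]} `;
      commentEnd = ` ${delimiters.block[1]}`;
    }
  }

  return { commentStart, commentEnd, commentDelimiters: delimiters };
}

// Old-style settings only, as in the JavaScript grammar example.
console.log(fillInCommentSettings({ commentStart: '// ' }));

// New-style setting only.
console.log(fillInCommentSettings({
  commentDelimiters: { line: undefined, block: ['/*', '*/'] },
}));
```

Because both directions share the same inputs, keeping them in one place avoids duplicating the lookup in two methods.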

Nothing consumes the new getCommentDelimitersForBufferPosition method yet, though of course I’ll make sure that the snippets package is the first consumer as soon as this lands so that I can support those three snippet variables and make my life slightly better.

Alternate Designs

Doing this required revisiting the question of where this data should live. It might seem intuitive for comment metadata to live on the grammar itself, and that’ll work 98% of the time. But the other 2% mainly involves scenarios in which the right delimiters vary depending on which part of the file you’re in.

JSX is the classic example:

https://github.com/pulsar-edit/pulsar/assets/3450/3aa6b987-2a0f-4eda-a48c-3e67739bc649

Legacy Tree-sitter never handled this right because it can only determine delimiters on a per-language basis — nothing more nuanced than that. The scope system can look up editor.commentStart and get a different value when the cursor is inside of a JSX block than it would if the cursor were at the top of a JavaScript file.

And, though it’s an even more obscure use case, there should be some way for a user to opt into a different delimiter for Editor: Toggle Line Comments. If we use the config system, a user can override it; if we don’t, they’re stuck with it.

Hence why WASMTreeSitterLanguageMode::commentStringsForPosition checks config first, and grammar data second. I think we're making the right choice here.
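
As an illustration of that ordering, here is a simplified sketch of a lookup that consults scope-specific config first and falls back to grammar data. Pulsar's real scoped config system is more involved than a plain object. The function name, the data layout, and the `meta.jsx.children` scope name are hypothetical.

```javascript
// Hypothetical scope-specific settings. A user override would live here too.
const scopedConfig = {
  'source.js': { commentStart: '// ' },
  // Hypothetical scope name for the inside of a JSX element.
  'meta.jsx.children': { commentStart: '{/* ', commentEnd: ' */}' },
};

// Hypothetical grammar-level data, which cannot vary within a file.
const grammarComments = { commentStart: '// ' };

function commentStringsFor(scopeChain, config, grammarFallback) {
  // Walk from the most specific scope (last) to the least specific (first).
  for (let i = scopeChain.length - 1; i >= 0; i--) {
    const settings = config[scopeChain[i]];
    if (settings) return settings;
  }
  // Nothing in config matched, so use what the grammar declares.
  return grammarFallback;
}

// At the top of a JavaScript file.
console.log(commentStringsFor(['source.js'], scopedConfig, grammarComments));

// With the cursor inside a JSX block.
console.log(commentStringsFor(['source.js', 'meta.jsx.children'], scopedConfig, grammarComments));
```

A grammar-only lookup would return the same delimiters for both positions, which is the JSX problem described above.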

Possible Drawbacks

There are probably lots of edge cases in this behavior, though I’ve done my best to account for them. Ultimately, the stakes are low here because this is not a feature that the world has been clamoring for.

Verification Process

There are new tests in WASMTreeSitterLanguageMode and TextEditor. They should pass. The code path for toggling line comments has been touched quite a bit, so all those specs should also pass.

I mean, y’know, they’re tests. They should pass. All our tests should pass.

Release Notes