zed-industries / extensions

Extensions for the Zed editor

Extensions and language injections #484

Open nwhetsell opened 2 months ago

nwhetsell commented 2 months ago

Thank you for creating Zed and making it open source; apologies if this isn’t the right place for this kind of issue.

I’m trying to write a language extension for LilyPond. LilyPond files can contain Scheme code; it’s conceptually similar to how HTML files can contain JavaScript code. My understanding is that in Tree-sitter, this sort of thing is handled using injections.
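For context, an injection query in upstream Tree-sitter conventions looks roughly like the sketch below, modeled on the HTML/JavaScript case. The node names `script_element` and `raw_text` are assumed to match the tree-sitter-html grammar and should be checked against it, and a given editor may expect different capture or property names.

```scheme
; Sketch only: re-parse the raw text of a <script> element as JavaScript.
; Node names are assumptions based on tree-sitter-html; verify against the grammar.
((script_element
  (raw_text) @injection.content)
 (#set! injection.language "javascript"))
```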

I can’t seem to get language injections to work in a Zed extension, and I haven’t been able to find an extension with working injections. For example, the LaTeX extension looks like it should be highlighting the contents of certain environments (like minted) using injections, but this doesn’t appear to be happening:

[Screenshot 2024-04-12 at 6 51 37 AM]

(Note that the code in lines 7–8 isn’t highlighted as TOML; I have the TOML extension installed.)

Are language injections achievable in Zed extensions?

gopeter commented 2 months ago

Funny, I ran into the same issue an hour ago while trying to add Twig. Judging by other extensions and the official ones, language injections should work, but I can't get them working in my Twig extension.

That's my injections.scm:

((content) @injection.content
  (#set! injection.language "html")
  (#set! injection.combined))

And this is how Tree-sitter parsed the file:

[image: syntax tree of the parsed file]

So, as I understand it, this piece of "content" should be highlighted as HTML, but it isn't. Any ideas? In the PHP and ERB extensions this works quite well.

gopeter commented 2 months ago

Oh, found it! It has to be:

((content) @content
 (#set! "language" "html")
 (#set! "combined"))

There's a discussion about the format here: https://github.com/zed-industries/zed/pull/9654

nwhetsell commented 2 months ago

@gopeter Thank you!

Unfortunately, it looks like there may be another issue in Zed that prevents this from working in my particular case.

Many (I’d bet most) Tree-sitter grammars for languages that can contain other languages use an external scanner to parse the other language as one syntax tree node. For example, the HTML grammar and this Twig grammar both do this. Using an external scanner is relatively straightforward (but often still tricky) when the other language ends at a well-defined delimiter (like </script>), but this isn’t the case for Scheme embedded in LilyPond: embedded Scheme code ends at the end of the embedded datum, like a number’s last digit or a list’s closing parenthesis. The upshot is that to parse Scheme embedded in LilyPond, you need a complete Scheme parser.

Rather than write an external scanner that functions as a Scheme parser, I just use Tree-sitter to parse Scheme embedded in LilyPond. But this means embedded Scheme isn’t contained in one syntax tree node; there’s a complete Scheme syntax tree. The issue is that Zed appears to apply an injected grammar only when the captured syntax tree node has no child nodes.

Here’s a screenshot of a LilyPond file with embedded Scheme; note that the embedded Scheme isn’t syntax highlighted, and the syntax tree for the embedded Scheme has child nodes:

[Screenshot 2024-04-14 at 7 47 09 AM]

Here’s a screenshot of a LilyPond Scheme file with embedded LilyPond. The embedded LilyPond is syntax highlighted, and the syntax tree for the embedded LilyPond doesn’t have child nodes:

[Screenshot 2024-04-14 at 7 51 04 AM]

It looks like Tree-sitter has an injection.include-children property that’s intended to address this, although I’m not sure how widely implemented it is. Nova, for example, doesn’t appear to support it, although Nova includes child nodes when parsing an injected document anyway:

[Screenshot 2024-04-14 at 7 55 39 AM]
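
For reference, a query using that property might look like the sketch below. It follows the upstream Tree-sitter naming (`injection.content`, `injection.language`, `injection.include-children`); `embedded_scheme` is a hypothetical node name, not necessarily what the LilyPond grammar uses, and whether Zed honors the property is still an open question here.

```scheme
; Sketch only. embedded_scheme is a hypothetical node name.
; Upstream, injection.include-children asks for the captured node's entire
; text, child nodes included, to be re-parsed with the injected language.
((embedded_scheme) @injection.content
  (#set! injection.language "scheme")
  (#set! injection.include-children))
```

Without the property, upstream Tree-sitter excludes the text of child nodes from the injected document, which is consistent with the behavior described above for nodes that have children.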