Inject rules into syntax definitions

Problem description

Some syntax definitions embed many other syntax definitions, most notably Markdown with its fenced code blocks. The Markdown syntax definition ships with Sublime Text. The syntax test CI runner fails any sublime-syntax file that embeds a sublime-syntax file which is not part of the sublimehq/Packages repository. This is all well and good.

However, this has one limitation: it prevents third-party syntax definitions from being included in Markdown fenced code blocks.

Preferred solution

A way for a .sublime-syntax file to specify context "injection points", which third-party syntaxes could then target through additional metadata.

As a concrete example, Markdown.sublime-syntax could contain a new top-level key called injection_templates, which would contain something like:

code_fence:
  - parameters:
      - syntax_name_subscope
      - language_name_regex
      - embed_syntax_reference
  - match: |-
      (?x)
       {{fenced_code_block_start}}
       ({{language_name_regex}})
       {{fenced_code_block_trailing_infostring_characters}}
    captures:
      0: meta.code-fence.definition.begin.{{syntax_name_subscope}}.markdown-gfm
      2: punctuation.definition.raw.code-fence.begin.markdown
      5: constant.other.language-name.markdown
    embed: '{{embed_syntax_reference}}'
    embed_scope: markup.raw.code-fence.{{syntax_name_subscope}}.markdown-gfm
    escape: '{{code_fence_escape}}'
    escape_captures:
      0: meta.code-fence.definition.end.{{syntax_name_subscope}}.markdown-gfm
      1: punctuation.definition.raw.code-fence.end.markdown

(any variables referenced which are not parameters would come from the original syntax, in which the "injection templates" are defined.)

and when referenced from JavaScript.sublime-syntax with a new top-level key called inject:

  - into: scope:text.html.markdown#fenced-syntaxes
  - template: code_fence
  - parameters:
      syntax_name_subscope: javascript
      language_name_regex: (?i:javascript|js)
      embed_syntax_reference: scope:source.js

then ST would automatically resolve it by logically appending the following to the fenced-syntaxes context in Markdown.sublime-syntax (without actually altering the file, of course):

  - match: |-
       (?x)
        {{fenced_code_block_start}}
        ((?i:javascript|js))
        {{fenced_code_block_trailing_infostring_characters}}
    captures:
      0: meta.code-fence.definition.begin.javascript.markdown-gfm
      2: punctuation.definition.raw.code-fence.begin.markdown
      5: constant.other.language-name.markdown
    embed: scope:source.js
    embed_scope: markup.raw.code-fence.javascript.markdown-gfm
    escape: '{{code_fence_escape}}'
    escape_captures:
      0: meta.code-fence.definition.end.javascript.markdown-gfm
      1: punctuation.definition.raw.code-fence.end.markdown
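
To make the resolution step more concrete, here is a minimal Python sketch of the parameter substitution, assuming the template has already been parsed into plain dicts and lists. The function name expand_template and the data shapes are hypothetical and not part of Sublime Text; the template below is shortened for illustration. Only declared parameters are replaced, so variables such as {{fenced_code_block_start}} stay in place for the target syntax to resolve.

```python
import re

# Matches {{name}} placeholders as used by sublime-syntax variables.
_VAR = re.compile(r"\{\{(\w+)\}\}")


def expand_template(node, parameters):
    """Recursively replace {{name}} for declared parameters only.

    Hypothetical helper: placeholders that are not parameters are kept
    verbatim so the target syntax can resolve them as ordinary variables.
    """
    if isinstance(node, str):
        return _VAR.sub(lambda m: parameters.get(m.group(1), m.group(0)), node)
    if isinstance(node, list):
        return [expand_template(item, parameters) for item in node]
    if isinstance(node, dict):
        return {
            expand_template(key, parameters): expand_template(value, parameters)
            for key, value in node.items()
        }
    return node  # ints, bools and None pass through unchanged


# Shortened version of the code_fence template from above.
template = {
    "match": "{{fenced_code_block_start}}({{language_name_regex}})",
    "captures": {
        0: "meta.code-fence.definition.begin.{{syntax_name_subscope}}.markdown-gfm",
    },
    "embed": "{{embed_syntax_reference}}",
}
parameters = {
    "syntax_name_subscope": "javascript",
    "language_name_regex": "(?i:javascript|js)",
    "embed_syntax_reference": "scope:source.js",
}

rule = expand_template(template, parameters)
print(rule["match"])
print(rule["embed"])
```

The original template dict is left untouched, which matches the idea of resolving injections logically rather than rewriting any file.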

I imagine the append operations could happen in the same lexicographical order in which packages and syntaxes are currently loaded. (In most cases it shouldn't make much difference, since the injected regex patterns should be unique enough not to conflict.)

If the syntax/context referenced in the into key doesn't exist, it could/should just be a warning rather than a failure. EDIT: it may also be nice if the template itself defines which context it should be injected into as opposed to leaving it up to the third party syntax referencing this template. Then, if the template or syntax doesn't exist, it would just be a warning.
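
As a rough illustration of the ordering and the warn-instead-of-fail behaviour, here is a small Python sketch. It assumes the target syntax is available as a dict mapping context names to rule lists, and that each injection is a (package, file_name, context, rule) tuple; both shapes and the name apply_injections are made up for this example. It sorts purely lexicographically, which ignores any special cases in Sublime Text's real load order.

```python
import warnings


def apply_injections(contexts, injections):
    """Append injected rules to their target contexts (hypothetical sketch).

    contexts:   dict mapping context name -> list of rules
    injections: list of (package, file_name, context, rule) tuples
    """
    # Work on a copy so the original definition is never modified.
    merged = {name: list(rules) for name, rules in contexts.items()}

    # Plain lexicographic order by package, then by file name.
    ordered = sorted(injections, key=lambda item: (item[0], item[1]))

    for package, file_name, context, rule in ordered:
        if context not in merged:
            # A missing target is only a warning, not a failure.
            warnings.warn(
                f"{package}/{file_name}: unknown context {context!r}, injection skipped"
            )
            continue
        merged[context].append(rule)
    return merged


markdown = {"fenced-syntaxes": [{"match": "builtin"}]}
pending = [
    ("Zig", "Zig.sublime-syntax", "fenced-syntaxes", {"match": "zig"}),
    ("Astro", "Astro.sublime-syntax", "fenced-syntaxes", {"match": "astro"}),
    ("Foo", "Foo.sublime-syntax", "no-such-context", {"match": "foo"}),
]

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = apply_injections(markdown, pending)

print([rule["match"] for rule in result["fenced-syntaxes"]])
print(len(caught))
```

The built-in rules stay first, the injected ones follow in package order, and the injection into the unknown context only produces a warning.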

Alternatives

Stacking multiple syntax extensions instead of overrides, i.e. Packages/JavaScript/Markdown.sublime-syntax containing:

scope: text.html.markdown
extends: self # meaning the same named/scoped `sublime-syntax` file from another package which does not contain `extends: self`
contexts:
  fenced-syntaxes:
    - meta_append: true
    - match: |-
         (?x)
          {{fenced_code_block_start}}
          ((?i:javascript|js))
          {{fenced_code_block_trailing_infostring_characters}}
      captures:
        0: meta.code-fence.definition.begin.javascript.markdown-gfm
        2: punctuation.definition.raw.code-fence.begin.markdown
        5: constant.other.language-name.markdown
      embed: scope:source.js
      embed_scope: markup.raw.code-fence.javascript.markdown-gfm
      escape: '{{code_fence_escape}}'
      escape_captures:
        0: meta.code-fence.definition.end.javascript.markdown-gfm
        1: punctuation.definition.raw.code-fence.end.markdown

That is, the usual extends logic, but with an explicit instruction to extend itself. I think this would be too confusing, though. It would also make the base syntax harder to maintain and change because of all the copy/pasted code, so I prefer the template idea.

Or maybe something clever with parameterized contexts?

Users continue to create their own overrides of Markdown.sublime-syntax to explicitly include each additional syntax they have installed/want support for.

Additional Information

VSCode apparently has something similar, but I admittedly didn't pay much attention to how it works when drafting this proposal: https://code.visualstudio.com/api/language-extensions/syntax-highlight-guide

I think this would be a great extension. While I share concerns about dependencies, I think a generic approach might be more flexible and useful in the long term, especially when combined with more extended variable support (see: #3787) or parametrized contexts.

The provided example of fenced code blocks is a rather special case, which targets fixed blocks of foreign syntax highlighting. The template approach would still prevent more generic extensions to an existing syntax, or at least require the base syntax to provide templates.

I like the generic idea of installing a package and getting more features in an existing base syntax. Things like CriticMarkup (recently added to core Markdown) could be such 3rd-party extensions. Another use case would be Markdown extensions such as admonitions.

I agree that extends: self is probably a bit confusing. I would have preferred base: as the keyword for the current inheritance support, as this is what color schemes use for this kind of feature. Then extends could have been used for extending the original syntax in place rather than creating a new derived one.

So maybe a global key such as injects_into: ... would be a possible alternative. Such a key would tell ST to extend the original base syntax with the same set of features (meta_prepend/meta_append/...).

Voting for this. For my use case I'd like any syntax installed to be available for markdown code blocks without having to do anything special. But anything that makes it possible would do. edit: this thread: https://forum.sublimetext.com/t/custom-syntax-highlighting-in-markdown-code-blocks/68009 has a workaround for my use case.

My use case would be to allow Twig syntax highlighting in PHP heredoc.

There are tons of use cases and lots of benefits, making users' lives easier when working with combined frontend/backend frameworks.

Here's another attempt to describe a possible approach, based on current extend features using some real world examples.

Rationale

In VS Code, packages can easily specify patterns to be injected into existing base syntax definitions by specifying

{
  "scopeName": "<name>.injection",
  "injectionSelector": "L:<scope-selector>"
}

The base syntax and all its inherited variants benefit from that extension.

The frontend JS library AlpineJS, for instance, requires special highlighting to be added to HTML tag attributes in all common frameworks such as Astro, Blade, Vue, etc.

In VS Code, installing an AlpineJS package is enough to add that syntax highlighting.

In Sublime Text it is only possible by dynamically creating syntax definitions via plugins like YAML Macros, all of which require manual or semi-automatic end user interaction.

To combine Astro and AlpineJS, end users are asked to manually create an HTML (Astro, AlpineJS) syntax definition:

%YAML 1.2
---
name: HTML (Astro, AlpineJS)
scope: text.html.astro.alpinejs
version: 2

extends:
  - Packages/AlpineJS/Syntaxes/HTML (AlpineJS).sublime-syntax
  - Packages/Astro/Syntaxes/HTML (Astro).sublime-syntax

Creation could partly be automated via plugins, but metadata to express dependencies between syntax definitions is missing.

Creating reliable support to inject patterns into bundled syntaxes like HTML or CSS seems rather tricky and hacky. A possible approach is outlined in PR 3416. It requires creating API syntax aliases, which plugins can override to add further syntaxes to the extends: key.

So if we want to be able to inject AlpineJS into all templating syntaxes (Astro, Vue, ...) in a single shot, we'd need to refactor the HTML package in the following way:

rename HTML.sublime-syntax to e.g. HTML (Basic).sublime-syntax

add a dummy syntax

%YAML 1.2
---
name: HTML
scope: text.html
version: 2

extends:
 - HTML (Basic).sublime-syntax

a (3rd-party?) plugin would need to somehow identify and manage the relationships between syntaxes and injections, and override HTML.sublime-syntax to add further syntaxes to extends:
```
%YAML 1.2
---
name: HTML
scope: text.html
version: 2

extends:
 - HTML (Basic).sublime-syntax
 - Packages/AlpineJS/Syntaxes/HTML (AlpineJS).sublime-syntax
```

Suggestion

With Sublime Text supporting diamond inheritance, most of the infrastructure required to provide a VS Code-like DX/UX is already present.

What's missing is a mechanism to recursively maintain lists of sublime-syntax files to merge into their base syntax definitions:

Packages/HTML/HTML.sublime-syntax

Packages/AlpineJS/HTML (AlpineJS).sublime-syntax
Packages/HTMX/HTMX.sublime-syntax

Each time a base syntax or one of its children changes, load and merge them. Publish the result under the base syntax's name and scope (e.g. as HTML.sublime-syntax with text.html.basic scope).

Syntax definitions intended to be injected could be identified by the keyword injects_to.

%YAML 1.2
---
# Packages/AlpineJS/HTML (AlpineJS).sublime-syntax
name: HTML (AlpineJS)
scope: text.html.injecting.alpinejs
version: 2

injects_to:               # add contexts of this file
  - HTML.sublime-syntax   #   to HTML.sublime-syntax
  - XML.sublime-syntax    #   and XML.sublime-syntax
                          #   XML may be a bad example, as both would require the same ancestor
                          #   and context structure!

contexts:
 tag-attributes:
   - meta_prepend: true
   - include: tag-alpinejs-attributes

 tag-alpinejs-attributes:
   ...

As a result, patterns from tag-alpinejs-attributes should automatically be available in all syntaxes extending HTML.sublime-syntax, such as ASP, Astro, ERB, JSP, PHP, Vue, and others.
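
The merge step could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: syntaxes are represented as plain dicts, inject and merge_context are hypothetical names, and the rule contents (tag-generic-attribute, x-data) are placeholders. The meta_prepend / meta_append handling follows the existing inheritance rules, where a context with neither key replaces the base context of the same name.

```python
def merge_context(base_rules, injected_rules):
    """Merge one injected context into the base context of the same name."""
    meta = {}
    rules = []
    for rule in injected_rules:
        if "meta_prepend" in rule or "meta_append" in rule:
            meta.update(rule)  # merge instruction, not a real rule
        else:
            rules.append(rule)
    if meta.get("meta_prepend"):
        return rules + list(base_rules)
    if meta.get("meta_append"):
        return list(base_rules) + rules
    return rules  # neither key: the injected context replaces the base one


def inject(base_contexts, injecting_syntaxes, base_name):
    """Apply every syntax whose injects_to names base_name (hypothetical)."""
    merged = {name: list(rules) for name, rules in base_contexts.items()}
    for syntax in injecting_syntaxes:
        targets = syntax.get("injects_to", [])
        if isinstance(targets, str):  # the key may hold a single file name
            targets = [targets]
        if base_name not in targets:
            continue
        for name, rules in syntax["contexts"].items():
            merged[name] = merge_context(merged.get(name, []), rules)
    return merged


html = {"tag-attributes": [{"include": "tag-generic-attribute"}]}
alpine = {
    "injects_to": ["HTML.sublime-syntax", "XML.sublime-syntax"],
    "contexts": {
        "tag-attributes": [
            {"meta_prepend": True},
            {"include": "tag-alpinejs-attributes"},
        ],
        "tag-alpinejs-attributes": [{"match": "x-data"}],
    },
}

result = inject(html, [alpine], "HTML.sublime-syntax")
print(result["tag-attributes"])
```

If the merged result is published under the base syntax's name, derived syntaxes would pick up the injected contexts through their normal extends chain without any further work.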

Other use cases

Markdown

Enable 3rd-party packages to add fenced code blocks

%YAML 1.2
---
# Packages/Astro/Injections/Markdown (Astro).sublime-syntax
scope: text.html.markdown.injecting.astro
version: 2

injects_to: Packages/Markdown/Markdown.sublime-syntax

contexts:
  fenced-syntaxes:
    - meta_append: true
    - include: fenced-astro

  fenced-astro:
    - match: |-
         (?x)
          {{fenced_code_block_start}}
          ((?i:astro))
          {{fenced_code_block_trailing_infostring_characters}}
      captures:
        0: meta.code-fence.definition.begin.markdown-gfm
        2: punctuation.definition.raw.code-fence.begin.markdown
        5: constant.other.language-name.markdown
      embed: scope:source.astro
      embed_scope: markup.raw.code-fence.markdown-gfm source.astro
      escape: '{{fenced_code_block_escape}}'
      escape_captures:
        0: meta.code-fence.definition.end.markdown-gfm
        1: punctuation.definition.raw.code-fence.end.markdown

This would heavily reduce the boilerplate currently needed to support syntaxes in fenced code blocks.

see: https://github.com/SublimeText-Markdown/MarkdownEditing/blob/1df39dc1d4e6998455d87d3f54204370c908c9e2/syntaxes/Markdown.sublime-syntax#L1108-L2881

PostCSS and TailwindCSS

Design Tailwind CSS to inject its at-rule contexts into PostCSS, so that they are automatically available in the style tags of Astro and Vue components, both of which extend HTML and include PostCSS via

  style-lang-decider:
    - match: (?i)(?=postcss{{unquoted_attribute_break}}|'postcss'|"postcss")
      set:
        - style-postcss
        - tag-generic-attribute-meta
        - tag-generic-attribute-value

  style-postcss:
    - meta_scope: meta.tag.style.begin.html
    - match: '>'
      scope: punctuation.definition.tag.end.html
      set: style-postcss-content
    - include: style-common

  style-postcss-content:
    - match: '{{style_content_begin}}'
      captures:
        1: comment.block.html punctuation.definition.comment.begin.html
      pop: 1
      embed: scope:source.postcss
      embed_scope: source.postcss.embedded.html
      escape: '{{style_content_end}}'
      escape_captures:
        1: source.postcss.embedded.html
        2: comment.block.html punctuation.definition.comment.end.html
        3: source.postcss.embedded.html
        4: comment.block.html punctuation.definition.comment.end.html

Tailwind CSS would look like

%YAML 1.2
---
scope: source.postcss.injections.tailwind
version: 2

injects_to: PostCSS.sublime-syntax

contexts:

  at-other:
    - meta_prepend: true
    - include: tailwind-at-config
    - include: tailwind-at-responsive
    - include: tailwind-at-tailwind
    - include: tailwind-at-variants
    - include: tailwind-at-screen
  ...

Limitations

It is certainly possible for packages to inject conflicting patterns that interfere with each other. But that's the case in VS Code as well.
