sublimehq / sublime_text

Issue tracker for Sublime Text
https://www.sublimetext.com

Inject rules into syntax definitions #5004

Open keith-hall opened 2 years ago

keith-hall commented 2 years ago

Problem description

Some syntax definitions embed many other syntax definitions; Markdown, with its fenced code blocks, is the main example. The Markdown syntax definition is shipped with Sublime Text. The syntax test CI runner fails any sublime-syntax file that embeds a sublime-syntax file which is not part of the sublimehq/Packages repository. This is all well and good.

However, there is one limitation with this: it prevents third-party syntax definitions from being included in Markdown fenced code blocks.

Preferred solution

A way for a .sublime-syntax file to specify context "injection points", which other syntax definitions could then target through additional metadata.

Concrete example. Markdown.sublime-syntax could contain a new top-level key called injection_templates, which would essentially contain something like:

code_fence:
  - parameters:
      - syntax_name_subscope
      - language_name_regex
      - embed_syntax_reference
  - match: |-
       (?x)
        {{fenced_code_block_start}}
        ({{language_name_regex}})
        {{fenced_code_block_trailing_infostring_characters}}
    captures:
      0: meta.code-fence.definition.begin.{{syntax_name_subscope}}.markdown-gfm
      2: punctuation.definition.raw.code-fence.begin.markdown
      5: constant.other.language-name.markdown
    embed: {{embed_syntax_reference}}
    embed_scope: markup.raw.code-fence.{{syntax_name_subscope}}.markdown-gfm
    escape: '{{code_fence_escape}}'
    escape_captures:
      0: meta.code-fence.definition.end.{{syntax_name_subscope}}.markdown-gfm
      1: punctuation.definition.raw.code-fence.end.markdown

(any variables referenced which are not parameters would come from the original syntax, in which the "injection templates" are defined.)

and when referenced from JavaScript.sublime-syntax with a new top-level key called inject:

  - into: scope:text.html.markdown#fenced-syntaxes
    template: code_fence
    parameters:
      syntax_name_subscope: javascript
      language_name_regex: (?i:javascript|js)
      embed_syntax_reference: scope:source.js

then ST would automatically resolve it by logically* appending the following to the fenced-syntaxes context in the Markdown.sublime-syntax file (*without actually altering the file, of course):

  - match: |-
       (?x)
        {{fenced_code_block_start}}
        ((?i:javascript|js))
        {{fenced_code_block_trailing_infostring_characters}}
    captures:
      0: meta.code-fence.definition.begin.javascript.markdown-gfm
      2: punctuation.definition.raw.code-fence.begin.markdown
      5: constant.other.language-name.markdown
    embed: scope:source.js
    embed_scope: markup.raw.code-fence.javascript.markdown-gfm
    escape: '{{code_fence_escape}}'
    escape_captures:
      0: meta.code-fence.definition.end.javascript.markdown-gfm
      1: punctuation.definition.raw.code-fence.end.markdown
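
The substitution step above can be sketched in a few lines of Python. This is only an illustration of the proposed behaviour, not anything Sublime Text implements: `expand_template` is a hypothetical helper, and the template is treated as plain text rather than parsed YAML. It replaces only the declared parameters and leaves every other `{{variable}}` alone, so the host syntax can still resolve those from its own variables.

```python
import re


# Hypothetical helper for illustration; Sublime Text has no such function.
def expand_template(template_text, declared_parameters, values):
    """Substitute {{name}} placeholders for declared parameters only.

    Placeholders that are not declared parameters (for example
    {{fenced_code_block_start}}) are left untouched so that the syntax
    defining the template can resolve them later.
    """
    missing = [name for name in declared_parameters if name not in values]
    if missing:
        raise ValueError(f"missing template parameters: {missing}")

    def substitute(match):
        name = match.group(1)
        if name in declared_parameters:
            return values[name]
        return match.group(0)  # keep host-syntax variables as written

    return re.sub(r"\{\{(\w+)\}\}", substitute, template_text)


# A shortened, text-only stand-in for the code_fence template.
template = (
    "match: {{fenced_code_block_start}}({{language_name_regex}})\n"
    "embed: {{embed_syntax_reference}}\n"
    "embed_scope: markup.raw.code-fence.{{syntax_name_subscope}}.markdown-gfm"
)
parameters = ["syntax_name_subscope", "language_name_regex", "embed_syntax_reference"]
values = {
    "syntax_name_subscope": "javascript",
    "language_name_regex": "(?i:javascript|js)",
    "embed_syntax_reference": "scope:source.js",
}

expanded = expand_template(template, parameters, values)
print(expanded)
```

Raising an error on a missing parameter is a design choice of this sketch; the proposal itself does not say how an incomplete `parameters` mapping should be handled.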

I imagine the append operations could happen in the same lexicographical order in which packages and syntaxes are currently loaded. (In most cases it shouldn't make much difference, since the injected regex patterns should be unique enough not to conflict.)

If the syntax/context referenced in the into key doesn't exist, it could/should just be a warning rather than a failure. EDIT: it may also be nice if the template itself defines which context it should be injected into as opposed to leaving it up to the third party syntax referencing this template. Then, if the template or syntax doesn't exist, it would just be a warning.
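
A minimal Python sketch of the ordering and warning behaviour described above, under stated assumptions: `apply_injections` is a hypothetical function, contexts are modelled as plain lists of rule names keyed by `scope#context`, and load order is simplified to sorting by package name (the real load order has special cases that are ignored here).

```python
import warnings


# Hypothetical model of the proposal; not Sublime Text's actual loader.
def apply_injections(contexts, injections):
    """Append injected rules to their target contexts.

    contexts:   dict mapping "scope#context" to a list of rules
    injections: list of (package_name, target, rule) tuples

    Returns a new dict; the input contexts are not modified, mirroring
    the idea that the original syntax file is never altered.
    """
    merged = {target: list(rules) for target, rules in contexts.items()}
    # Simplifying assumption: plain sort by package name. sorted() is
    # stable, so rules from the same package keep their relative order.
    for package, target, rule in sorted(injections, key=lambda item: item[0]):
        if target not in merged:
            # A missing target is a warning, not a failure.
            warnings.warn(f"{package}: injection target {target!r} not found; skipping")
            continue
        merged[target].append(rule)
    return merged


base = {"text.html.markdown#fenced-syntaxes": ["existing-rule"]}
requested = [
    ("Zig", "text.html.markdown#fenced-syntaxes", "zig-fence"),
    ("JavaScript", "text.html.markdown#fenced-syntaxes", "js-fence"),
    ("Twig", "source.php#heredoc", "twig-heredoc"),  # target absent from base
]
result = apply_injections(base, requested)
```

Here the Twig injection only produces a warning because its target context is not in `base`, while the other two rules are appended in package-name order.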

Alternatives

Stacking multiple syntax extensions instead of overrides. i.e. Packages/JavaScript/Markdown.sublime-syntax containing:

scope: text.html.markdown
extends: self # meaning the same named/scoped `sublime-syntax` file from another package which does not contain `extends: self`
contexts:
  fenced-syntaxes:
    - meta_append: true
    - match: |-
         (?x)
          {{fenced_code_block_start}}
          ((?i:javascript|js))
          {{fenced_code_block_trailing_infostring_characters}}
      captures:
        0: meta.code-fence.definition.begin.javascript.markdown-gfm
        2: punctuation.definition.raw.code-fence.begin.markdown
        5: constant.other.language-name.markdown
      embed: scope:source.js
      embed_scope: markup.raw.code-fence.javascript.markdown-gfm
      escape: '{{code_fence_escape}}'
      escape_captures:
        0: meta.code-fence.definition.end.javascript.markdown-gfm
        1: punctuation.definition.raw.code-fence.end.markdown

That is, the usual extends logic, but with an explicit instruction to extend itself. I think that would be too confusing, though. It would also make the base syntax harder to maintain and change, given all the copy/pasted code, so I prefer the template idea.


Or maybe something clever with parameterized contexts?


Or the status quo: users continue to create their own overrides of Markdown.sublime-syntax to explicitly include each additional syntax they have installed or want support for.

Additional Information

VSCode apparently has something similar, but I admittedly didn't pay much attention to how it works when drafting this proposal: https://code.visualstudio.com/api/language-extensions/syntax-highlight-guide

deathaxe commented 2 years ago

I think this would be a great extension. While I share concerns about dependencies, I think a generic approach might be more flexible and useful in the long term, especially when combined with more extended variable support (see: #3787) or parametrized contexts.

The provided example of fenced code blocks is a rather specific one, targeting fixed blocks of foreign syntax highlighting. It would still prevent more generic extensions to an existing syntax, or at least require templates to be provided by the base syntax.

I like the generic idea of installing a package and getting more features in an existing base syntax. Things like CriticMarkup (recently added to core Markdown) could be this kind of 3rd-party extension. Another use case would be Markdown extensions, such as admonitions etc.

I agree that extends: self is probably a bit confusing. I would have preferred base: as the keyword for the current inheritance support, as this is what color schemes use for this kind of feature. Then extends could have been used for this kind of extension of the original syntax rather than for creating a new derived one.

So maybe a global key such as injects_into: ... would be a possible alternative. Such a key would tell ST to extend the original base syntax with the same set of features (meta_prepend/meta_append/...).

vmtype commented 1 year ago

Voting for this. For my use case I'd like any installed syntax to be available for Markdown code blocks without having to do anything special. But anything that makes it possible would do. Edit: this thread has a workaround for my use case: https://forum.sublimetext.com/t/custom-syntax-highlighting-in-markdown-code-blocks/68009

willrowe commented 3 months ago

My use case would be to allow Twig syntax highlighting in PHP heredocs.