sublimehq / Packages

Syntax highlighting files shipped with Sublime Text and Sublime Merge
https://sublimetext.com

[RFC] Interpolation of regular expressions #2982

Open deathaxe opened 3 years ago

deathaxe commented 3 years ago

Prelude

Several syntax definitions implement string interpolation by clearing string scope.

The goal is to enable simple color schemes to properly highlight interpolated variables or expressions as well as embedded source code without special treatment.

Embedded syntaxes, such as JavaScript or CSS in HTML tag attributes, look like:

<p style="color: darkred" onclick="func('Hello World')">
//       ^ meta.string string.quoted
//        ^^^^^^^^^^^^^^ meta.string meta.interpolation source.css.embedded
//                      ^ meta.string string.quoted
//                                ^ meta.string string.quoted
//                                 ^^^^^^^^^^^^^^^^^^^ meta.string meta.interpolation source.js.embedded
//                                                    ^ meta.string string.quoted

Note: The string scope is cleared between quotation marks.

A quoted string with variable interpolation looks like:

    "Vars: $var , ${var} , {$expr} , $array[10]"
//  ^^^^^^^ meta.string string.quoted - meta.interpolation
//         ^^^^ meta.string meta.interpolation - string
//             ^^^ meta.string string.quoted - meta.interpolation
//                ^^^^^^ meta.string meta.interpolation - string
//                      ^^^ meta.string string.quoted - meta.interpolation
//                         ^^^^^^^ meta.string meta.interpolation - string
//                                ^^^ meta.string string.quoted - meta.interpolation
//                                   ^^^^^^^^^^ meta.string meta.interpolation - string
//                                             ^ meta.string string.quoted - meta.interpolation

Note: The string scope is cleared whenever a $... interpolation is consumed.

Quoted regular expression strings currently mix both use cases above. The whole string is scoped meta.string string.quoted source.regexp in Python and PHP for instance.

    "/Vars: $var , ${var} , {$expr} , $array[10]/m"
//  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ meta.string string.quoted. source.regexp

The issue

The examples basically reveal two different kinds of issues:

  1. The current interpolation scheme is used for both
    • interpolating variables into strings and
    • embedding foreign syntax into strings (which may additionally need to highlight interpolated variables)
  2. Regular expressions are sometimes a combination of string and source. Because of the way the scopes are stacked, the current interpolation scheme requires two scopes to be cleared.

Issue 1 can probably be solved by using meta.embedded vs. meta.interpolation.

Issue 2 probably requires some discussion about how to solve the problem.

Proposal

As source.regexp is only applied if regular expressions are implemented in external syntax definitions (see: Python, PHP, Perl) but not if they are part of the syntax itself (e.g.: Bash), a first step would probably be to find a common transparent scoping scheme for both of them.

The primary goal should/would be to apply the same "string-interpolation" contexts used for normal quoted strings to avoid duplicating numerous contexts.

This would mean removing source.regexp.

It would enable normal interpolation contexts to be used, which currently clear 1 scope to remove string.

A new question arises though: How to scope regular expressions then?

Some syntaxes apply string.regexp to regular expressions, but quoted strings already use string.quoted.

Color schemes might want to treat regular expressions specially, since their content needs to be highlighted like normal source code.

We could use meta.string string.regexp.quoted or meta.string.regexp string.quoted.

Any ideas?

keith-hall commented 3 years ago

I would vote for meta.string.regexp string.quoted.single style, partially because most color schemes already target string.quoted, there's less combinatorial explosion of scopes for color schemes to target, and partially because I think specializing the meta scope just makes more sense here.

jwortmann commented 3 years ago

I'd really welcome a common scoping guideline for regexp strings.

I think the vast majority of color schemes target the more general string scope, instead of only string.quoted - and for those which don't, I would classify potential problems from that as a color scheme issue, because string is one of the scopes in the recommended minimal scope coverage from the scope naming guidelines.

In general, like @keith-hall I would prefer the meta.string.regexp string.quoted scheme too, because it makes more sense to have the "regexp" property of a string separated from the quoting style. But I think it might cause noticeable problems for backwards compatibility. string.regexp is already explicitly specified in the scope naming guideline as the scope name to use:

Regular expression literals should use:

  • string.regexp

And there are a lot of color schemes which target this scope. I just checked a few color schemes and could find that scope in all of them, for example Solarized, Dracula, Base16, Nord.
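To make the trade-off between the two candidate scope stacks easier to compare, here is a rough Python sketch of which color scheme selectors would hit each of them. It is only a toy model: `selector_matches` is a hypothetical helper that handles plain descendant selectors with prefix matching, not Sublime Text's real selector engine (no `-`, `,`, `|` or `&` operators).

```python
# Toy model of scope selector matching; NOT Sublime Text's implementation.
# Assumption: a selector part matches a scope if it is equal to it or is a
# dot-separated prefix of it, and parts must match scopes in stack order.

def scope_matches(selector_part, scope):
    return scope == selector_part or scope.startswith(selector_part + ".")


def selector_matches(selector, stack):
    scopes = iter(stack.split())
    # Each `any` consumes the iterator up to its match, so the next selector
    # part can only match scopes further to the right in the stack.
    return all(
        any(scope_matches(part, scope) for scope in scopes)
        for part in selector.split()
    )


CANDIDATES = [
    "meta.string string.regexp.quoted",
    "meta.string.regexp string.quoted.single",
]
SELECTORS = ["string", "string.quoted", "string.regexp", "meta.string.regexp"]

for stack in CANDIDATES:
    hits = [sel for sel in SELECTORS if selector_matches(sel, stack)]
    print(f"{stack!r} is matched by {hits}")
```

In this model, a scheme that only targets string.regexp stops matching once the "regexp" part moves to the meta scope, while string and string.quoted rules keep working, which is the backwards compatibility concern described above.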

deathaxe commented 3 years ago

I found the following string.regexp scopes in this repo.

scope                        comment
string.regexp.clojure        "pattern"
string.regexp.groovy         /pattern/.
string.regexp.javascript     /pattern/.
string.regexp.ruby           /pattern/, {...} ...
string.regexp.perl           /pattern/, m{pattern}, ...
string.regexp.modr.sql       %r{ }
string.regexp.sql            /pattern/
string.regexp.tcl            depends on command

In the end, Clojure seems to be the only syntax affected with regard to string.quoted.double when we talk about possible highlighting changes.

PHP and Python don't use string.regexp to highlight regular expressions in quoted strings. They already give string.quoted precedence.

I am uncertain about all those custom Perl-style patterns at the moment. Perl knows about q/literal/ or s/pattern/. It only scopes the tokens between the / delimiters as string.unquoted vs. string.regexp, while q and s are treated as functions.

Maybe something like this is the way to go for those kinds of string constructs without too heavy impact.

jwortmann commented 3 years ago

In the end, Clojure seems to be the only syntax affected with regard to string.quoted.double when we talk about possible highlighting changes.

Oh, I was under the assumption that one goal was to use a common scope for all regular expressions, regardless of whether they are delimited by e.g. r"pattern" or /pattern/. That would make it easier for color schemes to treat all regular expressions in a certain way. But if Clojure is the only syntax in question for a change (in addition to removing source.regexp in general), then I see no problem with changing string.regexp.clojure to string.quoted.double.clojure. That would mean having meta.string.regexp only when the regexp is not of the form /pattern/, if I understand it correctly?

We might also want to distinguish whether the special characters/elements in a regexp are scoped, or not. From personal experience with my color scheme, I tried to remove the usual string color in that case, to allow the regexp characters/elements to be highlighted like code. But if there is no special scoping within a regexp string, I'd like to keep the string highlighting color. IIRC, this was often possible via the source.regexp scope, but I probably could easily change it if meta.string.regexp will be used instead then.

deathaxe commented 3 years ago

The primary reason for this RFC is an attempt to properly support string interpolation in PHP in the way we already do it in other syntaxes, by clearing the string scope and keeping just meta.string, so interpolated variables/expressions are highlighted as code without special treatment by color schemes.

The main issue I am faced with is those quoted regular expressions using meta.string string.quoted source.regexp, which means interpolation contexts must clear 2 scopes to get rid of string.

As a result we'd always need special interpolation contexts for regular expressions, because we only need to clear 1 scope in normal quoted strings.
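The "1 scope vs. 2 scopes" problem can be sketched with a small Python model of the scope stack. This is only an illustration under assumptions: `push_interpolation` is a hypothetical helper that imitates clearing N scopes and then applying meta.interpolation, and the `source.php` base scope is assumed for the example.

```python
# Toy model of a scope stack; NOT Sublime Text's actual implementation.

def push_interpolation(stack, clear_scopes):
    """Drop the top `clear_scopes` scopes, then add meta.interpolation."""
    kept = stack[:len(stack) - clear_scopes]
    return kept + ["meta.interpolation"]


def has_string_scope(stack):
    """True if any scope on the stack is `string` or `string.*`."""
    return any(s == "string" or s.startswith("string.") for s in stack)


# Assumed stacks: a normal quoted string and a quoted regexp string.
plain_string = ["source.php", "meta.string", "string.quoted.double"]
regexp_string = plain_string + ["source.regexp"]

for name, stack, n in [
    ("plain, clear 1", plain_string, 1),
    ("regexp, clear 1", regexp_string, 1),
    ("regexp, clear 2", regexp_string, 2),
]:
    result = push_interpolation(stack, n)
    print(f"{name}: {' '.join(result)} (string left: {has_string_scope(result)})")
```

In this model, the interpolation context that works for normal strings leaves string.quoted on the stack when it is reused inside a regexp string, which is why a separate context clearing 2 scopes is currently needed.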

While I already found a quite practical solution to achieve it without duplicating any interpolation pattern, the general idea just was to find a scope scheme which only uses two scopes for patterns as well, to avoid possible issues in other syntaxes.

Example of current approach:

$pattern = "/^text[0-9] ${interpolation}/m"
           ^^^^^^^^^^^^^ meta.string string.quoted source.regexp - meta.interpolation
                        ^^^^^^^^^^^^^^^^ meta.string meta.interpolation source.regexp - string
                                        ^^^ meta.string string.quoted source.regexp - meta.interpolation

If this ends up as a common "style guide" for all patterns, I'd be OK with that as well. I am not focused on a certain approach/solution/idea.

We also have your https://github.com/sublimehq/Packages/issues/1942 about how to treat/scope regexp content itself.

I agree with the "desirable result" of being able to write simple color scheme rules, which apply well and equally to all syntaxes.

I just was not keen enough to put this large scope into this RFC in the first place.