sublimehq / Packages

Syntax highlighting files shipped with Sublime Text and Sublime Merge
https://sublimetext.com

[RFC] Use punctuation.definition for brackets defining collection literals rather than punctuation.section #2852

Open Thom1729 opened 3 years ago

Thom1729 commented 3 years ago

Motivating example in JavaScript:

// Block
    { }
//  ^ punctuation.section.block.begin
//    ^ punctuation.section.block.end

// Object literal
   +{ }
//  ^ punctuation.definition.mapping.begin
//    ^ punctuation.definition.mapping.end

Overview

For brevity's sake, by “brackets” I mean curly braces, square brackets, parentheses, and other paired punctuation characters as appropriate.

Many languages use the same brackets to denote both sections of code (e.g. parenthesized expressions) and literal collections (e.g. mappings or sequences). Depending on the context and/or the text between the delimiters, the delimiters may represent very different syntactic constructs.

The current practice is to scope those brackets as punctuation.section.* in either case. This proposal suggests using punctuation.definition instead for collection literals, while keeping punctuation.section for all other purposes.

Considerations

Scope naming guidelines.

The scope naming guidelines are — somewhat surprisingly — silent on the issue.

They do say that “Sections of code delineated by” brackets should use certain meta scopes, and the brackets should use punctuation.section.<section type>.begin|end, where the section type might be either the bracket type (braces, parens, or brackets) or something semantic (block or group). They also specify punctuation.section.interpolation.begin|end where appropriate.

However, the guidelines do not mention collection literals such as mappings, lists, or tuples. They might be considered “Sections of code delineated by” brackets, but I think that what the guidelines had in mind were code blocks, parenthesized expressions, and the like. The given semantic scopes are block and group, not mapping, sequence, or tuple. Moreover, the scope naming guidelines have always been rather C-centric, and C has no true collection literals. C does have array initializers — most commonly strings.

The scope naming guidelines specify punctuation.definition.string.begin|end for string delimiters. This seems to be the only explicit guideline for punctuation defining literals. By analogy, brackets defining (e.g.) a mapping might be punctuation.definition.mapping.

I interpret the scope naming guidelines to be compatible with either punctuation.section or punctuation.definition for collection literals.

Ergonomics

Some color schemes, including Mariana, color punctuation.definition differently from punctuation.section. In languages with both scopes, this could be a helpful distinction. JavaScript is a perfect example here — curly braces and square brackets are both “overloaded” and can refer to either collection literals or other constructs depending on the syntactic context. A missing semicolon can often cause one to be interpreted as the other, leading to a bug. Highlighting brackets depending on their syntactic purpose would make the mistake legible at a glance. (See e.g. https://github.com/sublimehq/Packages/pull/1551.)
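The ergonomic argument depends on how color schemes pick up scopes. Sublime Text scope selectors match by dot-separated prefix, so a rule for `punctuation.definition` catches every scope beneath it and never touches `punctuation.section.*`. The sketch below is a simplified model of single-atom selector matching. It is an assumption-labeled illustration and ignores descendant selectors, exclusions and scoring. It shows why moving object-literal braces to `punctuation.definition.mapping.*` would be enough for a scheme like Mariana to color them differently from block braces.

```go
package main

import (
	"fmt"
	"strings"
)

// matchesSelector is a simplified model of how a single-atom scope selector
// matches a scope name: the selector must equal the scope or be a prefix of it
// that ends at a "." boundary. Real Sublime Text selectors also support
// descendant matching, exclusions and scoring, which this sketch ignores.
func matchesSelector(selector, scope string) bool {
	return scope == selector || strings.HasPrefix(scope, selector+".")
}

func main() {
	scopes := []string{
		"punctuation.section.block.begin",      // `{` opening a block
		"punctuation.definition.mapping.begin", // `{` opening an object literal (proposed)
	}
	for _, s := range scopes {
		fmt.Printf("%-40s section=%v definition=%v\n", s,
			matchesSelector("punctuation.section", s),
			matchesSelector("punctuation.definition", s))
	}
}
```

A scheme that already colors `punctuation.definition` distinctly would therefore distinguish the two braces without any new rules.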

On the other hand, every change is a change, and people don't always like change. It may be that using punctuation.definition would make some code less legible. Examples of this would be welcome.

Established usage

Established usage is clearly on the side of punctuation.section. The specifics vary between syntaxes. For instance, the JavaScript syntax scopes object literal brackets as punctuation.section.block (which is listed in the guidelines, but arguably incorrect for this construct), whereas JSON uses punctuation.section.mapping (which is not in the guidelines). Python uses punctuation.section.mapping, punctuation.section.set, or punctuation.section.mapping-or-set. (The last of these should probably be eliminated using branching.)

Even if we stick with punctuation.section, I think we should standardize on a single subscope or set of subscopes.

Implementation

In some languages, like JavaScript, using punctuation.definition for collection literals would be easy. (In JavaScript's case, this is because the syntax already has to make the distinction internally or everything would break.) In other languages, it might be more difficult. Lisp and Go have been suggested as languages for which the implementation might be difficult; examples would be welcome.
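To make "the syntax already has to make the distinction" concrete, here is a minimal sketch of the decision for a JavaScript `{`. The helper `braceScope` is hypothetical and is not taken from the actual JavaScript syntax definition. It assumes a deliberately simplified rule based on a single preceding token: after an operator, `(`, `[`, `,`, `?` or `return`, the brace is in expression position and opens an object literal; anywhere else it opens a block. A real syntax needs far more context than one token.

```go
package main

import "fmt"

// braceScope is a hypothetical helper, not the real JavaScript syntax logic.
// Given the last significant token before a `{`, it guesses whether the brace
// is in expression position (object literal) or statement position (block).
func braceScope(prevToken string) string {
	switch prevToken {
	case "=", "(", "[", ",", "+", "?", "return":
		// Expression position: the brace starts an object literal.
		return "punctuation.definition.mapping.begin"
	default:
		// Start of a statement, after `;`, after `)` closing an if/for header,
		// after `=>` (arrow function body), and so on.
		return "punctuation.section.block.begin"
	}
}

func main() {
	fmt.Println(braceScope(""))       // `{ }` at the start of a statement
	fmt.Println(braceScope("+"))      // `+{ }` from the motivating example
	fmt.Println(braceScope("return")) // `return { a: 1 }`
}
```

The scope names printed are the ones proposed in this RFC. Under the current practice, both branches would emit a punctuation.section.* scope.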

Alternatives

If punctuation.definition is too large a change, we could instead standardize on a consistent set of punctuation.section.* scopes, such as punctuation.section.sequence. This would also allow color schemes to target collection delimiters.

mitranim commented 3 years ago

TL;DR: I'm in favor of keeping delimiter scopes simple.

Various downsides:


Semantics

The proposal seems to be about delimiters of literal data structures: lists, structs, maps, dicts, sets, plain objects, etc.

At a certain level, "data literals" are always syntactic shortcuts/aliases for function calls, sometimes with named arguments.

JS:

new Array(10, 20, 30) ≡ [10, 20, 30]

Python:

dict(one = 10, two = 20) ≡ {'one': 10, 'two': 20}

Swift:

struct A {
  let one: Int
  let two: Int
}

A(one: 10, two: 20)

[[[10]], [[20]]]

[10: 20]

Go:

type A struct {
  One int
  Two int
}

A{One: 10, Two: 20}

[][][]int{{{10}, {20}}}

map[int]int{10: 20}

Compare the Go and Swift structs. Swift supports named arguments. For structs, it auto-defines a constructor (a hidden init() method) with named arguments matching the field names, and thus avoids special syntax. Go struct literals are exactly the same thing: function calls with named arguments, spelled with braces instead of parentheses.
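A small Go sketch of the "literal as constructor call" point, assuming the struct `A` from the example above. Go does not synthesize a constructor the way Swift synthesizes a memberwise initializer, so `NewA` below is a hypothetical, hand-written function. It exists only to show that the keyed literal and the call carry the same information and produce equal values.

```go
package main

import "fmt"

type A struct {
	One int
	Two int
}

// NewA is hypothetical and hand-written: Go generates no constructor for A.
// It mirrors the keyed literal A{One: ..., Two: ...} as an ordinary call.
func NewA(one, two int) A {
	return A{One: one, Two: two}
}

func main() {
	literal := A{One: 10, Two: 20}
	call := NewA(10, 20)
	// Structs whose fields are all comparable can be compared with ==.
	fmt.Println(literal == call) // true
}
```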

So before it gets anywhere, the proposal ought to decide how to handle function call delimiters, and/or propose a strong reason why data delimiters would be treated differently.


Lisp

Lisp exemplifies a language with no meaningful way to differentiate "section" vs "definition" delimiters. All Lisp code is a literal data structure. By default, it's evaluated as code. If quoted, it isn't evaluated; the form itself is returned as data.

(print "hello world")
; "hello world"

(print '(print "hello world"))
; (PRINT "hello world")

Parens denoting a "block" also always denote a list. Consider the following:

'(
  (
    (let
      (
        (one 10)
        (two 20)
      )
      (print one)
      (print two)
    )
  )
)

In normal code, let creates a block: a sub-scope with some inner variables. In this example, one of the enclosing forms was quoted, preventing such evaluation. The let form might still get evaluated, possibly after getting modified! It exists in a superposition, neither definitively a block, nor merely a data structure.
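The point that the same parentheses are both code and data can be modeled in a few lines. The Go sketch below is a toy, not a Lisp implementation: `List`, `Symbol`, `show` and `eval` are hypothetical names, and `eval` understands only `quote` and `print`. It reproduces the two `print` examples earlier in this section: one list is evaluated as a call, while the identical list under `quote` comes back unchanged as data.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Symbol and List are a toy model of Lisp data, not a Lisp implementation.
type Symbol string
type List []any

// show prints a value roughly the way a Common Lisp printer would for this
// tiny subset: symbols upcased, strings quoted, lists in parentheses.
func show(x any) string {
	switch v := x.(type) {
	case string:
		return strconv.Quote(v)
	case Symbol:
		return strings.ToUpper(string(v))
	case List:
		parts := make([]string, len(v))
		for i, e := range v {
			parts[i] = show(e)
		}
		return "(" + strings.Join(parts, " ") + ")"
	default:
		return fmt.Sprint(v)
	}
}

// eval treats a List as code. It knows only two forms: (quote X) returns X
// unevaluated, and (print X) evaluates X, prints it and returns it.
// Everything else evaluates to itself (this toy has no variables).
func eval(x any) any {
	v, ok := x.(List)
	if !ok || len(v) != 2 {
		return x
	}
	switch v[0] {
	case Symbol("quote"):
		return v[1]
	case Symbol("print"):
		arg := eval(v[1])
		fmt.Println(show(arg))
		return arg
	}
	return x
}

func main() {
	form := List{Symbol("print"), "hello world"}
	eval(form)                                               // prints "hello world"
	eval(List{Symbol("print"), List{Symbol("quote"), form}}) // prints (PRINT "hello world")
}
```

Nothing in `form` itself says whether it is code or data; only the surrounding `quote` decides, which is the difficulty for a syntax definition.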


Go

In Go, the following cases are pertinent:

The Sublime implementation already detects different uses of []; any existing edge cases could probably be handled with branching.

Handling {} seems much trickier. Go allows "bare" {} for some data literals:

type A = map[int]map[int][][]int

A{10: {20: {{30, 40}}}}

Whether the type before {} can be elided depends on where the literal appears. Currently it's allowed for elements and keys of array, slice, and map literals (even when the element type is a struct), but never for field values inside struct literals. The Go parser might be able to disambiguate this without relying on type information. Supporting this in Sublime might require the syntax to differentiate expression vs. statement context, significantly complicating the implementation. I'd like to avoid that.
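A compilable sketch of where Go permits bare {}, reusing the alias `A` from the example above. `Point` and `Segment` are hypothetical types added for illustration. It shows that elision works inside map and slice literals, including slices of structs, while a struct field value must spell out its type. From the highlighter's point of view, the bare {} forms are exactly the ones with no type name to anchor on.

```go
package main

import "fmt"

// A is the alias from the example above.
type A = map[int]map[int][][]int

// Point and Segment are hypothetical struct types for illustration.
type Point struct{ X, Y int }
type Segment struct{ From, To Point }

func main() {
	// Inner map and slice literals omit their types entirely.
	a := A{10: {20: {{30, 40}}}}

	// Elements of a slice literal may also omit a struct type.
	pts := []Point{{1, 2}, {X: 3, Y: 4}}

	// Field values inside a struct literal may not: Segment{From: {1, 2}}
	// is a compile error, so Point has to be written out.
	seg := Segment{From: Point{1, 2}, To: Point{3, 4}}

	fmt.Println(a[10][20][0][1], pts[1].Y, seg.To.X) // 40 4 3
}
```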