Punctuation doesn't have its own class

tree-sitter / tree-sitter-python

Python grammar for tree-sitter

MIT License


Punctuation doesn't have its own class #25

Closed: ambv closed this issue 6 years ago

ambv commented 6 years ago

By punctuation I essentially mean brackets, dots, commas, and colons. Because they have no class of their own, it's hard to color them differently from regular text.

Example where coloring punctuation would make text easier to read:

maxbrunsfeld commented 6 years ago

The parser already handles all of the type annotation syntax that I'm aware of. For example, the code snippet that @ambv posted above would parse like this:

(module [0, 0] - [4, 0]
  (function_definition [0, 0] - [4, 0]
    (identifier [0, 4] - [0, 25])
    (parameters [0, 25] - [2, 1]
      (typed_parameter [1, 4] - [1, 14]
        (identifier [1, 4] - [1, 8])
        (type [1, 10] - [1, 14]
          (identifier [1, 10] - [1, 14])))
      (typed_default_parameter [1, 16] - [1, 34]
        (identifier [1, 16] - [1, 20])
        (type [1, 22] - [1, 26]
          (identifier [1, 22] - [1, 26]))
        (false [1, 29] - [1, 34]))
      (typed_default_parameter [1, 36] - [1, 65]
        (identifier [1, 36] - [1, 40])
        (type [1, 42] - [1, 60]
          (subscript [1, 42] - [1, 60]
            (identifier [1, 42] - [1, 52])
            (identifier [1, 53] - [1, 59])))
        (tuple [1, 63] - [1, 65])))
    (type [2, 5] - [2, 19]
      (subscript [2, 5] - [2, 19]
        (identifier [2, 5] - [2, 13])
        (identifier [2, 14] - [2, 18])))
    (pass_statement [3, 4] - [3, 8])))
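The printed tree above lists only named nodes. Anonymous tokens such as "[", "]", ":" and "," are still children in the syntax tree, but the printer leaves them out. That is why the punctuation has no visible class. The sketch below models this with a hand-built tree. The node helper and the { type, named, children } shape are hypothetical stand-ins, not the tree-sitter API, and positions are omitted.

```javascript
// Hypothetical, simplified node shape: { type, named, children }.
// Named nodes come from named grammar rules. Anonymous nodes are literal
// tokens such as "[" or "]" that have no rule name of their own.
function node(type, named, children = []) {
  return { type, named, children };
}

// Tree for a subscript such as `a[b]` (node types only, no positions).
const subscript = node('subscript', true, [
  node('identifier', true),
  node('[', false),
  node('identifier', true),
  node(']', false),
]);

// Build an S-expression that, like the output above, includes only named nodes.
function toSexp(n) {
  const inner = n.children.filter((c) => c.named).map(toSexp);
  return inner.length ? `(${n.type} ${inner.join(' ')})` : `(${n.type})`;
}

// Collect the anonymous tokens that the printed form leaves out.
function anonymousTokens(n) {
  return n.children.flatMap((c) => (c.named ? anonymousTokens(c) : [c.type]));
}

console.log(toSexp(subscript)); // (subscript (identifier) (identifier))
console.log(anonymousTokens(subscript)); // [ '[', ']' ]
```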

kevinastone commented 6 years ago

(sorry, deleted my comment since I misunderstood)

If I understand correctly now, we just need named rules for the different parts so they can be mapped to scopes in the language grammar?

kevinastone commented 6 years ago

I think @ambv wants to have separate nodes for the brackets so they can be scoped differently.

Something like changing:

subscript: $ => seq(
  $._primary_expression,
  '[',
  commaSep1(choice($._expression, $.slice)),
  optional(','),
  ']'
),

to:

subscript: $ => seq(
  $._primary_expression,
  $.lookup
),

lookup: $ => seq(
  '[',
  $.lookup_exp,
  ']'
),

lookup_exp: $ => seq(
  commaSep1(choice($._expression, $.slice)),
  optional(','),
),

(and apologies, I just started looking into tree-sitter today for another language)
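Both versions of the subscript rule call commaSep1, a helper defined elsewhere in the grammar file and not shown in this thread. The sketch below assumes the usual shape for such a helper: one or more items separated by commas. It also uses simplified stand-ins for tree-sitter's seq, repeat, and optional. With those pieces it lists the literal strings that the proposed lookup rule contains. Each literal string becomes an anonymous token. A named lookup rule therefore wraps the brackets in a named node, but "[", "]" and "," themselves still have no names. For brevity, choice($._expression, $.slice) is reduced to a single hypothetical expression reference.

```javascript
// Minimal stand-ins for tree-sitter's rule functions. They only build plain
// objects. This sketch assumes the real functions produce similar SEQ, REPEAT,
// and CHOICE/BLANK structures.
const seq = (...members) => ({ type: 'SEQ', members });
const repeat = (content) => ({ type: 'REPEAT', content });
const optional = (content) => ({ type: 'CHOICE', members: [content, { type: 'BLANK' }] });

// Assumed definition: one or more `rule`s separated by commas.
function commaSep1(rule) {
  return seq(rule, repeat(seq(',', rule)));
}

// Collect the literal strings in a rule. Each one becomes an anonymous token.
function literals(rule) {
  if (typeof rule === 'string') return [rule];
  if (rule.type === 'SEQ' || rule.type === 'CHOICE') return rule.members.flatMap(literals);
  if (rule.type === 'REPEAT') return literals(rule.content);
  return []; // named-rule references and BLANK contribute no literals
}

// Hypothetical stand-in for a reference to a named rule such as $._expression.
const expression = { type: 'SYMBOL', name: '_expression' };

// The proposed lookup and lookup_exp rules combined into one sequence.
const lookup = seq('[', commaSep1(expression), optional(','), ']');
// literals(lookup) returns ['[', ',', ',', ']']
```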

maxbrunsfeld commented 6 years ago

Yeah, no worries! This whole system is pretty new so there's not much documentation yet.

I actually don't think we need to make any changes to the parser for this issue; we just need to configure certain tokens like : and = to be highlighted in Atom. That configuration lives in the scopes object of tree-sitter-python.cson in language-python. We probably just need to add a few more entries there, similar to the existing ones, describing which classes we want to apply to those tokens.

In that scopes object, the keys are CSS selectors that select nodes in the syntax tree, and the values represent the list of classes to apply to those nodes for syntax highlighting. The syntax for referring to anonymous tokens (ones like : and = that don't have names in the grammar) is to surround them with double quotes.
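The selector idea can be sketched with a simple matcher. It supports a single selector form: a parent node type, then ">", then an anonymous token in double quotes. It walks a hand-built tree and collects the class for each matching token. The matcher, the node shape, and the class names are illustrative assumptions. This is not Atom's selector engine, which supports much more than this one form.

```javascript
// Hypothetical scopes mapping in the spirit described above. Each key is a
// parent node type, ">", and an anonymous token in double quotes. Each value
// is the class to apply. The class names are illustrative, not Atom's.
const scopes = {
  'subscript > "["': 'punctuation.bracket.begin',
  'subscript > "]"': 'punctuation.bracket.end',
  'typed_parameter > ":"': 'punctuation.separator',
};

// Split a `parent > "token"` selector into its parts. Other forms return null.
function parseSelector(selector) {
  const m = /^(\w+) > "(.+)"$/.exec(selector);
  return m ? { parent: m[1], token: m[2] } : null;
}

// Walk the tree and collect [token, class] pairs for every anonymous child
// whose parent type and token text match one of the selectors.
function highlight(root, mapping) {
  const rules = Object.entries(mapping)
    .map(([selector, cls]) => ({ ...parseSelector(selector), cls }))
    .filter((rule) => rule.parent !== undefined); // drop unsupported selectors
  const applied = [];
  (function visit(n) {
    for (const child of n.children) {
      if (!child.named) {
        const rule = rules.find((r) => r.parent === n.type && r.token === child.type);
        if (rule) applied.push([child.type, rule.cls]);
      }
      visit(child);
    }
  })(root);
  return applied;
}

// Hand-built tree for a subscript such as `a[b]` (hypothetical node shape).
const leaf = (type, named) => ({ type, named, children: [] });
const tree = {
  type: 'subscript',
  named: true,
  children: [leaf('identifier', true), leaf('[', false), leaf('identifier', true), leaf(']', false)],
};

const result = highlight(tree, scopes);
// result: [['[', 'punctuation.bracket.begin'], [']', 'punctuation.bracket.end']]
```

Because the quoted token is matched only under a given parent type, the same "[" can receive different classes in a subscript, a list, or a parameter list.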

kevinastone commented 6 years ago

Gotcha. So something like this?

scopes:
  'subscript > "["': 'punctuation.definition.arguments.begin.python'
  'subscript > "]"': 'punctuation.definition.arguments.end.python'

ambv commented 6 years ago

I addressed this by naming the anonymous tokens in tree-sitter-python.cson in language-python. See my pull request there.