ikatyang / tree-sitter-yaml

YAML grammar for tree-sitter
https://ikatyang.github.io/tree-sitter-yaml/
MIT License

feat(queries): add highlight queries #41

Open jimeh opened 1 year ago

jimeh commented 1 year ago

Adds a set of queries for syntax highlighting YAML, similar to other tree-sitter grammar projects.

These are the same queries as in my PR against Emacs' tree-sitter-langs project: https://github.com/emacs-tree-sitter/tree-sitter-langs/pull/134

Screenshot

Here's how syntax highlighting turns out in Emacs, using the doom-vibrant theme:

[Image: Screen-Shot-2022-11-24-01-06-01 42]

lukepistrol commented 1 year ago

Would love this getting merged!

lcrownover commented 1 year ago

Would this also support highlighting of variable interpolation? Ansible and Helm are two systems off the top of my head that use this and I'd love to be able to color the interpolation differently.

For example:

"My name is {{ name }}"

Where "My name is" would be a normal string, {{ and }} would be one token type, and name would be another token type, allowing us to color those tokens differently.
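To make the requested token split concrete, here is a minimal Python sketch. It is plain regex matching, not tree-sitter and not part of this PR; the token kind names (`string`, `interp_open`, `interp_expr`, `interp_close`) are hypothetical and only illustrate the idea.

```python
import re

# Matches {{ ... }} and captures the expression without surrounding whitespace.
# Assumption: interpolations are not nested and do not span lines.
INTERPOLATION = re.compile(r"\{\{\s*(.*?)\s*\}\}")


def tokenize(text):
    """Split text into (kind, value) pairs around {{ ... }} markers."""
    tokens = []
    pos = 0
    for match in INTERPOLATION.finditer(text):
        # Anything before the marker is ordinary string content.
        if match.start() > pos:
            tokens.append(("string", text[pos:match.start()]))
        tokens.append(("interp_open", "{{"))
        tokens.append(("interp_expr", match.group(1)))
        tokens.append(("interp_close", "}}"))
        pos = match.end()
    # Trailing string content after the last marker, if any.
    if pos < len(text):
        tokens.append(("string", text[pos:]))
    return tokens


if __name__ == "__main__":
    for kind, value in tokenize("My name is {{ name }}"):
        print(kind, repr(value))
```

A highlighter could then color each kind differently, which is the behavior being asked for here.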

Currently:

- name: check if the correct loki version is downloaded
  ansible.builtin.stat:
    path: "/opt/loki/{{loki_server_version}}/loki-linux-amd64.zip"
  register: loki_server_check_downloaded
  changed_when: false

TSPlayground:

      block_sequence_item [9, 0] - [13, 21]
        block_node [9, 2] - [13, 21]
          block_mapping [9, 2] - [13, 21]
            block_mapping_pair [9, 2] - [9, 55]
              key: flow_node [9, 2] - [9, 6]
                plain_scalar [9, 2] - [9, 6]
                  string_scalar [9, 2] - [9, 6]
              value: flow_node [9, 8] - [9, 55]
                plain_scalar [9, 8] - [9, 55]
                  string_scalar [9, 8] - [9, 55]
--- relevant section
            block_mapping_pair [10, 2] - [11, 66]
              key: flow_node [10, 2] - [10, 22]
                plain_scalar [10, 2] - [10, 22]
                  string_scalar [10, 2] - [10, 22]
              value: block_node [11, 4] - [11, 66]
                block_mapping [11, 4] - [11, 66]
                  block_mapping_pair [11, 4] - [11, 66]
                    key: flow_node [11, 4] - [11, 8]
                      plain_scalar [11, 4] - [11, 8]
                        string_scalar [11, 4] - [11, 8]
                    value: flow_node [11, 10] - [11, 66]
                      double_quote_scalar [11, 10] - [11, 66]
--- end relevant section
            block_mapping_pair [12, 2] - [12, 40]
              key: flow_node [12, 2] - [12, 10]
                plain_scalar [12, 2] - [12, 10]
                  string_scalar [12, 2] - [12, 10]
              value: flow_node [12, 12] - [12, 40]
                plain_scalar [12, 12] - [12, 40]
                  string_scalar [12, 12] - [12, 40]
            block_mapping_pair [13, 2] - [13, 21]
              key: flow_node [13, 2] - [13, 14]
                plain_scalar [13, 2] - [13, 14]
                  string_scalar [13, 2] - [13, 14]
              value: flow_node [13, 16] - [13, 21]
                plain_scalar [13, 16] - [13, 21]
                  boolean_scalar [13, 16] - [13, 21]

jimeh commented 9 months ago

@lcrownover I might be wrong, but I believe Ansible and Helm/Gotemplate style variable interpolation would require more than just syntax highlighting queries.

The syntax tree built by the parser only exposes string nodes, so I'm afraid there are no nodes for interpolated variables that highlight queries could match against.

lcrownover commented 9 months ago

@jimeh Maybe I'm showing my ignorance of the inner workings of tree-sitter, but why is Python able to have this functionality:

print(f"Starting server on port {port}")

(string) ; [29:11 - 43]
 (string_start) ; [29:11 - 12]
 (string_content) ; [29:13 - 36]
 (interpolation) ; [29:37 - 42]
  expression: (identifier) ; [29:38 - 41]
 (string_end) ; [29:43 - 43]

Yet we can't modify the YAML parser to support something similar:

src: "{{ authselect_pam_access_conf_src }}"

value: (flow_node) ; [4:10 - 47]
 (double_quote_scalar) ; [4:10 - 47]

Isn't this the module that governs how the tokens get parsed?

jimeh commented 9 months ago

@lcrownover Apologies, I probably should have elaborated a bit more. The major difference is that in Python string interpolation is part of the language and syntax.

Ansible and Helm templating are not part of YAML's syntax. Hence I can't do anything with the highlight queries in this PR to syntax highlight Ansible and Helm constructs.

That said, the tree-sitter YAML parser could be modified to support parsing those things, but that's outside my area of expertise. And I would not bet on that happening, as they are not part of the YAML specification.

What's more likely is that someone forks the YAML parser here to try and make Ansible and Helm specific tree-sitter parsers.

I think Ansible would be a relatively simple fork, as it should only be a matter of dealing with string interpolation within quoted strings.
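As a rough illustration of what "dealing with string interpolation within quoted strings" involves, the sketch below takes the text and start position of a scalar node and works out where each `{{ ... }}` sits in tree-sitter style `[row, col]` coordinates. This is a post-processing approximation in plain Python, not how a forked grammar would actually be written; the function name `interpolation_ranges` is hypothetical, and it assumes single-line scalars with no nested markers.

```python
import re

# Assumption: an interpolation is any non-nested {{ ... }} on a single line.
INTERPOLATION = re.compile(r"\{\{.*?\}\}")


def interpolation_ranges(node_text, start_row, start_col):
    """Return ((row, col), (row, col)) spans for each {{ ... }} in a scalar.

    node_text is the full source text of the scalar node, quotes included.
    start_row and start_col are the node's start position. End columns are
    exclusive, matching the ranges in the tree dump earlier in this thread.
    """
    if "\n" in node_text:
        raise ValueError("this sketch only handles single-line scalars")
    return [
        ((start_row, start_col + m.start()), (start_row, start_col + m.end()))
        for m in INTERPOLATION.finditer(node_text)
    ]


if __name__ == "__main__":
    # The double_quote_scalar from the Ansible example, at [11, 10] - [11, 66].
    text = '"/opt/loki/{{loki_server_version}}/loki-linux-amd64.zip"'
    print(interpolation_ranges(text, 11, 10))
```

A real fork would instead teach the grammar to emit child nodes for these spans, so that highlight queries have something to match.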

Helm however would be much more complex, because unlike Ansible, Helm templates are not valid YAML syntax. It's more like trying to parse Ansible Jinja2 template files that happen to output YAML. And it would need to fully support Go's text template engine that Helm uses as well.

Hopefully that's helped clarify things a bit :)

lcrownover commented 9 months ago

@jimeh Ahh, I see. It's been so long since I made the original comment that I forgot that this is a PR for the module, not the module itself. My mistake!

I do agree with you that it'd be quite a pain to write a parser to handle all the edge cases that have been layered on top of the YAML standard. Having an Ansible tree-sitter module would maybe work, though due to the overloading of YAML syntax, it's already hard enough to differentiate between a standard YAML file, an Ansible file (Jinja2), or a Helm template, as they all share the same extension.

jimeh commented 9 months ago

Yeah, I don't blame you. To make things slightly more convoluted, Ansible technically has two common uses of YAML:

</rant>