tree-sitter / tree-sitter-go

Go grammar for tree-sitter
MIT License

String literals are missing their content #150

Closed wetneb closed 1 week ago

wetneb commented 1 month ago

The grammar defines string literals like this:

    interpreted_string_literal: $ => seq(
      '"',
      repeat(choice(
        $._interpreted_string_literal_basic_content,
        $.escape_sequence,
      )),
      token.immediate('"'),
    ),
    _interpreted_string_literal_basic_content: _ => token.immediate(prec(1, /[^"\n\\]+/)),

Note that _interpreted_string_literal_basic_content is private: its name starts with an underscore, so tree-sitter hides it from the syntax tree. As a user, when I parse a string literal that contains only basic content (no escape sequences), I see the interpreted_string_literal node with just two children: the two double quotes delimiting it. There is no child for the actual content.

As a result, "foo" and "bar" produce trees with exactly the same shape (two double quotes, with no content node between them).
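To illustrate what this costs a consumer of the tree, here is a minimal sketch of the workaround: slicing the source text between the two quote tokens. It assumes node objects with the children, startIndex and endIndex fields of a tree-sitter SyntaxNode, and that the indices are offsets into the same JavaScript string. The helper stringLiteralContent and the mock nodes are hypothetical and stand in for a real parse.

```javascript
// Hypothetical helper: recover the text between the delimiters of a string
// literal node when the grammar exposes no content child.
// Assumes `node.children` starts and ends with the quote tokens, and that
// `startIndex` / `endIndex` are offsets into `source`.
function stringLiteralContent(node, source) {
  const first = node.children[0];
  const last = node.children[node.children.length - 1];
  return source.slice(first.endIndex, last.startIndex);
}

// Mock node standing in for a parsed literal. With the content rule hidden,
// the only children are the two quote tokens.
function mockLiteral(start, length) {
  return {
    type: 'interpreted_string_literal',
    startIndex: start,
    endIndex: start + length,
    children: [
      { type: '"', startIndex: start, endIndex: start + 1 },
      { type: '"', startIndex: start + length - 1, endIndex: start + length },
    ],
  };
}

const source = 'x := "foo" + "bar"';
const foo = mockLiteral(source.indexOf('"foo"'), 5);
const bar = mockLiteral(source.indexOf('"bar"'), 5);

// The two nodes have the same child types; only the offsets tell them apart.
console.log(stringLiteralContent(foo, source)); // foo
console.log(stringLiteralContent(bar, source)); // bar
```

This works, but every consumer has to reimplement it, and the slice still contains any escape sequences in raw form.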

I have the impression that other grammars tend to emit a public node for the string content, which seems more natural to me as a user. See, for instance, the JavaScript and Python grammars:

https://github.com/tree-sitter/tree-sitter-javascript/blob/b6f0624c1447bc209830b195999b78a56b10a579/grammar.js#L951-L977

https://github.com/tree-sitter/tree-sitter-python/blob/8c65e256f971812276ff2a69a2f515c218ed7f82/grammar.js#L1076-L1080
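For concreteness, one possible shape for such a change in the Go grammar is sketched below. It uses the alias function from the tree-sitter grammar DSL to give the hidden token a visible name. The node name interpreted_string_literal_content and the stringRules wrapper are assumptions made for illustration, not necessarily what the maintainers would choose.

```javascript
// Sketch of the two rules as entries for the `rules` object in grammar.js.
// `seq`, `repeat`, `choice`, `alias`, `token` and `prec` are the tree-sitter
// grammar DSL functions, which the tree-sitter CLI provides as globals.
const stringRules = {
  interpreted_string_literal: $ => seq(
    '"',
    repeat(choice(
      // Expose the hidden token under a visible name. The name
      // `interpreted_string_literal_content` is hypothetical.
      alias(
        $._interpreted_string_literal_basic_content,
        $.interpreted_string_literal_content,
      ),
      $.escape_sequence,
    )),
    token.immediate('"'),
  ),

  // Unchanged: still a hidden rule, now surfaced only through the alias.
  _interpreted_string_literal_basic_content: _ =>
    token.immediate(prec(1, /[^"\n\\]+/)),
};
```

With a rule along these lines, "foo" would parse to a literal whose children are the opening quote, a named content node spanning foo, and the closing quote.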

amaanq commented 1 week ago

Forgot to mention that content nodes have been added in https://github.com/tree-sitter/tree-sitter-go/commit/47e8b1fae7541f6e01cead97201be19321ec362a. Thanks for filing the issue 🙂