Closed gregnis closed 3 weeks ago
Let me update this. I was able to use the CLI to parse this simple INI file. Here's what I got:
(document [0, 0] - [8, 0]
  (ERROR [0, 0] - [3, 1]
    (text [0, 1] - [0, 13])
    (ERROR [0, 14] - [0, 15])
    (setting_name [1, 0] - [1, 8])
    (setting_name [1, 11] - [1, 21])
    (ERROR [1, 21] - [1, 22])
    (setting_name [2, 0] - [2, 11])
    (setting_name [2, 14] - [2, 21])
    (setting_name [2, 22] - [2, 27])
    (ERROR [2, 27] - [2, 28])
    (ERROR [3, 0] - [3, 1]))
  (ERROR [4, 0] - [7, 28]
    (text [4, 1] - [4, 16])
    (ERROR [4, 17] - [4, 18])
    (comment [5, 0] - [6, 0]
      (text [5, 1] - [5, 12]))
    (setting_name [6, 0] - [6, 8])
    (setting_name [6, 11] - [6, 21])
    (ERROR [6, 21] - [6, 22])
    (setting_name [7, 0] - [7, 11])
    (setting_name [7, 14] - [7, 21])
    (setting_name [7, 22] - [7, 27])
    (ERROR [7, 27] - [7, 28])))
As you can see, there are ERRORs instead of sections and keys.
Here's a mystery for me: when I parse the same file in my local playground, I get:
document [0, 0] - [9, 0]
  section [0, 0] - [3, 0]
    section_name [0, 0] - [1, 0]
      text [0, 1] - [0, 13]
    setting [1, 0] - [2, 0]
      setting_name [1, 0] - [1, 8]
      setting_value [1, 10] - [1, 21]
    setting [2, 0] - [3, 0]
      setting_name [2, 0] - [2, 11]
      setting_value [2, 13] - [2, 27]
  section [4, 0] - [8, 0]
    section_name [4, 0] - [5, 0]
      text [4, 1] - [4, 16]
    comment [5, 0] - [6, 0]
      text [5, 1] - [5, 11]
    setting [6, 0] - [7, 0]
      setting_name [6, 0] - [6, 8]
      setting_value [6, 10] - [6, 21]
    setting [7, 0] - [8, 0]
      setting_name [7, 0] - [7, 11]
      setting_value [7, 13] - [7, 27]
That's perfectly fine. So why can't I get the same result using my
TSParser* parser = ts_parser_new();
ts_parser_set_language(parser, tree_sitter_ini());
TSTree* tree = ts_parser_parse_string_encoding(parser, ...
or using the tree-sitter parse command?
I think I know what's going on. The grammar doesn't handle CR/LF line endings, only LF. That's why the playground works but the parse command does not, for files with Windows line endings.
I changed the grammar to the one below, and it appears to work on both types of files.
module.exports = grammar({
  name: 'ini',

  extras: $ => [
    $.comment,
    $._blankLF,
    $._blankCRLF,
    /[\t ]/
  ],

  rules: {
    document: $ => seq(
      repeat($._blankLF),   // Eat blank lines at top of file.
      repeat($._blankCRLF), // Eat blank lines at top of file.
      repeat($.section),
    ),

    // Section has:
    // - a title
    // - zero or more settings (name=value pairs)
    section: $ => prec.left(seq(
      $.section_name,
      repeat(seq(
        $.setting,
      )),
    )),

    section_name: $ => seq(
      '[',
      // Inside a character class '?' is a literal, so it is left out here;
      // otherwise section names containing '?' would be rejected.
      alias(/[^\[\]\r\n]+/, $.text),
      ']',
      choice('\n', '\r\n'),
    ),

    setting: $ => seq(
      alias(/[^;#=\s\[]+/, $.setting_name),
      '=',
      alias(/.+/, $.setting_value),
      choice('\n', '\r\n'),
    ),

    // setting_name: () => /[^#=\s\[]+/,
    // setting_value: () => /[^#\n]+/,

    comment: $ => seq(/[;#]/, alias(/.*/, $.text), optional('\r'), '\n'),

    _blankLF: () => field('blank', '\n'),
    _blankCRLF: () => field('blank', '\r\n'),
  }
});
I'm sure it's naïve (my first attempt at changing a grammar) but it seems to work for me. Please let me know if there is a better way.
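For anyone who cannot change the grammar, a possible workaround on the caller's side is to check for CR/LF pairs and strip the carriage returns before handing the buffer to the parser. The following is only a minimal sketch under that assumption; count_crlf and normalize_crlf are hypothetical helper names, not part of tree-sitter or of this grammar.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: count "\r\n" pairs in buf[0..len).
 * A non-zero result means the input contains Windows line endings. */
size_t count_crlf(const char *buf, size_t len) {
    assert(buf != NULL || len == 0);
    size_t pairs = 0;
    for (size_t i = 0; i + 1 < len; i++) {
        if (buf[i] == '\r' && buf[i + 1] == '\n') {
            pairs++;
            i++; /* skip the '\n' that belongs to this pair */
        }
    }
    return pairs;
}

/* Hypothetical helper: rewrite buf in place so that every "\r\n"
 * becomes "\n". A '\r' that is not followed by '\n' is kept.
 * Returns the new length; the buffer is not NUL-terminated here. */
size_t normalize_crlf(char *buf, size_t len) {
    assert(buf != NULL || len == 0);
    size_t out = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\r' && i + 1 < len && buf[i + 1] == '\n') {
            continue; /* drop the '\r'; the '\n' is copied next iteration */
        }
        buf[out++] = buf[i];
    }
    return out;
}
```

The normalized buffer and the returned length could then be passed to ts_parser_parse_string. Note that the byte offsets in the resulting tree would refer to the normalized text, so they shift by one for every removed carriage return relative to the file on disk.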
The grammar doesn't handle CR/LF line endings, only LF.
Yeah that's probably the case. Can you send a PR (with a test)?
I don't have a setup to do this, perhaps you can use the code I provided to create a PR (assuming it's good).
I'm not sure why but I cannot get a valid tree parsing the sample INI file you provided:
I'm using the latest code for TreeSitter library from https://github.com/tree-sitter/tree-sitter. The code goes like this:
This works for other languages (cpp, csharp, etc.), but when parsing the example above I get tons of errors, which I can see using the ts_tree_print_dot_graph function. It's a long output; here's the top:
digraph tree {
edge [arrowhead=none]
tree_0175EEC8 [label="document", tooltip="range: 0 - 308 state: 65535 error-cost: 4784 has-changes: 0 depends-on-column: 0 descendant-count: 39 repeat-depth: 0 lookahead-bytes: 1"]
tree_0180A748 [label="ERROR", fontcolor=gray, tooltip="range: 0 - 138 state: 0 error-cost: 2428 has-changes: 0 depends-on-column: 0 descendant-count: 17 repeat-depth: 0 lookahead-bytes: 3"]
tree_01807308 [label="[", shape=plaintext, tooltip="range: 0 - 2 state: 1 error-cost: 0 has-changes: 0 depends-on-column: 0 descendant-count: 0 repeat-depth: 0 lookahead-bytes: 1"]
tree_0180A748 -> tree_01807308 [tooltip=0]
tree_01807310 [label="text", shape=plaintext, tooltip="range: 2 - 26 state: 19 error-cost: 0 has-changes: 0 depends-on-column: 0 descendant-count: 0 repeat-depth: 0 lookahead-bytes: 1"]
I attached the whole output. Do you know what's going on and how to make it work?
Thanks!
tree_graph.txt