tree-sitter / tree-sitter

An incremental parsing system for programming tools
https://tree-sitter.github.io
MIT License

Query quantifier - matching one or more nodes followed by one or more wildcards #2822

Open gushogg-blake opened 9 months ago

gushogg-blake commented 9 months ago

I have the following query to capture any let declarations at the beginning of a JS file, as well as the rest of the file:

(program
    (lexical_declaration)+ @lets
    (_)+ @rest
)

The problem is that the query matcher prefers to fill up the wildcard, so with the following code only the first let is captured in @lets:

let a = 1;
let b = 2;
let c = 3;
let d = 4;

function f() {
}

function g() {
}

Here @lets contains only let a = 1; and @rest contains everything after it.

Adding an anchor (.) after (_)+ @rest allows @lets to capture all of the let declarations, but then only the last wildcard node (function g() {}) is captured as @rest; function f() {} is not captured at all.

Expected behaviour: (lexical_declaration)+ captures as many nodes as it can and subsequent (_)+ captures whatever is left.

Actual behaviour: (lexical_declaration)+ captures as little as possible and subsequent (_)+ captures whatever is left.

Is this the intended behaviour of mixing named and wildcard quantifiers? Intuitively it seems like the first quantifier should capture as many as it can and only then should the wildcard take over.
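
To make the expected split concrete, here is a minimal sketch of the same partition done in plain JavaScript over a list of top-level nodes, outside the query engine. The function name splitLeadingLets and the plain { type } objects are hypothetical stand-ins for real syntax nodes; only the node type names are taken from the JavaScript grammar.

```javascript
// Hypothetical helper: greedily take the leading run of lexical_declaration
// nodes, then treat everything that follows as the rest.
function splitLeadingLets(children) {
  let i = 0;
  while (i < children.length && children[i].type === "lexical_declaration") {
    i++;
  }
  return { lets: children.slice(0, i), rest: children.slice(i) };
}

// Stand-in for the top-level children of the example file above:
// four let declarations followed by two function declarations.
const topLevel = [
  { type: "lexical_declaration" },
  { type: "lexical_declaration" },
  { type: "lexical_declaration" },
  { type: "lexical_declaration" },
  { type: "function_declaration" },
  { type: "function_declaration" },
];

const split = splitLeadingLets(topLevel);
console.log(split.lets.length, split.rest.length);
```

This is the result I would expect @lets and @rest to produce: all four declarations in the first group and both functions in the second.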

ObserverOfTime commented 5 months ago

Regex quantifiers are greedy by default. This can be solved by implementing lazy +? / *? quantifiers à la JS.
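
For reference, a small sketch of the greedy versus lazy distinction as it exists in JS regular expressions. The sample string and variable names are chosen purely for illustration; this shows regex behaviour only, not the query matcher.

```javascript
// "aaab" stands in for a run of matching items followed by something else.
const text = "aaab";

// Greedy: a+ takes as many "a"s as it can while still leaving
// at least one character for .+ to match.
const greedy = text.match(/(a+)(.+)/);

// Lazy: a+? takes as few "a"s as it can, and .+ takes everything left.
const lazy = text.match(/(a+?)(.+)/);

console.log(greedy[1], greedy[2]); // aaa b
console.log(lazy[1], lazy[2]);     // a aab
```

The greedy result corresponds to the expected behaviour described in the issue, and the lazy result to the behaviour actually observed.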