tree-sitter / py-tree-sitter

Python bindings to the Tree-sitter parsing library
https://tree-sitter.github.io/py-tree-sitter/
MIT License

Results returned by `Query.matches()` and `Query.captures()` are different #221

Closed (YikeZhou closed 4 months ago)

YikeZhou commented 5 months ago

Hello, I recently noticed an inconsistency between the results of Query.captures() and tree-sitter-cli (version 0.22.2). I then tried Query.matches() with the same source code and query, and its result agreed with the CLI, so the result given by Query.captures() appears to be incorrect.

While opening this issue, I noticed #208 has changed lots of APIs, including Query.captures(). I hope this can help detect potential defects in this API.

Reproduction Steps

Here is a Python program to demonstrate the problem I observed.

from pprint import pprint

from tree_sitter import Language, Parser

Language.build_library(
  'build/verilog.so',
  ['tree-sitter-verilog']
)

VL_LANGUAGE = Language('build/verilog.so', 'verilog')

parser = Parser()
parser.set_language(VL_LANGUAGE)

tree = parser.parse(b'''
module top (input clk, input cond);
  reg r1, r2;
  always @(posedge clk) begin
    if (cond)
      r1 <= 1;
    else
      r1 <= 0;

    if (cond)
      r2 <= 1;
  end
endmodule
''')

query = VL_LANGUAGE.query('''
(conditional_statement
    (cond_predicate) @cond .
    (statement_or_null) @true_branch .) @stmt
''')

print('Captures:\n')
pprint(query.captures(tree.root_node))

print('\nMatches:\n')
pprint(query.matches(tree.root_node))

And it outputs:

# FutureWarnings are omitted

Captures:

[(<Node type=conditional_statement, start_point=(4, 4), end_point=(7, 14)>,
  'stmt'),
 (<Node type=cond_predicate, start_point=(4, 8), end_point=(4, 12)>, 'cond'),
 (<Node type=conditional_statement, start_point=(9, 4), end_point=(10, 14)>,
  'stmt'),
 (<Node type=cond_predicate, start_point=(9, 8), end_point=(9, 12)>, 'cond'),
 (<Node type=statement_or_null, start_point=(10, 6), end_point=(10, 14)>,
  'true_branch')]

Matches:

[(0,
  {'cond': <Node type=cond_predicate, start_point=(9, 8), end_point=(9, 12)>,
   'stmt': <Node type=conditional_statement, start_point=(9, 4), end_point=(10, 14)>,
   'true_branch': <Node type=statement_or_null, start_point=(10, 6), end_point=(10, 14)>})]

Running tree-sitter-cli on the same source code and query gives:

  pattern: 0
    capture: stmt, start: (9, 4), end: (10, 14)
    capture: 0 - cond, start: (9, 8), end: (9, 12), text: `cond`
    capture: 1 - true_branch, start: (10, 6), end: (10, 14), text: `r2 <= 1;`

Version

I installed py-tree-sitter with pip on an x86-64 Linux PC.

$ pip freeze
tree-sitter==0.21.3

ObserverOfTime commented 5 months ago

I think that's intended, just not properly documented. captures returns all the captured nodes while matches returns the last one.
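
To make the difference concrete, here is a minimal sketch that compares the two result shapes printed in the report above. It does not call tree-sitter: each node is replaced by a plain `(type, start_point, end_point)` tuple copied from that output, and the helper name `captures_outside_matches` is hypothetical, introduced only for illustration.

```python
# Stand-in data copied from the output in the report. Real results hold
# tree_sitter.Node objects; plain tuples are used here so the sketch runs
# without a compiled grammar.
captures = [
    (("conditional_statement", (4, 4), (7, 14)), "stmt"),
    (("cond_predicate", (4, 8), (4, 12)), "cond"),
    (("conditional_statement", (9, 4), (10, 14)), "stmt"),
    (("cond_predicate", (9, 8), (9, 12)), "cond"),
    (("statement_or_null", (10, 6), (10, 14)), "true_branch"),
]

matches = [
    (0, {
        "cond": ("cond_predicate", (9, 8), (9, 12)),
        "stmt": ("conditional_statement", (9, 4), (10, 14)),
        "true_branch": ("statement_or_null", (10, 6), (10, 14)),
    }),
]


def captures_outside_matches(captures, matches):
    """Return the (node, name) captures that appear in no match (hypothetical helper)."""
    matched = {
        (name, node)
        for _pattern_index, by_name in matches
        for name, node in by_name.items()
    }
    return [(node, name) for node, name in captures if (name, node) not in matched]


extra = captures_outside_matches(captures, matches)
for node, name in extra:
    print(name, node)
```

With this data, the only entries reported by captures() but absent from matches() are the `stmt` and `cond` captures of the first `if` statement, the one that has an `else` branch.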

YikeZhou commented 5 months ago

Thanks for the quick reply! I just ran tree-sitter query with the --captures option (which I had failed to notice before). Its result was the same as that of Query.captures().

jmehnle commented 2 weeks ago

What does "the last [node]" mean?