With nvim-treesitter delay in opening cmake files

cassava commented 1 year ago

I am using:

neovim 0.8
nvim-treesitter main
tree-sitter-cmake main

When opening any CMake file with this treesitter parser installed and highlighting enabled, I notice a large delay of >1.5 s before the file contents are visible. Normally, a file is displayed within ~150 ms.

If I uninstall the cmake treesitter parser, the problem no longer occurs
If I disable the treesitter highlight module, the problem no longer occurs
If I downgrade treesitter, the problem remains
If I downgrade the cmake treesitter parser, the delay is shorter, but still significant

uyha commented 1 year ago

~~can you share your nvim config and the CMake file you're opening?~~

never mind, I also notice the delay when opening a CMake file now (not as significant as 1.5s, but noticeable for sure). I simplified the queries and it seems to help tremendously. Now I am just waiting for upstream to accept the change.

cassava commented 1 year ago

My config is basically LazyNvim with the cmake parser, so I was planning on creating a simplified version. On another machine of mine it's a little less noticeable.

I noticed there are some other issues that may be related:

https://github.com/nvim-treesitter/nvim-treesitter/issues/2913

But I haven't been able to look into it yet.

uyha commented 1 year ago

yeah, it's most likely due to the size of the queries. I am working on a simpler query version.

cassava commented 1 year ago

So how does this work? I somehow thought all the CMake stuff is here but apparently not, if you need to make a PR to nvim-treesitter. Do you have a link to documentation or care to give me the quick rundown – what's in this repo and what's in nvim-treesitter?

cassava commented 1 year ago

Ah, is it that this is just the grammar for the syntax tree, and everything needed for actually doing something in nvim, like highlighting, lives in nvim-treesitter?

i.e. this is the repo for the parser, and any queries are "module" specific, one of which is the highlighter, and that's specific for each editor?

uyha commented 1 year ago

yes, you can check out the queries here.

uyha commented 1 year ago

> i.e. this is the repo for the parser, and any queries are "module" specific, one of which is the highlighter, and that's specific for each editor?

I'm not sure that applies to other editors, but for nvim, yes, that's the gist of it.

uyha commented 1 year ago

welp, I updated the queries, but it doesn't seem to improve the startup time much. Not sure there's anything more I can do for now.

matze commented 1 year ago

There must be some exponential parse problem in the grammar. I use this grammar for other purposes and have the following expression to match commands with two parameters (e.g. option(NAME ON/OFF)):

(normal_command
  (identifier) @command
  (argument (unquoted_argument) @name)
  (argument (unquoted_argument) @state)
) @whole

Without the @state, some 1000 options are parsed in ~100 ms; with the @state, it takes 6000 ms.

uyha commented 1 year ago

thanks for the hint. I will play around to see if I can find the problem.

uyha commented 1 year ago

I'm not sure how tree-sitter works under the hood, but now that I think about it, I am guessing that the grammar only produces the tree, and the queries are then run on that tree. So if changing a query changes the performance, it may not be the grammar's fault, but I will investigate some more.

matze commented 1 year ago

Interestingly, if I run the query above using tree-sitter query query.scm CMakeLists.txt, it's quick. It's only when I use the generated Rust parser API that it slows down to a crawl. I will try to investigate more as well.

matze commented 1 year ago

Not sure if it's really the grammar or the query but given

import argparse

parser = argparse.ArgumentParser()
parser.add_argument("-n", required=True, type=int)
args = parser.parse_args()

print("list(APPEND FOO")

for i in range(args.n):
    print(f"BAR{i}")

print(")")

highlights like this

(normal_command
  (identifier) @command
  (argument (unquoted_argument) @one)
  (argument (unquoted_argument) @two)
  (argument (unquoted_argument) @three)
)

with varying numbers of matched arguments and a command like

python3 gen.py -n N > CMakeLists.txt && time tree-sitter query query.scm CMakeLists.txt

one can quickly see times like

N	one arg	two args	three args
10	0.003	0.015	0.022
40	0.003	0.019	0.431
80	0.003	0.034	11.158
100	0.003	0.027
200	0.003	0.109
400	0.003	0.785
800	0.004	6.153
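
For anyone who wants to reproduce the table, the measurement loop can be scripted. This is only a sketch: `generate_cmake` and `time_query` are hypothetical helper names, the generated text mirrors what gen.py prints, and the timing assumes the tree-sitter CLI is on the PATH and that query.scm exists in the working directory. Wall-clock time includes process startup, so the numbers will not match `time` exactly.

```python
import subprocess
import time
from pathlib import Path


def generate_cmake(n: int) -> str:
    """Return the text gen.py prints: one list(APPEND ...) call with n BAR arguments."""
    lines = ["list(APPEND FOO"]
    lines += [f"BAR{i}" for i in range(n)]
    lines.append(")")
    return "\n".join(lines) + "\n"


def time_query(n: int, query: str = "query.scm", target: str = "CMakeLists.txt") -> float:
    """Write the generated file, run `tree-sitter query` once, return elapsed seconds."""
    Path(target).write_text(generate_cmake(n))
    start = time.perf_counter()
    # Same command as in the comment above; the output is discarded.
    subprocess.run(
        ["tree-sitter", "query", query, target],
        stdout=subprocess.DEVNULL,
        check=True,
    )
    return time.perf_counter() - start
```

Calling `time_query(n)` for each N in the table, once per variant of query.scm, should give comparable columns.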

But the real kicker and reason for that behaviour is that it returns a match for each and every argument, i.e. for the one arg case you see output like

CMakeLists.txt
  pattern: 0
    capture: 0 - command, start: (0, 0), end: (0, 4), text: `list`
    capture: 1 - one, start: (0, 5), end: (0, 11), text: `APPEND`
  pattern: 0
    capture: 0 - command, start: (0, 0), end: (0, 4), text: `list`
    capture: 1 - one, start: (0, 12), end: (0, 15), text: `FOO`
  pattern: 0
    capture: 0 - command, start: (0, 0), end: (0, 4), text: `list`
    capture: 1 - one, start: (1, 0), end: (1, 4), text: `BAR0`
  pattern: 0
    capture: 0 - command, start: (0, 0), end: (0, 4), text: `list`
    capture: 1 - one, start: (2, 0), end: (2, 4), text: `BAR1`
  pattern: 0
    capture: 0 - command, start: (0, 0), end: (0, 4), text: `list`
    capture: 1 - one, start: (3, 0), end: (3, 4), text: `BAR2`
…

instead of just a single list APPEND. It's certainly unexpected from what I can tell.

Edit: so, with that information, one can rewrite the queries with the anchor . and constrain the matches to the expected number of arguments, and everything behaves as expected. Not sure anything needs to be done here.
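
A rough way to see why the unanchored queries blow up is to model the matching in plain Python. This is not how tree-sitter is implemented; it assumes that an unanchored pattern with k sibling captures yields one match per in-order selection of k arguments (which is consistent with the one-arg output above), and that a fully anchored pattern matches only a command with exactly k arguments. The function names are made up for illustration.

```python
from itertools import combinations
from math import comb


def unanchored_matches(args, k):
    """Assumed model: every in-order selection of k sibling arguments is a match."""
    return list(combinations(args, k))


def anchored_matches(args, k):
    """Assumed model: a fully anchored pattern matches once, and only for exactly k arguments."""
    return [tuple(args)] if len(args) == k else []


# The arguments of the generated list(APPEND FOO BAR0 BAR1 BAR2) call.
args = ["APPEND", "FOO"] + [f"BAR{i}" for i in range(3)]

print(len(unanchored_matches(args, 1)))  # one match per argument
print(len(unanchored_matches(args, 2)))  # one match per in-order pair
print(comb(802, 2), comb(82, 3))         # candidate counts for N=800 (two args) and N=80 (three args)
print(anchored_matches(["NAME", "ON"], 2), anchored_matches(args, 2))
```

Under this model, the number of candidate matches grows as a binomial coefficient in the argument count, which would fit the steep rise in the two-arg and three-arg columns, while the anchored form stays at zero or one match per command.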

uyha commented 1 year ago

thank you for doing the investigation, it's very useful. For the query with 3 args that you've written, I think it's the expected behaviour for a query written like that. For the spike at the 800 statements with 2 args, I have no idea why that's the case, but 800 statements is also a big number of statements. Could you create an issue in the tree-sitter repo to ask about this problem?

matze commented 1 year ago

> For the spike at the 800 statements with 2 args, I have no idea why that's the case, but 800 statements is also a big number of statements.

We do have long lists of sources, maybe not 800 but certainly in the hundreds. I will look a bit deeper into the parser generator but yes, will open something over there if I have more understanding.

uyha / tree-sitter-cmake

With nvim-treesitter delay in opening cmake files #11