nvim-treesitter / nvim-treesitter-context

Show code context
MIT License
2.48k stars 197 forks source link

Significant performance issues after #198 #220

Closed Video-Nomad closed 1 year ago

Video-Nomad commented 1 year ago

Description

Extreme performance decrease after #198 (6e53eec). Scrolling is much slower on larger python files (3-5k+ lines Alacritty and Windows Terminal). Even scrolling within one screen results in choppy cursor movement.

Reverting back to 373c2e0 solves the lag completely.

Smaller python files work as expected, but going over 500 lines already shows some lag.

Neovim version

NVIM v0.9.0-dev-1184+g0ecb4d725 Build type: RelWithDebInfo LuaJIT 2.1.0-beta3

Expected behavior

Smooth movement within one screen and during fast scroll on larger python files (3-5k lines)

Actual behavior

Choppy cursor movement and lag.

Minimal config

local plugins = {
  ts         = 'https://github.com/nvim-treesitter/nvim-treesitter',
  ts_context = 'https://github.com/nvim-treesitter/nvim-treesitter-context',
  -- ADD ADDITIONAL PLUGINS THAT ARE _NECESSARY_ TO REPRODUCE THE ISSUE
}

for name, url in pairs(plugins) do
  local install_path = '/tmp/nvim/site/'..name
  if vim.fn.isdirectory(install_path) == 0 then
    vim.fn.system { 'git', 'clone', '--depth=1', url, install_path }
  end
  vim.o.runtimepath = install_path..','..vim.o.runtimepath
end

-- ADD INIT.LUA SETTINGS THAT IS _NECESSARY_ FOR REPRODUCING THE ISSUE

Steps to reproduce

  1. Update to 6e53eec
  2. Open large python file 3-5k lines
  3. Observe stuttering during cursor vertical movement.
  4. Revert back to previous commit
  5. Everything is smooth again
Video-Nomad commented 1 year ago

https://user-images.githubusercontent.com/37882690/224424516-236195d9-4c10-4d9c-883a-430ae33bcc7b.mp4

lewis6991 commented 1 year ago

I've been running this with 10k like c files for several months with no perceivable issues.

  1. Open large python file 3-5k lines

This is too vague for a bug report.

Please provide an exact file causing issues.

Video-Nomad commented 1 year ago

Unfortunately, I can't share the original file, but here are combined files from Python standard lib: combined.zip

Starting from 12k lines it perfectly emulates my experience. I guess the complexity of the source file matters as well, not just number of lines, since my 5-6k python file that I can't share really shows the same level of lag as the provided 12k or even 20k.

Here's my neovim setup: nvim.zip Switching from commit = "6e53eec" to commit = "373c2e0" in lua/plugins/context.lua completely solves the lag issue. Completely removing the plug-in solves the lag as well.

mrsflv commented 1 year ago

Hi everyone! I'm experiencing the same issue described by @Video-Nomad, and as he already pointed out, the performance starts degrading when working on files with more than 5k lines. I've created a dummy C file with around 20k lines that produces the unwanted behavior in my machine (the baseline is taken from Neovim, even though I don't think should metter): search.zip For me as well, reverting back to the commits prior to 6e53eec, solve the issue.

Neovim version

NVIM v0.8.0-v0.8.0 Build type: RelWithDebInfo LuaJIT 2.1.0-beta3

Expected behavior

Smooth cursor scrolling/movements even in larger files.

Actual behavior

Laggy cursor scrolling/movements when working on larger files.

Minimal configuration

local plugins = {
  ts         = 'https://github.com/nvim-treesitter/nvim-treesitter',
  ts_context = 'https://github.com/nvim-treesitter/nvim-treesitter-context',
}

for name, url in pairs(plugins) do
  local install_path = '/tmp/nvim/site/'..name
  if vim.fn.isdirectory(install_path) == 0 then
    vim.fn.system { 'git', 'clone', '--depth=1', url, install_path }
  end
  vim.o.runtimepath = install_path..','..vim.o.runtimepath
end

Steps to reproduce

  1. Unzip search.zip
  2. Run nvim --clean -u minimal.lua search.c
  3. Navigating around the file
lewis6991 commented 1 year ago

I'm debugging this right now on Neovim's eval.c (~9k LOC). I'm running an Macbook Air with M1 and I'm not finding it laggy.

The slowest part seems to be the is_valid() function that runs the query, but even on a section of code with ~10 contexts, the delay is only up to 100ms.

mrsflv commented 1 year ago

Interestingly enough, eval.c does not give me any problem! However, I would say that, at least in my case, when I experience the lag it seems not directly related to the number of contexts present but a more general slow down of the whole editor (even in portion of the codes where no context are displayed).

lewis6991 commented 1 year ago

I'm working on a cache for is_valid() will post up soon.

mrsflv commented 1 year ago

No hurry! BTW, thank you all for the great work and fast responses :smiley:

lewis6991 commented 1 year ago

Can you try #224. For me, it reduces the query time significantly.

lewis6991 commented 1 year ago

One moment I broke the contexts.

lewis6991 commented 1 year ago

Ok, now try.

mrsflv commented 1 year ago

Yes, absolutely better! I would still say that it might be a little slower than it was prior to 6e53eec, but it might very well be a personal feeling and moreover I don't have any number at the moment to back it up. If it is something that you and the other contributors are interested in, I can try to get some benchmarks. Regardless, for me is more than good as it is with the fix!

lewis6991 commented 1 year ago

Benchmarks are always good. We weren't running queries before so 6e53eec could very well be faster since it was matching simple lua patterns which are very quick.

Video-Nomad commented 1 year ago

Can confirm, #224 is really much better overall. Thank you for the fast response @lewis6991 !

mrsflv commented 1 year ago

Just to throw out some numbers, I've "profiled" the update() function using the same search.c file, and obtained mean and standard deviation for approx 200 functions call:

I would definitely say that the cache helps :)

lewis6991 commented 1 year ago

Thank you. That's really interesting. There's still probably more room to optimise. For ts-context we actually only want to run a query on a specific node, but the query will also run on all the children nodes (and not match anything). This might need changes upstream.

lewis6991 commented 1 year ago

Also note. You'll find different performance characteristics depending on how you traverse the file. The cache just reuses nodes already visited but if they've not been visited before then we need to run the query again.

mrsflv commented 1 year ago

Hi again! I would preface this saying that I'm no where near an expert in the subject and I only understand at a basic level how treesitter and ts-context work (tbh I'm not even sure that I fully understood the implications of what you wrote in your first reply :smiley:). However, I had some spare time and I tried to do a more accurate profiling of the different plugin functions, and in particular the update() function in addition to all the subsequent function calls (i.e., get_query(), the wrapper memoize(), the function fn(...) used to validate the node, and so on). Using the same search.c I obtained this log file: ts-context.log What I found is this:

[nvim-treesitter-context] - Level 1 - New 'update()' call for line 22355 ... [nvim-treesitter-context] - Level 2 - 'get_query' execution: 369.821 [micro sec] [nvim-treesitter-context] - Level 4 - 'hash_node()' execution: 0.270 [nvim-treesitter-context] - Level 4 - 'fn(...)' execution: 6127.797 [micro sec] [nvim-treesitter-context] - Level 3 - 'memoize()' execution: 6185.092 [micro sec] --> 'hash_fn()' call : 1.728 [micro sec] & 'fn(...)' call: 6183.088 [micro sec] [nvim-treesitter-context] - Level 4 - 'hash_node()' execution: 0.439 [nvim-treesitter-context] - Level 4 - 'fn(...)' execution: 5824.393 [micro sec] [nvim-treesitter-context] - Level 3 - 'memoize()' execution: 5830.263 [micro sec] --> 'hash_fn()' call : 2.195 [micro sec] & 'fn(...)' call: 5827.919 [micro sec] [nvim-treesitter-context] - Level 4 - 'hash_node()' execution: 0.392 [nvim-treesitter-context] - Level 4 - 'fn(...)' execution: 3.347 [micro sec] [nvim-treesitter-context] - Level 3 - 'memoize()' execution: 9.864 [micro sec] --> 'hash_fn()' call : 4.189 [micro sec] & 'fn(...)' call: 5.508 [micro sec] [nvim-treesitter-context] - Level 2 - Parsing loop: 12046.425 [micro sec] --> Loop on parent nodes: 12045.800 [micro sec] --> 'is_valid()' total execution time: 12045.438 [nvim-treesitter-context] - Level 1 - 'update()' execution time: 12425.295 [micro sec]

[nvim-treesitter-context] - Level 1 - New 'update()' call for line 22266 ... [nvim-treesitter-context] - Level 2 - 'get_query' execution: 621.044 [micro sec] [nvim-treesitter-context] - Level 4 - 'hash_node()' execution: 1.193 [nvim-treesitter-context] - Level 3 - 'memoize()' execution: 3.313 [micro sec] --> 'hash_fn()' call : 3.251 [micro sec] [nvim-treesitter-context] - Level 4 - 'hash_node()' execution: 0.203 [nvim-treesitter-context] - Level 3 - 'memoize()' execution: 1.666 [micro sec] --> 'hash_fn()' call : 1.634 [micro sec] [nvim-treesitter-context] - Level 4 - 'hash_node()' execution: 1.014 [nvim-treesitter-context] - Level 3 - 'memoize()' execution: 2.851 [micro sec] --> 'hash_fn()' call : 2.816 [micro sec] [nvim-treesitter-context] - Level 4 - 'hash_node()' execution: 0.192 [nvim-treesitter-context] - Level 3 - 'memoize()' execution: 1.965 [micro sec] --> 'hash_fn()' call : 1.930 [micro sec] [nvim-treesitter-context] - Level 2 - Parsing loop: 21.391 [micro sec] --> Loop on parent nodes: 18.719 [micro sec] --> 'is_valid()' total execution time: 18.439 [nvim-treesitter-context] - Level 1 - 'update()' execution time: 666.105 [micro sec]

However, what I found most interesting is that in some cases (it seems to me when multiple context are present) the performance of hash_fn() call starts degrading:

[nvim-treesitter-context] - Level 1 - New 'update()' call for line 22126 ... [nvim-treesitter-context] - Level 2 - 'get_query' execution: 213.628 [micro sec] [nvim-treesitter-context] - Level 4 - 'hash_node()' execution: 1.514 [nvim-treesitter-context] - Level 3 - 'memoize()' execution: 13.593 [micro sec] --> 'hash_fn()' call : 13.297 [micro sec] [nvim-treesitter-context] - Level 4 - 'hash_node()' execution: 1.079 [nvim-treesitter-context] - Level 3 - 'memoize()' execution: 11.230 [micro sec] --> 'hash_fn()' call : 11.024 [micro sec] [nvim-treesitter-context] - Level 4 - 'hash_node()' execution: 0.975 [nvim-treesitter-context] - Level 3 - 'memoize()' execution: 11.283 [micro sec] --> 'hash_fn()' call : 11.083 [micro sec] [nvim-treesitter-context] - Level 4 - 'hash_node()' execution: 1.028 [nvim-treesitter-context] - Level 3 - 'memoize()' execution: 11.545 [micro sec] --> 'hash_fn()' call : 11.358 [micro sec] [nvim-treesitter-context] - Level 4 - 'hash_node()' execution: 1.097 [nvim-treesitter-context] - Level 3 - 'memoize()' execution: 10.603 [micro sec] --> 'hash_fn()' call : 10.372 [micro sec] [nvim-treesitter-context] - Level 4 - 'hash_node()' execution: 0.198 [nvim-treesitter-context] - Level 3 - 'memoize()' execution: 1.783 [micro sec] --> 'hash_fn()' call : 1.751 [micro sec] [nvim-treesitter-context] - Level 4 - 'hash_node()' execution: 0.247 [nvim-treesitter-context] - Level 3 - 'memoize()' execution: 1.889 [micro sec] --> 'hash_fn()' call : 1.849 [micro sec] [nvim-treesitter-context] - Level 4 - 'hash_node()' execution: 0.174 [nvim-treesitter-context] - Level 3 - 'memoize()' execution: 1.766 [micro sec] --> 'hash_fn()' call : 1.734 [micro sec] [nvim-treesitter-context] - Level 4 - 'hash_node()' execution: 0.180 [nvim-treesitter-context] - Level 3 - 'memoize()' execution: 1.891 [micro sec] --> 'hash_fn()' call : 1.858 [micro sec] [nvim-treesitter-context] - Level 4 - 'hash_node()' execution: 0.176 [nvim-treesitter-context] - Level 3 - 'memoize()' execution: 1.991 [micro sec] --> 'hash_fn()' call : 1.949 [micro sec] [nvim-treesitter-context] - Level 4 - 'hash_node()' execution: 0.179 [nvim-treesitter-context] - Level 3 - 'memoize()' execution: 1.983 [micro sec] --> 'hash_fn()' call : 1.949 [micro sec] [nvim-treesitter-context] - Level 4 - 'hash_node()' execution: 0.184 [nvim-treesitter-context] - Level 3 - 'memoize()' execution: 1.999 [micro sec] --> 'hash_fn()' call : 1.958 [micro sec] [nvim-treesitter-context] - Level 4 - 'hash_node()' execution: 0.166 [nvim-treesitter-context] - Level 3 - 'memoize()' execution: 1.752 [micro sec] --> 'hash_fn()' call : 1.712 [micro sec] [nvim-treesitter-context] - Level 4 - 'hash_node()' execution: 0.171 [nvim-treesitter-context] - Level 3 - 'memoize()' execution: 2.917 [micro sec] --> 'hash_fn()' call : 2.878 [micro sec] [nvim-treesitter-context] - Level 4 - 'hash_node()' execution: 0.201 [nvim-treesitter-context] - Level 3 - 'memoize()' execution: 1.926 [micro sec] --> 'hash_fn()' call : 1.891 [micro sec] [nvim-treesitter-context] - Level 4 - 'hash_node()' execution: 0.181 [nvim-treesitter-context] - Level 3 - 'memoize()' execution: 1.771 [micro sec] --> 'hash_fn()' call : 1.736 [micro sec] [nvim-treesitter-context] - Level 4 - 'hash_node()' execution: 0.166 [nvim-treesitter-context] - Level 3 - 'memoize()' execution: 5.717 [micro sec] --> 'hash_fn()' call : 5.675 [micro sec] [nvim-treesitter-context] - Level 4 - 'hash_node()' execution: 0.197 [nvim-treesitter-context] - Level 3 - 'memoize()' execution: 1.964 [micro sec] --> 'hash_fn()' call : 1.929 [micro sec] [nvim-treesitter-context] - Level 4 - 'hash_node()' execution: 0.220 [nvim-treesitter-context] - Level 3 - 'memoize()' execution: 1.772 [micro sec] --> 'hash_fn()' call : 1.727 [micro sec] [nvim-treesitter-context] - Level 4 - 'hash_node()' execution: 0.179 [nvim-treesitter-context] - Level 3 - 'memoize()' execution: 1.657 [micro sec] --> 'hash_fn()' call : 1.625 [micro sec] [nvim-treesitter-context] - Level 4 - 'hash_node()' execution: 0.190 [nvim-treesitter-context] - Level 3 - 'memoize()' execution: 1.768 [micro sec] --> 'hash_fn()' call : 1.735 [micro sec] [nvim-treesitter-context] - Level 4 - 'hash_node()' execution: 0.169 [nvim-treesitter-context] - Level 3 - 'memoize()' execution: 1.692 [micro sec] --> 'hash_fn()' call : 1.659 [micro sec] [nvim-treesitter-context] - Level 4 - 'hash_node()' execution: 0.185 [nvim-treesitter-context] - Level 3 - 'memoize()' execution: 1.701 [micro sec] --> 'hash_fn()' call : 1.669 [micro sec] [nvim-treesitter-context] - Level 4 - 'hash_node()' execution: 0.176 [nvim-treesitter-context] - Level 3 - 'memoize()' execution: 1.599 [micro sec] --> 'hash_fn()' call : 1.566 [micro sec] [nvim-treesitter-context] - Level 4 - 'hash_node()' execution: 0.185 [nvim-treesitter-context] - Level 3 - 'memoize()' execution: 1.660 [micro sec] --> 'hash_fn()' call : 1.623 [micro sec] [nvim-treesitter-context] - Level 2 - Parsing loop: 266.902 [micro sec] --> Loop on parent nodes: 228.955 [micro sec] --> 'is_valid()' total execution time: 225.767 [nvim-treesitter-context] - Level 1 - 'update()' execution time: 614.650 [micro sec]

Now, I'm not sure what to make out of it or even it is something that it is truly worth investing some time in to improve it (after all we are talking about micro seconds), but I figured that, regardless of the future developments, it might be of interest for you all :smile:

lewis6991 commented 1 year ago

Thank you. That's really interesting. There's still probably more room to optimise. For ts-context we actually only want to run a query on a specific node, but the query will also run on all the children nodes (and not match anything). This might need changes upstream.

I've raised https://github.com/neovim/neovim/pull/22710 and https://github.com/tree-sitter/tree-sitter/pull/2140 to address this upstream.

yioneko commented 1 year ago

The difference of search space in syntax tree between two implementations illustrated:

image

Even with depth limit, the siblings of parent nodes will be unnecessarily included. I agree that query based approach helps with unite of tree-sitter utility between different editors, but using a top-bottom analysis tool for a bottom-up situation is inappropriate and will cause perf issue like this.

This is just a friendly reminding, I'm not meant to against the works you have done. I thought tree-sitter query is fast enough so I didn't raise my points before the rewrite pr merged. Just for anyone encountering this problem to learn about the core reasons.

mrsflv commented 1 year ago

Thank you for taking the time to put together a brief explanation of the logic behind the two different approaches. It was really helpful!

lewis6991 commented 1 year ago

Even with depth limit, the siblings of parent nodes will be unnecessarily included. I agree that query based approach helps with unite of tree-sitter utility between different editors, but using a top-bottom analysis tool for a bottom-up situation is inappropriate and will cause perf issue like this.

The idea is that with a depth of zero, only a single node (the one passed to the query) will be checked against the patterns. However this currently appears not to work with my treesitter PR (need to look into that). Though even with a depth of 1, I observe a perf increase of 10x, so once you add node caching then perf should be fine.

Another idea, would be to do some custom query parsing of simple patterns (e.g. (node) @capture) and do simple name matching like before and then disable these patterns in the query.

yioneko commented 1 year ago

The idea is that with a depth of zero, only a single node (the one passed to the query) will be checked against the patterns.

If depth = 0, what's the difference compared to the previous pattern approach? The query could be only used to match on one node and lose most of its power.

Another idea, would be to do some custom query parsing of simple patterns (e.g. (node) @capture) and do simple name matching like before and then disable these patterns in the query.

Well, when you can just use lua tables for simple form of patterns... This is workable, but difficult and comes with slightly high cost of maintenance.

All of these are brought by the enforced usage of query, but if you think the benefits to community outweighs, then let time prove it. Tree-sitter is still in early stage so at least any approach seems valuable to me.

lewis6991 commented 1 year ago

The old way of using patterns was horrible for maintenance, extremely restricted and led to random PRs trying to make them more flexible which in the end made everything much more complicated. I didn't even understand half the code.

Queries on the other hand are a standard format that can match any arbitrary pattern and support directives and predicates, and are editor agnostic. The performance issues are simply due to the lack of features upstream which we are trying to develop and address.

yioneko commented 1 year ago

The strength of query is based on performance cost. Personally, I didn't see the limitation of the previous version so maybe I'm in lack of some context. As far as I understand, we are trading performance for flexibility, am I wrong?

The performance issues are simply due to the lack of features upstream which we are trying to develop and address.

But it can never be as performant as the previous approach by nature, as illustrated above, unless query depth set to 0 to be degraded. Also it's hard to say that some of the optimizations could be eventually applied.

Queries on the other hand are a standard format that can match any arbitrary pattern and support directives and predicates, and are editor agnostic.

I know that, but as I've said before, using a top-bottom analysis tool for a bottom-up situation is inappropriate. This is the core problem and can only be worked around but impossible to resolve, unless the algorithm used in tree-sitter query is changed thoroughly.

lewis6991 commented 1 year ago

As far as I understand, we are trading performance for flexibility, am I wrong?

And maintainability. Again, the old version was very messy.

And again, the perf issue isn't fundamental but a result of the current upstream implementation.

But it can never be as performant as the previous approach by nature, as illustrated above, unless query depth set to 0 to be degraded. Also it's hard to say that some of the optimizations could be eventually applied.

That's just not true, if we run a query on a single node then perf should be similar. We're just trading lua patterns for TS query patterns.

Degraded? What degraded?

Why is it hard to say if they can be applied? I know they can be applied. We already have node caching, parsing of basic queries won't be that hard, and the TS PR already works.

I know that, but as I've said before, using a top-bottom analysis tool for a bottom-up situation is inappropriate. This is the core problem and can only be worked around but impossible to resolve, unless the algorithm used in tree-sitter query is changed thoroughly.

The problem is solved by allowing queries to only match a single node. It is as simple as that.

yioneko commented 1 year ago

Why is it hard to say if they can be applied? I know they can be applied. We already have node caching, parsing of basic queries won't be that hard, and the TS PR already works.

My subjective opinions. No supports. So you are right.

Degraded? What degraded?

Then the query cannot test child, sibilings, etc. Only the node itself. I see this as no difference between previous version.

The problem is solved by allowing queries to only match a single node. It is as simple as that.

Resolved only when queries only match a single node. If we want flexibility, we cannot blindlessly set depth to 0. Then the problem still exists.

lewis6991 commented 1 year ago

Then the query cannot test child, sibilings, etc. Only the node itself. I see this as no difference between previous version.

Yes they can. Take a close look at the PR. The depth is only checked when descending into the tree which is done after pattern matching. The patterns will still match on children and siblings. The difference is that the whole pattern will not be matched in children.

yioneko commented 1 year ago

The patterns will still match on children and siblings. The difference is that the whole pattern will not be matched in children.

So the query actually tests on children even with depth = 0 (only leaf pattern)? Then I'd say this is even worse because zero depth still performs worse than previous version. https://github.com/nvim-treesitter/nvim-treesitter-context/issues/220#issuecomment-1474732365 still holds true.

lewis6991 commented 1 year ago

Sorry, I really don't think you understand how queries work, and I don't think I'll be able to convince you otherwise.

yioneko commented 1 year ago

Yes, you are right, I haven't looked into the internals of query matching.

Ok, https://github.com/nvim-treesitter/nvim-treesitter-context/issues/220#issuecomment-1474827938 is wrong.

The patterns will still match on children and siblings. The difference is that the whole pattern will not be matched in children.

Only some special patterns could match on limited children without descending. So the correct summary is: when depth = 0, we gain slightly better flexibility than previous approach and maintainbility, otherwise when depth >= 1, https://github.com/nvim-treesitter/nvim-treesitter-context/issues/220#issuecomment-1474732365 mostly tells the cause of performance issue.

Honestly, I don't know what's the meaning to correct on such a small facet...

lewis6991 commented 1 year ago

Updated https://github.com/tree-sitter/tree-sitter/pull/2140 to properly work with depth=0.

Just to confirm:

With the following query and depth=0:

(if_statement) @capture1

(if_statement 
  condition: (parenthesized_expression (identifier))
) @capture2

So basically perf scales with the complexity of the pattern.

I've tested the above PR with this plugin and perf increase is between 50-100x (or roughly 50-70ms -> 0.5-2ms).

lucario387 commented 1 year ago

Updated tree-sitter/tree-sitter#2140 to properly work with depth=0.

Just to confirm:

With the following query and depth=0:

(if_statement) @capture1

(if_statement 
  condition: (parenthesized_expression (identifier))
) @capture2
  • The match for @capture1 will never descend because the match is complete at 0 depth.
  • The match for @capture2 will only descend at most to a depth of 3 since there are 2 subchildren in the pattern.

So basically perf scales with the complexity of the pattern.

I've tested the above PR with this plugin and perf increase is between 50-100x (or roughly 50-70ms -> 0.5-2ms).

Sorry for asking but I just want to be clear, with the same queries as above, what would happen for the case below?, Will both queries be applied or only one of them be applied

if {
  if {
    if {
      ...
lewis6991 commented 1 year ago

For this plugin? I believe we return on the first match for each parent.

lucario387 commented 1 year ago

For this plugin? I believe we return on the first match for each parent.

For general actually, sorry for not clarifying, as I want to figure out how to do C indents in nvim-treesitter in case the behavior changed I have a nested if query for indent/dedent so I was wondering whether tree-sitter/tree-sitter#2140 would change the behavior

lewis6991 commented 1 year ago

It depends on how you iterate the query, but generally it will match all of them and it's up to you which match to go with.

lewis6991 commented 1 year ago

For anyone watching https://github.com/nvim-treesitter/nvim-treesitter-context/pull/271 is now merged which limits the query depth. For me I'm seeing a roughly 10x increase in performance. Mileage will vary.