Closed Video-Nomad closed 1 year ago
I've been running this with 10k like c files for several months with no perceivable issues.
- Open large python file 3-5k lines
This is too vague for a bug report.
Please provide an exact file causing issues.
Unfortunately, I can't share the original file, but here are combined files from Python standard lib: combined.zip
Starting from 12k lines it perfectly emulates my experience. I guess the complexity of the source file matters as well, not just number of lines, since my 5-6k python file that I can't share really shows the same level of lag as the provided 12k or even 20k.
Here's my neovim setup: nvim.zip Switching from commit = "6e53eec" to commit = "373c2e0" in lua/plugins/context.lua completely solves the lag issue. Completely removing the plug-in solves the lag as well.
Hi everyone! I'm experiencing the same issue described by @Video-Nomad, and as he already pointed out, the performance starts degrading when working on files with more than 5k lines. I've created a dummy C file with around 20k lines that produces the unwanted behavior in my machine (the baseline is taken from Neovim, even though I don't think should metter): search.zip For me as well, reverting back to the commits prior to 6e53eec, solve the issue.
NVIM v0.8.0-v0.8.0 Build type: RelWithDebInfo LuaJIT 2.1.0-beta3
Smooth cursor scrolling/movements even in larger files.
Laggy cursor scrolling/movements when working on larger files.
local plugins = {
ts = 'https://github.com/nvim-treesitter/nvim-treesitter',
ts_context = 'https://github.com/nvim-treesitter/nvim-treesitter-context',
}
for name, url in pairs(plugins) do
local install_path = '/tmp/nvim/site/'..name
if vim.fn.isdirectory(install_path) == 0 then
vim.fn.system { 'git', 'clone', '--depth=1', url, install_path }
end
vim.o.runtimepath = install_path..','..vim.o.runtimepath
end
nvim --clean -u minimal.lua search.c
I'm debugging this right now on Neovim's eval.c
(~9k LOC). I'm running an Macbook Air with M1 and I'm not finding it laggy.
The slowest part seems to be the is_valid()
function that runs the query, but even on a section of code with ~10 contexts, the delay is only up to 100ms.
Interestingly enough, eval.c
does not give me any problem!
However, I would say that, at least in my case, when I experience the lag it seems not directly related to the number of contexts present but a more general slow down of the whole editor (even in portion of the codes where no context are displayed).
I'm working on a cache for is_valid()
will post up soon.
No hurry! BTW, thank you all for the great work and fast responses :smiley:
Can you try #224. For me, it reduces the query time significantly.
One moment I broke the contexts.
Ok, now try.
eval.c
it takes the average from 40ms -> 2ms.search.c
it takes the average from 66ms -> 6ms.combined-20k.py
it takes the average from 50ms -> 3ms.Yes, absolutely better! I would still say that it might be a little slower than it was prior to 6e53eec, but it might very well be a personal feeling and moreover I don't have any number at the moment to back it up. If it is something that you and the other contributors are interested in, I can try to get some benchmarks. Regardless, for me is more than good as it is with the fix!
Benchmarks are always good. We weren't running queries before so 6e53eec could very well be faster since it was matching simple lua patterns which are very quick.
Can confirm, #224 is really much better overall. Thank you for the fast response @lewis6991 !
Just to throw out some numbers, I've "profiled" the update()
function using the same search.c
file, and obtained mean and standard deviation for approx 200 functions call:
I would definitely say that the cache helps :)
Thank you. That's really interesting. There's still probably more room to optimise. For ts-context we actually only want to run a query on a specific node, but the query will also run on all the children nodes (and not match anything). This might need changes upstream.
Also note. You'll find different performance characteristics depending on how you traverse the file. The cache just reuses nodes already visited but if they've not been visited before then we need to run the query again.
Hi again!
I would preface this saying that I'm no where near an expert in the subject and I only understand at a basic level how treesitter and ts-context work (tbh I'm not even sure that I fully understood the implications of what you wrote in your first reply :smiley:).
However, I had some spare time and I tried to do a more accurate profiling of the different plugin functions, and in particular the update()
function in addition to all the subsequent function calls (i.e., get_query()
, the wrapper memoize()
, the function fn(...)
used to validate the node, and so on). Using the same search.c
I obtained this log file: ts-context.log
What I found is this:
[nvim-treesitter-context] - Level 1 - New 'update()' call for line 22355 ... [nvim-treesitter-context] - Level 2 - 'get_query' execution: 369.821 [micro sec] [nvim-treesitter-context] - Level 4 - 'hash_node()' execution: 0.270 [nvim-treesitter-context] - Level 4 - 'fn(...)' execution: 6127.797 [micro sec] [nvim-treesitter-context] - Level 3 - 'memoize()' execution: 6185.092 [micro sec] --> 'hash_fn()' call : 1.728 [micro sec] & 'fn(...)' call: 6183.088 [micro sec] [nvim-treesitter-context] - Level 4 - 'hash_node()' execution: 0.439 [nvim-treesitter-context] - Level 4 - 'fn(...)' execution: 5824.393 [micro sec] [nvim-treesitter-context] - Level 3 - 'memoize()' execution: 5830.263 [micro sec] --> 'hash_fn()' call : 2.195 [micro sec] & 'fn(...)' call: 5827.919 [micro sec] [nvim-treesitter-context] - Level 4 - 'hash_node()' execution: 0.392 [nvim-treesitter-context] - Level 4 - 'fn(...)' execution: 3.347 [micro sec] [nvim-treesitter-context] - Level 3 - 'memoize()' execution: 9.864 [micro sec] --> 'hash_fn()' call : 4.189 [micro sec] & 'fn(...)' call: 5.508 [micro sec] [nvim-treesitter-context] - Level 2 - Parsing loop: 12046.425 [micro sec] --> Loop on parent nodes: 12045.800 [micro sec] --> 'is_valid()' total execution time: 12045.438 [nvim-treesitter-context] - Level 1 - 'update()' execution time: 12425.295 [micro sec]
[nvim-treesitter-context] - Level 1 - New 'update()' call for line 22266 ... [nvim-treesitter-context] - Level 2 - 'get_query' execution: 621.044 [micro sec] [nvim-treesitter-context] - Level 4 - 'hash_node()' execution: 1.193 [nvim-treesitter-context] - Level 3 - 'memoize()' execution: 3.313 [micro sec] --> 'hash_fn()' call : 3.251 [micro sec] [nvim-treesitter-context] - Level 4 - 'hash_node()' execution: 0.203 [nvim-treesitter-context] - Level 3 - 'memoize()' execution: 1.666 [micro sec] --> 'hash_fn()' call : 1.634 [micro sec] [nvim-treesitter-context] - Level 4 - 'hash_node()' execution: 1.014 [nvim-treesitter-context] - Level 3 - 'memoize()' execution: 2.851 [micro sec] --> 'hash_fn()' call : 2.816 [micro sec] [nvim-treesitter-context] - Level 4 - 'hash_node()' execution: 0.192 [nvim-treesitter-context] - Level 3 - 'memoize()' execution: 1.965 [micro sec] --> 'hash_fn()' call : 1.930 [micro sec] [nvim-treesitter-context] - Level 2 - Parsing loop: 21.391 [micro sec] --> Loop on parent nodes: 18.719 [micro sec] --> 'is_valid()' total execution time: 18.439 [nvim-treesitter-context] - Level 1 - 'update()' execution time: 666.105 [micro sec]
However, what I found most interesting is that in some cases (it seems to me when multiple context are present) the performance of hash_fn()
call starts degrading:
[nvim-treesitter-context] - Level 1 - New 'update()' call for line 22126 ... [nvim-treesitter-context] - Level 2 - 'get_query' execution: 213.628 [micro sec] [nvim-treesitter-context] - Level 4 - 'hash_node()' execution: 1.514 [nvim-treesitter-context] - Level 3 - 'memoize()' execution: 13.593 [micro sec] --> 'hash_fn()' call : 13.297 [micro sec] [nvim-treesitter-context] - Level 4 - 'hash_node()' execution: 1.079 [nvim-treesitter-context] - Level 3 - 'memoize()' execution: 11.230 [micro sec] --> 'hash_fn()' call : 11.024 [micro sec] [nvim-treesitter-context] - Level 4 - 'hash_node()' execution: 0.975 [nvim-treesitter-context] - Level 3 - 'memoize()' execution: 11.283 [micro sec] --> 'hash_fn()' call : 11.083 [micro sec] [nvim-treesitter-context] - Level 4 - 'hash_node()' execution: 1.028 [nvim-treesitter-context] - Level 3 - 'memoize()' execution: 11.545 [micro sec] --> 'hash_fn()' call : 11.358 [micro sec] [nvim-treesitter-context] - Level 4 - 'hash_node()' execution: 1.097 [nvim-treesitter-context] - Level 3 - 'memoize()' execution: 10.603 [micro sec] --> 'hash_fn()' call : 10.372 [micro sec] [nvim-treesitter-context] - Level 4 - 'hash_node()' execution: 0.198 [nvim-treesitter-context] - Level 3 - 'memoize()' execution: 1.783 [micro sec] --> 'hash_fn()' call : 1.751 [micro sec] [nvim-treesitter-context] - Level 4 - 'hash_node()' execution: 0.247 [nvim-treesitter-context] - Level 3 - 'memoize()' execution: 1.889 [micro sec] --> 'hash_fn()' call : 1.849 [micro sec] [nvim-treesitter-context] - Level 4 - 'hash_node()' execution: 0.174 [nvim-treesitter-context] - Level 3 - 'memoize()' execution: 1.766 [micro sec] --> 'hash_fn()' call : 1.734 [micro sec] [nvim-treesitter-context] - Level 4 - 'hash_node()' execution: 0.180 [nvim-treesitter-context] - Level 3 - 'memoize()' execution: 1.891 [micro sec] --> 'hash_fn()' call : 1.858 [micro sec] [nvim-treesitter-context] - Level 4 - 'hash_node()' execution: 0.176 [nvim-treesitter-context] - Level 3 - 'memoize()' execution: 1.991 [micro sec] --> 'hash_fn()' call : 1.949 [micro sec] [nvim-treesitter-context] - Level 4 - 'hash_node()' execution: 0.179 [nvim-treesitter-context] - Level 3 - 'memoize()' execution: 1.983 [micro sec] --> 'hash_fn()' call : 1.949 [micro sec] [nvim-treesitter-context] - Level 4 - 'hash_node()' execution: 0.184 [nvim-treesitter-context] - Level 3 - 'memoize()' execution: 1.999 [micro sec] --> 'hash_fn()' call : 1.958 [micro sec] [nvim-treesitter-context] - Level 4 - 'hash_node()' execution: 0.166 [nvim-treesitter-context] - Level 3 - 'memoize()' execution: 1.752 [micro sec] --> 'hash_fn()' call : 1.712 [micro sec] [nvim-treesitter-context] - Level 4 - 'hash_node()' execution: 0.171 [nvim-treesitter-context] - Level 3 - 'memoize()' execution: 2.917 [micro sec] --> 'hash_fn()' call : 2.878 [micro sec] [nvim-treesitter-context] - Level 4 - 'hash_node()' execution: 0.201 [nvim-treesitter-context] - Level 3 - 'memoize()' execution: 1.926 [micro sec] --> 'hash_fn()' call : 1.891 [micro sec] [nvim-treesitter-context] - Level 4 - 'hash_node()' execution: 0.181 [nvim-treesitter-context] - Level 3 - 'memoize()' execution: 1.771 [micro sec] --> 'hash_fn()' call : 1.736 [micro sec] [nvim-treesitter-context] - Level 4 - 'hash_node()' execution: 0.166 [nvim-treesitter-context] - Level 3 - 'memoize()' execution: 5.717 [micro sec] --> 'hash_fn()' call : 5.675 [micro sec] [nvim-treesitter-context] - Level 4 - 'hash_node()' execution: 0.197 [nvim-treesitter-context] - Level 3 - 'memoize()' execution: 1.964 [micro sec] --> 'hash_fn()' call : 1.929 [micro sec] [nvim-treesitter-context] - Level 4 - 'hash_node()' execution: 0.220 [nvim-treesitter-context] - Level 3 - 'memoize()' execution: 1.772 [micro sec] --> 'hash_fn()' call : 1.727 [micro sec] [nvim-treesitter-context] - Level 4 - 'hash_node()' execution: 0.179 [nvim-treesitter-context] - Level 3 - 'memoize()' execution: 1.657 [micro sec] --> 'hash_fn()' call : 1.625 [micro sec] [nvim-treesitter-context] - Level 4 - 'hash_node()' execution: 0.190 [nvim-treesitter-context] - Level 3 - 'memoize()' execution: 1.768 [micro sec] --> 'hash_fn()' call : 1.735 [micro sec] [nvim-treesitter-context] - Level 4 - 'hash_node()' execution: 0.169 [nvim-treesitter-context] - Level 3 - 'memoize()' execution: 1.692 [micro sec] --> 'hash_fn()' call : 1.659 [micro sec] [nvim-treesitter-context] - Level 4 - 'hash_node()' execution: 0.185 [nvim-treesitter-context] - Level 3 - 'memoize()' execution: 1.701 [micro sec] --> 'hash_fn()' call : 1.669 [micro sec] [nvim-treesitter-context] - Level 4 - 'hash_node()' execution: 0.176 [nvim-treesitter-context] - Level 3 - 'memoize()' execution: 1.599 [micro sec] --> 'hash_fn()' call : 1.566 [micro sec] [nvim-treesitter-context] - Level 4 - 'hash_node()' execution: 0.185 [nvim-treesitter-context] - Level 3 - 'memoize()' execution: 1.660 [micro sec] --> 'hash_fn()' call : 1.623 [micro sec] [nvim-treesitter-context] - Level 2 - Parsing loop: 266.902 [micro sec] --> Loop on parent nodes: 228.955 [micro sec] --> 'is_valid()' total execution time: 225.767 [nvim-treesitter-context] - Level 1 - 'update()' execution time: 614.650 [micro sec]
Now, I'm not sure what to make out of it or even it is something that it is truly worth investing some time in to improve it (after all we are talking about micro seconds), but I figured that, regardless of the future developments, it might be of interest for you all :smile:
Thank you. That's really interesting. There's still probably more room to optimise. For ts-context we actually only want to run a query on a specific node, but the query will also run on all the children nodes (and not match anything). This might need changes upstream.
I've raised https://github.com/neovim/neovim/pull/22710 and https://github.com/tree-sitter/tree-sitter/pull/2140 to address this upstream.
The difference of search space in syntax tree between two implementations illustrated:
Even with depth limit, the siblings of parent nodes will be unnecessarily included. I agree that query based approach helps with unite of tree-sitter utility between different editors, but using a top-bottom analysis tool for a bottom-up situation is inappropriate and will cause perf issue like this.
This is just a friendly reminding, I'm not meant to against the works you have done. I thought tree-sitter query is fast enough so I didn't raise my points before the rewrite pr merged. Just for anyone encountering this problem to learn about the core reasons.
Thank you for taking the time to put together a brief explanation of the logic behind the two different approaches. It was really helpful!
Even with depth limit, the siblings of parent nodes will be unnecessarily included. I agree that query based approach helps with unite of tree-sitter utility between different editors, but using a top-bottom analysis tool for a bottom-up situation is inappropriate and will cause perf issue like this.
The idea is that with a depth of zero, only a single node (the one passed to the query) will be checked against the patterns. However this currently appears not to work with my treesitter PR (need to look into that). Though even with a depth of 1, I observe a perf increase of 10x, so once you add node caching then perf should be fine.
Another idea, would be to do some custom query parsing of simple patterns (e.g. (node) @capture
) and do simple name matching like before and then disable these patterns in the query.
The idea is that with a depth of zero, only a single node (the one passed to the query) will be checked against the patterns.
If depth = 0, what's the difference compared to the previous pattern approach? The query could be only used to match on one node and lose most of its power.
Another idea, would be to do some custom query parsing of simple patterns (e.g. (node) @capture) and do simple name matching like before and then disable these patterns in the query.
Well, when you can just use lua tables for simple form of patterns... This is workable, but difficult and comes with slightly high cost of maintenance.
All of these are brought by the enforced usage of query, but if you think the benefits to community outweighs, then let time prove it. Tree-sitter is still in early stage so at least any approach seems valuable to me.
The old way of using patterns was horrible for maintenance, extremely restricted and led to random PRs trying to make them more flexible which in the end made everything much more complicated. I didn't even understand half the code.
Queries on the other hand are a standard format that can match any arbitrary pattern and support directives and predicates, and are editor agnostic. The performance issues are simply due to the lack of features upstream which we are trying to develop and address.
The strength of query is based on performance cost. Personally, I didn't see the limitation of the previous version so maybe I'm in lack of some context. As far as I understand, we are trading performance for flexibility, am I wrong?
The performance issues are simply due to the lack of features upstream which we are trying to develop and address.
But it can never be as performant as the previous approach by nature, as illustrated above, unless query depth set to 0 to be degraded. Also it's hard to say that some of the optimizations could be eventually applied.
Queries on the other hand are a standard format that can match any arbitrary pattern and support directives and predicates, and are editor agnostic.
I know that, but as I've said before, using a top-bottom analysis tool for a bottom-up situation is inappropriate. This is the core problem and can only be worked around but impossible to resolve, unless the algorithm used in tree-sitter query is changed thoroughly.
As far as I understand, we are trading performance for flexibility, am I wrong?
And maintainability. Again, the old version was very messy.
And again, the perf issue isn't fundamental but a result of the current upstream implementation.
But it can never be as performant as the previous approach by nature, as illustrated above, unless query depth set to 0 to be degraded. Also it's hard to say that some of the optimizations could be eventually applied.
That's just not true, if we run a query on a single node then perf should be similar. We're just trading lua patterns for TS query patterns.
Degraded? What degraded?
Why is it hard to say if they can be applied? I know they can be applied. We already have node caching, parsing of basic queries won't be that hard, and the TS PR already works.
I know that, but as I've said before, using a top-bottom analysis tool for a bottom-up situation is inappropriate. This is the core problem and can only be worked around but impossible to resolve, unless the algorithm used in tree-sitter query is changed thoroughly.
The problem is solved by allowing queries to only match a single node. It is as simple as that.
Why is it hard to say if they can be applied? I know they can be applied. We already have node caching, parsing of basic queries won't be that hard, and the TS PR already works.
My subjective opinions. No supports. So you are right.
Degraded? What degraded?
Then the query cannot test child, sibilings, etc. Only the node itself. I see this as no difference between previous version.
The problem is solved by allowing queries to only match a single node. It is as simple as that.
Resolved only when queries only match a single node. If we want flexibility, we cannot blindlessly set depth to 0. Then the problem still exists.
Then the query cannot test child, sibilings, etc. Only the node itself. I see this as no difference between previous version.
Yes they can. Take a close look at the PR. The depth is only checked when descending into the tree which is done after pattern matching. The patterns will still match on children and siblings. The difference is that the whole pattern will not be matched in children.
The patterns will still match on children and siblings. The difference is that the whole pattern will not be matched in children.
So the query actually tests on children even with depth = 0 (only leaf pattern)? Then I'd say this is even worse because zero depth still performs worse than previous version. https://github.com/nvim-treesitter/nvim-treesitter-context/issues/220#issuecomment-1474732365 still holds true.
Sorry, I really don't think you understand how queries work, and I don't think I'll be able to convince you otherwise.
Yes, you are right, I haven't looked into the internals of query matching.
Ok, https://github.com/nvim-treesitter/nvim-treesitter-context/issues/220#issuecomment-1474827938 is wrong.
The patterns will still match on children and siblings. The difference is that the whole pattern will not be matched in children.
Only some special patterns could match on limited children without descending. So the correct summary is: when depth = 0, we gain slightly better flexibility than previous approach and maintainbility, otherwise when depth >= 1, https://github.com/nvim-treesitter/nvim-treesitter-context/issues/220#issuecomment-1474732365 mostly tells the cause of performance issue.
Honestly, I don't know what's the meaning to correct on such a small facet...
Updated https://github.com/tree-sitter/tree-sitter/pull/2140 to properly work with depth=0.
Just to confirm:
With the following query and depth=0:
(if_statement) @capture1
(if_statement
condition: (parenthesized_expression (identifier))
) @capture2
@capture1
will never descend because the match is complete at 0 depth.@capture2
will only descend at most to a depth of 3 since there are 2 subchildren in the pattern.So basically perf scales with the complexity of the pattern.
I've tested the above PR with this plugin and perf increase is between 50-100x (or roughly 50-70ms -> 0.5-2ms).
Updated tree-sitter/tree-sitter#2140 to properly work with depth=0.
Just to confirm:
With the following query and depth=0:
(if_statement) @capture1 (if_statement condition: (parenthesized_expression (identifier)) ) @capture2
- The match for
@capture1
will never descend because the match is complete at 0 depth.- The match for
@capture2
will only descend at most to a depth of 3 since there are 2 subchildren in the pattern.So basically perf scales with the complexity of the pattern.
I've tested the above PR with this plugin and perf increase is between 50-100x (or roughly 50-70ms -> 0.5-2ms).
Sorry for asking but I just want to be clear, with the same queries as above, what would happen for the case below?, Will both queries be applied or only one of them be applied
if {
if {
if {
...
For this plugin? I believe we return on the first match for each parent.
For this plugin? I believe we return on the first match for each parent.
For general actually, sorry for not clarifying, as I want to figure out how to do C indents in nvim-treesitter in case the behavior changed I have a nested if query for indent/dedent so I was wondering whether tree-sitter/tree-sitter#2140 would change the behavior
It depends on how you iterate the query, but generally it will match all of them and it's up to you which match to go with.
For anyone watching https://github.com/nvim-treesitter/nvim-treesitter-context/pull/271 is now merged which limits the query depth. For me I'm seeing a roughly 10x increase in performance. Mileage will vary.
Description
Extreme performance decrease after #198 (6e53eec). Scrolling is much slower on larger python files (3-5k+ lines Alacritty and Windows Terminal). Even scrolling within one screen results in choppy cursor movement.
Reverting back to 373c2e0 solves the lag completely.
Smaller python files work as expected, but going over 500 lines already shows some lag.
Neovim version
NVIM v0.9.0-dev-1184+g0ecb4d725 Build type: RelWithDebInfo LuaJIT 2.1.0-beta3
Expected behavior
Smooth movement within one screen and during fast scroll on larger python files (3-5k lines)
Actual behavior
Choppy cursor movement and lag.
Minimal config
Steps to reproduce