Open medwatt opened 2 years ago
You're editing source code in the markdown. We know that language injections are not implemented in the most efficient way (no incremental parsing). Can you identify whether the markdown parser or the language injection is the problem?
@theHamsta, I created a new file with just 500 lines or so, and without any code blocks. Here's the result. So, I believe the markdown parser is likely the one causing the slowdown.
@medwatt the markdown parser can also be used from other editors. Do you have time to check whether helix or the tree-sitter web playground has the same problem? I will try to get the timings for parsing and querying from Neovim to see where the problem is.
@theHamsta, this is the first time I am hearing of these. However, I installed the helix editor to test. I am having problems installing the markdown parser. According to the documentation, all I need to do is put the following in the languages.toml file in the config folder:
[[grammar]]
name = "markdown"
source = { git = "https://github.com/ikatyang/tree-sitter-markdown" }
Launching helix gives the following:
Bad language config: unknown field `grammar`, expected `language`
Press <ENTER> to continue with default language config
One thing I noticed though is the delay is even worse in helix. For example, holding down a key for a while prints out the characters after a much longer delay than neovim. With helix, however, there's no choppiness; it's smooth but takes longer. With neovim, the delay is shorter, but very choppy.
That's the wrong parser, though: we use https://github.com/MDeiml/tree-sitter-markdown
The question is how to get helix to install the parser? I have no idea how helix works. But as I said, the delay is much worse in helix with the same file already, so I don't think there's a point checking it further.
This is what happens when you press down a key with high repetition rate in a markdown file
As you can see, the compute time does indeed seem to be spent in `parser_parse` (the big chunk of 35ms is the markdown file, the smaller chunks are injected languages). I'll check whether setting a timeout for the parser changes anything, or whether it is really the parsing (rather than the querying) that is causing problems here.
@theHamsta, maybe you can shed some light on why parsing a markdown file is more intensive than Lua, for instance, given that Lua's grammar is more complex than markdown's.
I also use treesitter for verilog, and there is a noticeable delay when opening a verilog file for the first time, even when the file has a few lines.
The verilog parser is very complex: it takes a long time to generate and the resulting parser is enormous. I would not be surprised if it's slow to parse. E.g. Lua always stays below the 2ms threshold that I've set on the build I'm experimenting with right now. Markdown is very complex to parse because it has no strict grammar and requires an external parser to keep track of the stack of all the nested pairs.
I have long planned to use tree-sitter's timeout feature for parsing to guarantee that we don't stay in the parse+query cycle too long (and maybe do off-thread background parsing with the intermediate result in case of a timeout). With the tree-sitter timeout set I can type without lag (but of course I have not yet answered whether long parsing also implies longer querying afterwards).
If you want, you can experiment with https://github.com/theHamsta/neovim/tree/nvtx
You can set the parsing timeout here: https://github.com/theHamsta/neovim/blob/7d313d9395befb743aae2309633b78e160db8c68/src/nvim/lua/treesitter.c#L337
It will stop parsing (and also highlighting) when parsing takes too long. You will lose highlights from time to time, but typing is fast :smile:. It also solves the problem people have when opening files that are multiple MB large.
Let's see why markdown is slower.
Some findings:
`parser_parse` of markdown (which invokes the parsing) takes the major part of the range (this pattern repeats over the whole timeline).
@theHamsta, thanks for doing these tests. You mentioned a parsing timeout, and from what I understood, it's something that is not active by default. I wonder, then, what causes treesitter to go mad sometimes when I start scrolling.
Here's a screenshot of some file being highlighted correctly.
Here's the same section when I start scrolling.
This doesn't always happen, so it's not easy to reproduce. It happens from time to time and my current solution is to restart neovim. Can you say what might be causing this issue?
This seems to be a different issue. I've also seen it when experimenting with the timeout. Some extmarks got out of sync, but that is something Neovim does wrong, not tree-sitter taking a long time parsing.
@medwatt by default tree-sitter has an unlimited amount of time to finish its parsing; to enable a timeout you have to edit the source code of Neovim. It could become a feature in the future to protect ourselves from slow parsers or very big files.
A flamegraph of one session where I pressed the same key within two different link regions (it goes really slow at the second)
It seems to spend some time in `ts_stack_pop_count` (is the stack very deep for that language?).
To reproduce, add a few `d`s to the first paragraph in the README, within the link.
@MDeiml any ideas?
@theHamsta, I think, for a temporary solution, it would be a good idea to expose the option to set a custom timeout for slow parsers.
This is a really weird bug. I don't think it's a problem with my parser, since it does not appear when just parsing the document as a whole. It appears to only happen with incremental parsing. Also I noticed that if I stop holding down d (letting my computer catch up) and then start again it does not slow down again.
I also don't think it's a problem with tree-sitter, because then the surrounding document should not have any influence on parsing speed (only the stack at the current position). But if I delete all the other paragraphs I don't get any slowdown.
It's also not a problem with language injections (if I remove the injection queries, I still get the same effect).
Rather it's probably something to do with highlighting (after disabling tree-sitter highlighting the problem disappears).
But weirdly, if I disable all markdown highlight queries the problem still appears.
If I had to guess I would say it's a problem with neovim, but that's just a hunch.
@MDeiml no, it is not Neovim or the highlighting. It is tree-sitter doing the parsing. It consumes almost all of the time, with queries and injections negligible.
`parser_parse` is just invoking `ts_parse`. The small ranges are `Scanner::scan` (so I suppose that is when it's doing `ts_parser_parse`). I suppose that when `Scanner::scan` stops it is doing `ts_tree_get_changed_ranges` (will verify that in a minute).
It is not your external parser (it is only active 6% of the traced ranges)
but sure Neovim could handle this in a better way
Now with `ts_parser_parse` traced.
I think the way Atom handles this is that it lets the parsing time out, and then continues parsing in a background thread that can be canceled by the foreground thread as soon as the foreground thread wants to parse again. I think every call to `ts_parser_parse` makes progress even when it times out.
Still, the problem does not appear when highlighting is disabled, and does appear when highlighting is enabled. (For both tests I left TSPlayground open to verify that the document does actually get parsed).
Within `ts_parse`, neovim passes a callback for reading new data:
https://github.com/neovim/neovim/blob/9005ffbe7757eca8ad809c81db76aec930db8e68/src/nvim/lua/treesitter.c#L292-L323
Could this be the culprit?
The `input_cb` only makes a small . Without highlighting, the tree doesn't get updated on every keystroke. The playground wasn't working for a long time without the triggers by the highlighter. I think that we now at least parse the tree once.
| Time (%) | Total Time (ns) | Instances | Avg (ns) | Med (ns) | Min (ns) | Max (ns) | StdDev (ns) | Style | Range |
|---|---|---|---|---|---|---|---|---|---|
| 32,0 | 18.276.135.814 | 292 | 62.589.506,0 | 65.104.053,0 | 917 | 249.261.151 | 19.474.199,0 | PushPop | LanguageTree:parse markdown |
| 28,0 | 16.266.869.258 | 7.150 | 2.275.086,0 | 27.403,0 | 3.295 | 69.144.939 | 11.178.170,0 | PushPop | parser_parse |
| 25,0 | 14.375.366.952 | 7.150 | 2.010.540,0 | 25.540,0 | 2.410 | 62.835.704 | 9.868.962,0 | PushPop | ts_parser_parse |
| 4,0 | 2.699.118.363 | 7.714.968 | 349,0 | 331,0 | 121 | 364.863 | 516,0 | PushPop | markdown scan |
| 3,0 | 1.866.829.630 | 7.150 | 261.095,0 | 245,0 | 116 | 7.792.537 | 1.310.739,0 | PushPop | ts_tree_get_changed_ranges |
| 1,0 | 1.059.199.678 | 275 | 3.851.635,0 | 3.121.339,0 | 2.742.390 | 99.952.165 | 6.289.228,0 | PushPop | LanguageTree:parse lua |
| 1,0 | 867.959.021 | 275 | 3.156.214,0 | 3.002.615,0 | 2.636.921 | 30.037.728 | 1.651.085,0 | PushPop | _get_injections markdown |
| 0,0 | 404.102.225 | 1.448 | 279.076,0 | 50.607,0 | 6.101 | 182.794.010 | 4.880.470,0 | PushPop | on_line |
| 0,0 | 323.440.617 | 275 | 1.176.147,0 | 579.766,0 | 404.118 | 67.359.675 | 4.486.126,0 | PushPop | _get_injections lua |
| 0,0 | 205.169.011 | 523.333 | 392,0 | 338,0 | 126 | 128.838 | 483,0 | PushPop | input_cb |
| 0,0 | 189.857.909 | 275 | 690.392,0 | 323.011,0 | 257.144 | 79.581.512 | 4.807.582,0 | PushPop | LanguageTree:parse vim |
| 0,0 | 135.103.159 | 275 | 491.284,0 | 411.432,0 | 329.545 | 9.388.716 | 603.138,0 | PushPop | LanguageTree:parse html |
| 0,0 | 122.241.394 | 275 | 444.514,0 | 122.033,0 | 80.412 | 73.084.122 | 4.411.825,0 | PushPop | _get_injections vim |
| 0,0 | 60.690.523 | 275 | 220.692,0 | 160.789,0 | 110.730 | 6.923.854 | 417.566,0 | PushPop | _get_injections html |
| 0,0 | 58.474.838 | 13 | 4.498.064,0 | 2.416.024,0 | 21.448 | 15.370.281 | 5.726.295,0 | PushPop | tslua_parse_query |
`input_cb` doesn't spend a lot of time, but I suppose that you think it causes the parsing to take longer than necessary based on the output it's producing. I saw that `input_cb` (and also read system calls) gets called more often in injections.
@theHamsta Could you maybe record a flamegraph of inserting some ds, letting neovim catch up with work, and then inserting some more? For me it doesn't hang the second time, so there should be some difference. (Sry for bothering, but you seem to have a nice profiling setup :))
My profiling setup is not so great at the moment: Intel VTune crashes whenever it tries to finalize the result, and it would be the best tool for filtering certain time periods out of perf traces (I will try on a different machine). I couldn't see anything fundamentally different in the instances when it takes longer, which for me seems to depend mostly on document position.
@maxbrunsfeld do you have any advice on how to debug this? We have the problem that for https://github.com/MDeiml/tree-sitter-markdown incremental parsing can take a long time, 40ms-60ms (cold start parsing takes 30ms), which causes a lag in the editor, as keys can be fed at a faster rate. The edits are in each case single letters, made by just pressing one key in the README of our repo. Since the largest fraction of the time is spent in `ts_parser_parse` (see https://github.com/nvim-treesitter/nvim-treesitter/issues/2916#issuecomment-1120287849 for the timeline), it should also be reproducible using Atom (when no timeout is set for parsing). At the moment Neovim parses fully synchronously without any timeout set for the parser, and every keystroke triggers a parsing event.
Maybe it would be good to reproduce this programmatically using the tree-sitter rust API: parsing the text once and then do edits to understand what's going on (profiling or with debugger attached)
I came across this issue after asking a query on /r/neovim.
It's the same issue described by @medwatt in the first comment. I've enabled `filetype.lua` using `g.do_filetype_lua = 1` and disabled `filetype.vim` using `g.did_load_filetypes = 0`. I'm using the markdown treesitter parser by @MDeiml. Here's the `--startuptime` log file when a markdown file is opened:
`--startuptime` log file
times in msec
clock self+sourced self: sourced script
clock elapsed: other lines
000.023 000.023: --- NVIM STARTING ---
000.553 000.530: locale set
001.155 000.602: inits 1
001.192 000.037: window checked
001.543 000.351: parsing arguments
005.880 004.337: expanding arguments
005.923 000.043: inits 2
006.889 000.966: init highlight
006.894 000.005: waiting for UI
009.039 002.145: done waiting for UI
009.093 000.054: init screen for UI
009.134 000.041: init default mappings
009.202 000.068: init default autocommands
012.028 000.224 000.224: sourcing /usr/share/nvim/runtime/ftplugin.vim
012.525 000.122 000.122: sourcing /usr/share/nvim/runtime/indent.vim
012.779 000.052 000.052: sourcing /usr/share/nvim/archlinux.vim
012.797 000.157 000.105: sourcing /etc/xdg/nvim/sysinit.vim
026.088 013.177 013.177: sourcing /home/user/.config/nvim/init.lua
026.125 003.244: sourcing vimrc file(s)
026.926 000.044 000.044: sourcing /home/user/.local/share/nvim/site/pack/packer/start/LuaSnip/ftdetect/snippets.vim
027.294 000.039 000.039: sourcing /usr/share/vim/vimfiles/ftdetect/PKGBUILD.vim
027.397 000.059 000.059: sourcing /usr/share/vim/vimfiles/ftdetect/meson.vim
027.489 000.048 000.048: sourcing /usr/share/vim/vimfiles/ftdetect/vagrantfile.vim
027.854 001.363 001.173: sourcing /usr/share/nvim/runtime/filetype.lua
027.968 000.048 000.048: sourcing /usr/share/nvim/runtime/filetype.vim
028.539 000.220 000.220: sourcing /usr/share/nvim/runtime/syntax/synload.vim
028.826 000.756 000.537: sourcing /usr/share/nvim/runtime/syntax/syntax.vim
030.913 000.047 000.047: sourcing /usr/share/nvim/runtime/plugin/gzip.vim
030.999 000.037 000.037: sourcing /usr/share/nvim/runtime/plugin/health.vim
031.138 000.091 000.091: sourcing /usr/share/nvim/runtime/plugin/man.vim
031.229 000.039 000.039: sourcing /usr/share/nvim/runtime/plugin/matchit.vim
031.600 000.325 000.325: sourcing /usr/share/nvim/runtime/plugin/matchparen.vim
031.701 000.049 000.049: sourcing /usr/share/nvim/runtime/plugin/netrwPlugin.vim
032.075 000.037 000.037: sourcing /home/user/.local/share/nvim/rplugin.vim
032.094 000.349 000.312: sourcing /usr/share/nvim/runtime/plugin/rplugin.vim
032.293 000.152 000.152: sourcing /usr/share/nvim/runtime/plugin/shada.vim
032.387 000.037 000.037: sourcing /usr/share/nvim/runtime/plugin/spellfile.vim
032.483 000.047 000.047: sourcing /usr/share/nvim/runtime/plugin/tarPlugin.vim
032.568 000.037 000.037: sourcing /usr/share/nvim/runtime/plugin/tohtml.vim
032.667 000.051 000.051: sourcing /usr/share/nvim/runtime/plugin/tutor.vim
032.766 000.048 000.048: sourcing /usr/share/nvim/runtime/plugin/zipPlugin.vim
033.070 000.050 000.050: sourcing /usr/share/vim/vimfiles/plugin/fzf.vim
033.270 000.147 000.147: sourcing /usr/share/vim/vimfiles/plugin/redact_pass.vim
063.140 010.130 010.130: sourcing /home/user/.local/share/nvim/site/pack/packer/start/onedark.nvim/colors/onedark.lua
108.306 074.803 064.673: sourcing /home/user/.config/nvim/plugin/packer_compiled.lua
109.021 004.420: loading rtp plugins
109.678 000.245 000.245: sourcing /home/user/.local/share/nvim/site/pack/packer/start/LuaSnip/plugin/luasnip.vim
110.296 000.385 000.385: sourcing /home/user/.local/share/nvim/site/pack/packer/start/indent-blankline.nvim/plugin/indent_blankline.vim
111.682 001.048 001.048: sourcing /home/user/.local/share/nvim/site/pack/packer/start/nvim-treesitter/plugin/nvim-treesitter.lua
112.090 000.168 000.168: sourcing /home/user/.local/share/nvim/site/pack/packer/start/vim-cool/plugin/cool.vim
112.269 001.401: loading packages
112.641 000.231 000.231: sourcing /home/user/.local/share/nvim/site/pack/packer/start/Comment.nvim/after/plugin/Comment.lua
112.650 000.151: loading after plugins
112.663 000.012: inits 3
117.340 004.678: reading ShaDa
124.668 000.452 000.452: sourcing /usr/share/nvim/runtime/autoload/htmlcomplete.vim
124.819 000.754 000.302: sourcing /usr/share/nvim/runtime/ftplugin/html.vim
125.160 001.425 000.671: sourcing /usr/share/nvim/runtime/ftplugin/markdown.vim
127.838 000.333 000.333: sourcing /usr/share/nvim/runtime/syntax/javascript.vim
130.206 002.177 002.177: sourcing /usr/share/nvim/runtime/syntax/vb.vim
136.619 006.283 006.283: sourcing /usr/share/nvim/runtime/syntax/css.vim
137.933 011.319 002.527: sourcing /usr/share/nvim/runtime/syntax/html.vim
138.344 011.844 000.524: sourcing /usr/share/nvim/runtime/syntax/markdown.vim
204.324 073.714: opening buffers
205.968 001.644: BufEnter autocommands
205.977 000.009: editing files in windows
206.806 000.829: VimEnter autocommands
206.814 000.008: UIEnter autocommands
207.221 000.287 000.287: sourcing /usr/share/nvim/runtime/autoload/provider/clipboard.vim
207.231 000.131: before starting main loop
271.764 064.532: first screen update
271.772 000.008: --- NVIM STARTED ---
Whenever I edit a markdown file with more than 300 or 500 lines with some code blocks, the input latency increases dramatically. When it's more than 1000 lines, I have to wait for almost a second for a keypress to show up on my screen. If I delete characters, the cursor disappears and text is deleted with a delay of almost a second.
I'm not sure how to disable syntax highlighting for fenced code blocks when using the markdown treesitter parser or if it'll help. If I disable the markdown treesitter parser, there's a significant improvement in input latency.
I've noticed from the startuptime logs that vimscript runtime syntax files are sourced for code blocks, including `markdown.vim`, even though I've installed treesitter parsers for all the languages mentioned in the log and I've also disabled vim regex syntax highlighting in my neovim config.
> I'm not sure how to disable syntax highlighting for fenced code blocks when using the markdown treesitter parser or if it'll help. If I disable the markdown treesitter parser, there's a significant improvement in input latency.

Remove the `injections.scm` from your runtime path.
> I've noticed from the startuptime logs that vimscript runtime syntax files are sourced for code blocks, including markdown.vim, even though I've installed treesitter parsers for all the languages mentioned in the log and I've also disabled vim regex syntax highlighting in my neovim config.

Are you sure they're actually executed? They will show up even if they're skipped by `finish`ing early (which is the usual mechanism for Vim to "skip" files).
> Remove the `injections.scm` from your runtime path.

I moved the `injections.scm` file out of my runtime path and markdown files still highlight the fenced code blocks and have the same input latency as mentioned before.
I confirmed that the `injections.scm` file was not in my runtime path using
:lua print(vim.inspect(vim.api.nvim_get_runtime_file('queries/lua/*', true)))
> Are you sure they're actually executed?

Sorry, I'm not. I assumed they were since they were adding non-negligible time in the startuptime log.

> :lua print(vim.inspect(vim.api.nvim_get_runtime_file('queries/lua/*', true)))

that's the wrong one, though -- you want the `queries/markdown/injections.scm`.

> that's the wrong one, though -- you want the `queries/markdown/injections.scm`.

Ah, that helps, thanks!
The input latency is almost back to normal. If I delete characters using backspace, the cursor still disappears though. I've confirmed that this doesn't happen in other types of files.
The syntax highlighting for markdown also gets messed up in some regions but that's probably just markdown quirks though.
Yeah, markdown is just hard to parse into a syntax tree. People are working on that, but it is highly non-trivial.
@clason that's okay, thanks for your help
This is an unrelated question but can you point me to a markup language for writing documents that is well supported in treesitter and doesn't have performance issues in neovim if the document is more than 1000 or 2000 lines long?
I'm considering writing my documents in another such markup language and then converting it back to markdown using pandoc before I push them to a git repo.
Not to my knowledge; this is a fundamental limitation common to all "soft" markup languages (opposed to structured ones like HTML or LaTeX).
You could give RST a try, though.
That's disappointing. It reminds me of this post on undeadly.org about markdown.
I'm not sure if neorg and its treesitter parser can handle large documents without introducing input latency in the terminal. If not, I'll probably switch to writing articles in HTML.
Thanks!
@ayushnix I'm sure the problems with markdown input latency can be solved by a timeout for tree-sitter parsing. It was working smoothly when I added the timeout (except that highlighting was sometimes lost, since there is no code yet for switching to background parsing or for reusing the previous parsing result). We're talking about a maximum of 42ms during incremental parsing, which is slow enough to build up a lag when you type multiple letters at once, but still manageable for an editor that has to provide the highlighting. In other words: it's too slow for "on every keystroke", but fast enough to catch up if the parse moves to the background once it reaches the timeout. The problem we're experiencing here is that after a fast input of 5 letters, we experience 5 times the parsing latency, while with a timeout it would be possible to cancel the parses for the first 4 letters and finish only the last one on a background thread. Usually, the 5 incremental parses should go really fast, as the parser state should not have changed much. But even when that does not work, the editor should guard itself against excessive parsing times.
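To make the "5 times the parsing latency" argument concrete, here is a minimal toy model in Python. It is only a sketch under stated assumptions: the function names, the 30ms key-repeat interval and the 5ms timeout are made up for illustration, and it does not model anything Neovim or tree-sitter actually implement. It assumes a timed-out parse is handed to a background thread, is cancelled by the next keystroke, and that only the parse for the last keystroke runs to completion.

```python
def sync_latency(arrivals_ms, parse_ms):
    """Blocking model: every keystroke waits for a full parse.

    Returns, per keystroke, the delay between the key arriving and the
    editor being free again.
    """
    free_at = 0
    delays = []
    for t in arrivals_ms:
        start = max(t, free_at)      # wait for the previous parse to finish
        free_at = start + parse_ms   # then block for the whole parse
        delays.append(free_at - t)
    return delays


def timeout_latency(arrivals_ms, parse_ms, timeout_ms):
    """Timeout model (assumption): the foreground blocks for at most
    timeout_ms, earlier parses are cancelled by the next keystroke, and
    only the last parse completes in the background.

    Returns (per-keystroke blocking delays, time at which highlights for
    the final text are ready).
    """
    block_ms = min(parse_ms, timeout_ms)
    free_at = 0
    last_start = 0
    delays = []
    for t in arrivals_ms:
        last_start = max(t, free_at)
        free_at = last_start + block_ms
        delays.append(free_at - t)
    return delays, last_start + parse_ms


keys = [0, 30, 60, 90, 120]            # hypothetical 30ms key repeat
print(sync_latency(keys, 42))          # [42, 54, 66, 78, 90]
print(timeout_latency(keys, 42, 5))    # ([5, 5, 5, 5, 5], 162)
```

In the blocking model the delay grows with every keystroke because each 42ms parse finishes after the next key has already arrived. In the timeout model typing stays responsive and the highlights catch up shortly after the last key.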
There is no fundamental reason why a Markdown parser should be slow. It's just that nested pairs of delimiters are difficult to express with tree-sitter and almost always require an external parser that can count the nesting state. You can test whether https://github.com/ikatyang/tree-sitter-markdown has the same limitation. It's also possible that the parser of @MDeiml has some properties that make the incremental parsing logic fail to build efficiently on the previous result.
@ayushnix can you provide some evidence that the injections have any effect at all? With https://github.com/neovim/neovim/pull/18761 you can visualize what fraction of the latency is caused by markdown parsing and what by the injections. In the document I tested, the latency came purely from the markdown parser, with injections causing only a negligible fraction of the whole incremental parsing time.
I'm actually experimenting at the moment to see if I can get this faster. This would include optionally parsing only the inlines that are visible (parsing inlines only depends on the inlines in the same block and not on other blocks) and a few changes around paragraphs, which are kinda important and really slow at the moment. But if I can get something faster to work it's gonna take a while, since it probably needs some features in upstream neovim.
But I'm quite confident that I should be able to get this at least somewhat fast, since parsing the block structure could probably be done pretty fast because it's well defined; it's mainly inlines like links and emphasis that make the parser slow. I should be able to split the two.
I think I found the cause of this issue. I think tree-sitter has problems with reusing trees that are very "flat", i.e. trees where most nodes have a lot of siblings. This is not a problem with usual programming languages, since they're often structured hierarchically, but with markdown (as it is now) most nodes are children of the root node.
When parsing a file after introducing some edits, all siblings of nodes that changed are also parsed again. I'm not sure why, maybe @maxbrunsfeld could give some insights?
I was able to solve this by just introducing more hierarchy artificially. More concretely, I added a `section` node, which starts with a heading and stretches until the next heading. With this I can get syntax highlighting in a ~3000 line file without any noticeable delay.
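As a rough illustration of the sectioning idea (not the actual grammar change), the grouping can be sketched in Python. The `(kind, text)` tuples and the function name are hypothetical stand-ins for syntax nodes. The point is only that the root ends up with a few section children instead of every block as a direct child, so an edit in one section leaves the other section subtrees untouched.

```python
def group_into_sections(blocks):
    """Group a flat list of (kind, text) blocks into sections.

    A section starts at a heading and stretches until the next heading.
    Blocks before the first heading form their own leading section.
    """
    sections = []
    for block in blocks:
        kind, _text = block
        if kind == "heading" or not sections:
            sections.append([])          # open a new section
        sections[-1].append(block)
    return sections


flat = [
    ("paragraph", "intro"),
    ("heading", "A"),
    ("paragraph", "a1"),
    ("paragraph", "a2"),
    ("heading", "B"),
    ("paragraph", "b1"),
]
sections = group_into_sections(flat)
print(len(flat), "root children before,", len(sections), "after")
```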
I might try later to get a quick fix in this way for the current version, but as I said I'm currently working on rewriting the parser so I'd rather work on that.
A quick fix would be very much appreciated, since the rewrite sounds like something we can't just drop in in place of the current one (needing substantial infrastructure work to support such "split parsers").
Of course, I understand that this is much less interesting work ;)
A quick fix would probably be to let nvim time out long parses. We will always face the situation that parsing is slow when the file is too big (at least for initial parsing). Although a change in Neovim might not be that quick.
We'll never know until someone puts a PR for it up for discussion...
I tried to implement the fix on the main branch, but I didn't get the same speedup. Not sure why.
> We'll never know until someone puts a PR for it up for discussion...

Well, I guess the "how" of the implementation is the thing that's taking some time... Maybe I'll find some time tomorrow for it. There are quite a few possibilities for dealing with this, and I won't know which is the best one until I've tried them out.
~I noticed something else while writing rust bindings for my parser. If I use a single tree-sitter parser object and `ts_parser_set_language`, then parsing again after edits seems to happen almost instantly. If I use one parser for each language, then parsing after edits takes just as long as the first parse.~
~I don't know if this is specific to my use case, but maybe it would make sense to investigate something similar for neovim, as it seems it also uses one parser per language.~
Nvm there was a hidden error and I was getting garbage data.
https://github.com/tree-sitter/tree-sitter-haskell/issues/41#issuecomment-1004310271
It seems that my previous comment about hierarchical structure was the right hunch. Reducing conflicts should be the main priority for slow parsers, but "sectioning off" the conflicts seems to work as well. Unfortunately neither is possible for inline markdown elements like emphasis.
> I think I found the cause of this issue. I think tree-sitter has problems with reusing trees that are very "flat", i.e. trees where most nodes have a lot of siblings. This is not a problem with usual programming languages, since they're often structured hierarchically, but with markdown (as it is now) most nodes are children of the root node.

I think it must be something more specific than that; otherwise it would reproduce in, for example, a large C file with hundreds of small functions, since those functions would all be sibling nodes.
I'm curious what's going on, and I'll try to reproduce the slowness with the `tree-sitter` CLI, using the `parse --edit` command.
Ok, I can reproduce the problem from the command line. I believe the problem is a certain conflict in your grammar, between `_soft_line_break` and `_paragraph_end_newline`. It causes every paragraph to be considered "fragile", and not re-usable.
I determined this by creating a small markdown file, `test.md`, with five two-word paragraphs:
a b

c d

e f

g h

i j

I then parsed this file from the command line with debug graphs enabled:
tree-sitter parse test.md -D
(document [0, 0] - [10, 0]
  (paragraph [0, 0] - [1, 0])
  (paragraph [2, 0] - [3, 0])
  (paragraph [4, 0] - [5, 0])
  (paragraph [6, 0] - [7, 0])
  (paragraph [8, 0] - [9, 0]))
This creates a long sequence of SVG graphs. In this graph, you can zoom in on a particular point, when the parser reaches the end of a paragraph, and see that the parse stack splits into two branches:
Any node that is created in an ambiguous state like this is considered fragile - it cannot be reused during incremental parsing if any of its contents have changed. In this case, the ambiguity is still in effect while the `paragraph` and `block` nodes are created.
To observe the performance impact of this ☝️ more directly, you can perform an edit and an incremental re-parse at the command line, inserting a character on line 4 (the third paragraph).
tree-sitter parse test.md --edit '4,1 0 1'
It re-parses correctly, but if you run with `-d` (for terminal logging) or `-D` (to generate another SVG log), you can see that the parser decides not to reuse any block/paragraph nodes.
...
cant_reuse_node_is_fragile tree:_block
cant_reuse_node_is_fragile tree:paragraph
...
@MDeiml Can you think of a way to not have this conflict with `_paragraph_end_newline`? Can you tell the difference between a paragraph ending and a "soft" line break by the number of newlines?
Thank you! I have a fix for this conflict in paragraphs where I parse ahead quite a bit to determine if a newline is a soft line break. This means that a lot of paragraphs can now be reused.
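A very simplified Python sketch of the lookahead idea follows. It is not the real scanner: it assumes that only a blank line (or the end of the input) ends a paragraph, and it ignores the other constructs that can interrupt a paragraph in Markdown, such as headings, fences and lists. The function name and the returned labels are hypothetical.

```python
def classify_newline(text, newline_index):
    """Decide whether the newline at newline_index is a soft line break
    or the end of a paragraph by looking ahead at the next line.

    Simplifying assumption: a paragraph ends only when the next line is
    blank or the input ends.
    """
    if text[newline_index] != "\n":
        raise ValueError("newline_index must point at a newline")
    next_newline = text.find("\n", newline_index + 1)
    if next_newline == -1:
        next_line = text[newline_index + 1:]
    else:
        next_line = text[newline_index + 1:next_newline]
    if next_line.strip() == "":
        return "paragraph_end"
    return "soft_line_break"


sample = "a b\nc d\n\ne f\n"
print(classify_newline(sample, 3))   # soft_line_break: "c d" follows
print(classify_newline(sample, 7))   # paragraph_end: a blank line follows
```

Because the decision is made deterministically before the node is built, the parser never has to keep both interpretations alive, which is what made the paragraph nodes fragile in the first place.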
But a similar problem now appears with emphasis, which appear in a lot of paragraphs as top level inline nodes. I'm not sure it's possible to parse those without conflicts as that would require potentially infinite lookahead. But maybe it's possible to create a "fast path" for the most common use case of no nested inlines.
I think it's probably ok for emphasis to have that conflict, since most (all?) top-level nodes in the document are not emphasis nodes.
That's true, but pretty much every top level node has children that cannot be reused, which means that parsing is still slow in very very large documents, though I can get it to very acceptable speeds for e.g. the README for this repo.
I have a question though: shouldn't it be possible to reuse fragile trees (whole trees, not nodes) if all edits were outside their range set with `ts_parser_set_included_ranges`? I have to admit I don't really understand this concept of fragility so I might be wrong, but even with conflicts parsing should be deterministic.
I am currently working on a version of this parser where inline elements (emphasis) and block elements (paragraphs) are split into two grammars. This means that every inline range is parsed separately. I noticed that almost all of the inline ranges are not reused, which makes sense as most contain emphasis and are thus fragile. But all that needs to be done is to shift the node positions, so I'd be keen to just not reparse the unaffected trees.
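The "just shift the node positions" idea can be sketched in Python as follows. This is an assumption-laden illustration, not how tree-sitter or the parser handles it: ranges are plain `(start_byte, end_byte)` tuples, the edit is described by byte offsets only (similar in spirit to the byte fields of tree-sitter's `TSInputEdit`), and any range that touches the edit is conservatively marked for reparsing.

```python
def shift_ranges(ranges, edit_start, old_end, new_end):
    """Split inline ranges into reusable ones and ones needing a reparse.

    ranges: list of (start_byte, end_byte) tuples, one per inline tree.
    The edit replaced bytes [edit_start, old_end) with
    [edit_start, new_end).
    Returns (reusable, reparse) with positions in the new text.
    """
    delta = new_end - old_end
    reusable, reparse = [], []
    for start, end in ranges:
        if end < edit_start:
            reusable.append((start, end))                  # before the edit
        elif start > old_end:
            reusable.append((start + delta, end + delta))  # after: shift only
        else:
            # The range touches the edit, so it has to be parsed again.
            reparse.append((min(start, edit_start), max(end, old_end) + delta))
    return reusable, reparse


# Insert one byte at offset 15, inside the second of three inline ranges.
inline_ranges = [(0, 10), (12, 20), (22, 30)]
print(shift_ranges(inline_ranges, 15, 15, 16))
# ([(0, 10), (23, 31)], [(12, 21)])
```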
Describe the bug
I noticed recently when editing a large markdown file that has more than 1000 lines that the delay between keystrokes while typing becomes perceptible. This only happens when treesitter for markdown is enabled.
Here's a video demonstrating the difference in the typing experience between an empty file and a large file.
https://user-images.githubusercontent.com/17733465/167254657-808a0f73-219f-4531-b878-d4ea5d06c4d7.mp4
To Reproduce
Expected behavior
It is expected that there should be no lag when typing irrespective of the number of lines in the file.
Output of `:checkhealth nvim-treesitter`
Output of `nvim --version`
Additional context
No response