Markdown files with more than 500 lines become perceptibly slow

medwatt commented 2 years ago

Describe the bug

I noticed recently when editing a large markdown file that has more than 1000 lines that the delay between keystrokes while typing becomes perceptible. This only happens when treesitter for markdown is enabled.

Here's a video demonstrating the difference in the typing experience between an empty file and a large file.

https://user-images.githubusercontent.com/17733465/167254657-808a0f73-219f-4531-b878-d4ea5d06c4d7.mp4

To Reproduce

Create a large markdown file with many code blocks (how large the file is would probably depend on your computer)
Start typing while inside a code block

Expected behavior

It is expected that there should be no lag when typing irrespective of the number of lines in the file.

Output of `:checkhealth nvim-treesitter`

nvim-treesitter: require("nvim-treesitter.health").check()
========================================================================
## Installation
  - OK: `tree-sitter` found 0.20.6 (parser generator, only needed for :TSInstallFromGrammar)
  - OK: `node` found v17.8.0 (only needed for :TSInstallFromGrammar)
  - OK: `git` executable found.
  - OK: `cc` executable found. Selected from { vim.NIL, "cc", "gcc", "clang", "cl", "zig" }
    Version: cc (GCC) 11.2.0
  - OK: Neovim was compiled with tree-sitter runtime ABI version 14 (required >=13). Parsers must be compatible with runtime ABI.

## Parser/Features H L F I J
  - hack           ✓ . . . . 
  - norg           . . . . . 
  - svelte         ✓ . ✓ ✓ ✓ 
  - astro          ✓ ✓ ✓ ✓ ✓ 
  - bash           ✓ ✓ ✓ . ✓ 
  - beancount      ✓ . ✓ . . 
  - lalrpop        ✓ ✓ . . . 
  - php            ✓ ✓ ✓ ✓ ✓ 
  - swift          ✓ ✓ . . . 
  - markdown       ✓ . ✓ . ✓ 
  - cooklang       ✓ . . . . 
  - tlaplus        ✓ ✓ ✓ . ✓ 
  - fish           ✓ ✓ ✓ ✓ ✓ 
  - toml           ✓ ✓ ✓ ✓ ✓ 
  - wgsl           ✓ . ✓ . . 
  - proto          ✓ . ✓ . . 
  - m68k           ✓ ✓ ✓ . ✓ 
  - pug            ✓ . . . ✓ 
  - elvish         ✓ . . . ✓ 
  - solidity       ✓ . . . . 
  - glsl           ✓ ✓ ✓ ✓ ✓ 
  - tsx            ✓ ✓ ✓ ✓ ✓ 
  - regex          ✓ . . . . 
  - turtle         ✓ ✓ ✓ ✓ ✓ 
  - go             ✓ ✓ ✓ ✓ ✓ 
  - html           ✓ ✓ ✓ ✓ ✓ 
  - yang           ✓ . ✓ ✓ . 
  - graphql        ✓ . . ✓ ✓ 
  - d              ✓ . ✓ ✓ ✓ 
  - vue            ✓ . ✓ ✓ ✓ 
  - r              ✓ ✓ . ✓ ✓ 
  - make           ✓ . . . ✓ 
  - http           ✓ . . . ✓ 
  - prisma         ✓ . . . . 
  - query          ✓ ✓ ✓ ✓ ✓ 
  - java           ✓ ✓ . ✓ ✓ 
  - cmake          ✓ . ✓ . . 
  - llvm           ✓ . . . . 
  - ruby           ✓ ✓ ✓ ✓ ✓ 
  - css            ✓ . ✓ ✓ ✓ 
  - perl           ✓ . ✓ . . 
  - pioasm         ✓ . . . ✓ 
  - json5          ✓ . . . ✓ 
  - julia          ✓ ✓ ✓ ✓ ✓ 
  - pascal         ✓ ✓ ✓ ✓ ✓ 
  - vim            ✓ ✓ . . ✓ 
  - json           ✓ ✓ ✓ ✓ . 
  - cpp            ✓ ✓ ✓ ✓ ✓ 
  - slint          ✓ . . ✓ . 
  - zig            ✓ . ✓ ✓ ✓ 
  - bibtex         ✓ . ✓ ✓ . 
  - gowork         ✓ . . . ✓ 
  - yaml           ✓ ✓ ✓ ✓ ✓ 
  - jsdoc          ✓ . . . . 
  - hcl            ✓ . ✓ ✓ ✓ 
  - heex           ✓ ✓ ✓ ✓ ✓ 
  - glimmer        ✓ . . . . 
  - sparql         ✓ ✓ ✓ ✓ ✓ 
  - dot            ✓ . . . ✓ 
  - latex          ✓ . ✓ . ✓ 
  - gdscript       ✓ ✓ . ✓ ✓ 
  - devicetree     ✓ ✓ ✓ ✓ ✓ 
  - lua            ✓ ✓ ✓ ✓ ✓ 
  - foam           ✓ ✓ ✓ ✓ ✓ 
  - godot_resource ✓ ✓ ✓ . . 
  - scheme         ✓ . ✓ . ✓ 
  - clojure        ✓ ✓ ✓ . ✓ 
  - gomod          ✓ . . . ✓ 
  - comment        ✓ . . . . 
  - elixir         ✓ ✓ ✓ ✓ ✓ 
  - phpdoc         ✓ . . . . 
  - erlang         . . . . . 
  - verilog        ✓ ✓ ✓ . ✓ 
  - rego           ✓ . . . ✓ 
  - dockerfile     ✓ . . . ✓ 
  - fortran        ✓ . ✓ ✓ . 
  - jsonc          ✓ ✓ ✓ ✓ ✓ 
  - haskell        ✓ . . . ✓ 
  - embedded_template✓ . . . ✓ 
  - javascript     ✓ ✓ ✓ ✓ ✓ 
  - fennel         ✓ ✓ . . ✓ 
  - gleam          ✓ ✓ ✓ ✓ ✓ 
  - commonlisp     ✓ ✓ ✓ . . 
  - kotlin         ✓ ✓ ✓ . ✓ 
  - rst            ✓ ✓ . . ✓ 
  - dart           ✓ ✓ . ✓ ✓ 
  - ocaml          ✓ ✓ ✓ . ✓ 
  - cuda           ✓ ✓ ✓ ✓ ✓ 
  - nix            ✓ ✓ ✓ . ✓ 
  - ninja          ✓ . ✓ ✓ . 
  - help           ✓ . . . . 
  - ocaml_interface✓ ✓ ✓ . ✓ 
  - rust           ✓ ✓ ✓ ✓ ✓ 
  - org            . . . . . 
  - ocamllex       ✓ . . . ✓ 
  - typescript     ✓ ✓ ✓ ✓ ✓ 
  - ql             ✓ ✓ . ✓ ✓ 
  - hjson          ✓ ✓ ✓ ✓ ✓ 
  - scala          ✓ . ✓ . ✓ 
  - fusion         ✓ ✓ ✓ ✓ . 
  - hocon          ✓ . . . ✓ 
  - scss           ✓ . . ✓ . 
  - todotxt        ✓ . . . . 
  - eex            ✓ . . . ✓ 
  - c              ✓ ✓ ✓ ✓ ✓ 
  - python         ✓ ✓ ✓ ✓ ✓ 
  - ledger         ✓ . ✓ ✓ ✓ 
  - vala           ✓ . . . . 
  - surface        ✓ . ✓ ✓ ✓ 
  - elm            ✓ . . . ✓ 
  - supercollider  ✓ ✓ ✓ ✓ ✓ 
  - rasi           ✓ ✓ ✓ ✓ . 
  - c_sharp        ✓ ✓ ✓ . ✓ 
  - teal           ✓ ✓ ✓ ✓ ✓ 

  Legend: H[ighlight], L[ocals], F[olds], I[ndents], In[j]ections
         +) multiple parsers found, only one will be used
         x) errors found in the query, try to run :TSUpdate {lang}

Output of `nvim --version`

NVIM v0.7.0
Build type: Release
LuaJIT 2.1.0-beta3
Compiled by builduser

Features: +acl +iconv +tui
See ":help feature-compile"

   system vimrc file: "$VIM/sysinit.vim"
  fall-back for $VIM: "/usr/share/nvim"

Run :checkhealth for more info

Additional context

No response

theHamsta commented 2 years ago

You're editing source code in the markdown. We know that language injections are not implemented in the most efficient way (no incremental parsing). Can you identify whether the markdown parser or the language injection is the problem?

medwatt commented 2 years ago

@theHamsta, I created a new file with just 500 lines or so, and without any code blocks. Here's the result. So, I believe the markdown parser is likely the one causing the slowdown.

https://user-images.githubusercontent.com/17733465/167255484-ab98e6fa-489b-4d7f-a6c0-4dc5a95e57ff.mp4

theHamsta commented 2 years ago

@medwatt the markdown parser can also be used from other editors. Do you have the time to check whether helix or the tree-sitter web playground have the same problem? I will try to get the timing for parsing and querying from Neovim to see where we have the problem.

medwatt commented 2 years ago

@theHamsta, this is the first time I am hearing of these. However, I installed the helix editor to test. I am having problems installing the markdown parser. According to the documentation, all I need to do is put the following in the languages.toml file in the config folder:

[[grammar]]
name = "markdown"
source = { git = "https://github.com/ikatyang/tree-sitter-markdown" }

Launching helix gives the following:

Bad language config: unknown field `grammar`, expected `language`
Press <ENTER> to continue with default language config

One thing I noticed though is the delay is even worse in helix. For example, holding down a key for a while prints out the characters after a much longer delay than neovim. With helix, however, there's no choppiness; it's smooth but takes longer. With neovim, the delay is shorter, but very choppy.

clason commented 2 years ago

That's the wrong parser, though: we use https://github.com/MDeiml/tree-sitter-markdown

medwatt commented 2 years ago

https://github.com/MDeiml/tree-sitter-markdown

The question is how to get helix to install the parser? I have no idea how helix works. But as I said, the delay is much worse in helix with the same file already, so I don't think there's a point checking it further.

theHamsta commented 2 years ago

This is what happens when you press down a key with a high repetition rate in a markdown file: [screenshot]

As you can see, the compute time indeed seems to be spent in parser_parse (the big chunk of 35ms is the markdown file, the smaller chunks are injected languages). I'll check whether setting a timeout for the parser changes anything, or whether it is really the parsing (or rather the querying) that is causing problems here.

medwatt commented 2 years ago

@theHamsta, maybe you can shed some light on why parsing a markdown file is more intensive than Lua, for instance, given that Lua's grammar is more complex than markdown's.

I also use treesitter for verilog, and there is a noticeable delay when opening a verilog file for the first time, even when the file has a few lines.

theHamsta commented 2 years ago

The verilog parser is very complex: it takes a long time to generate and the resulting parser is enormous. I would not be surprised if it's slow to parse. E.g. Lua always stays below the 2ms threshold that I've set on the build I'm experimenting with right now. Markdown is very complex to parse because it has no strict grammar and requires an external parser to keep track of the stack of all the nested pairs.

theHamsta commented 2 years ago

I have long planned to use tree-sitter's timeout feature for parsing to guarantee that we don't stay in the parse+query cycle too long (and maybe do off-thread background parsing with the intermediate result in case of a timeout). With the tree-sitter timeout set I can type without lags (but of course I have not yet answered whether long parsing also implies longer querying afterwards).
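To make the timeout idea concrete, here is a minimal sketch in Python. It is a toy model, not tree-sitter or Neovim code: `ResumableParser`, `make_tick_clock` and `parse_until_done` are hypothetical names, and the "parser" only labels lines. The point it illustrates is that a parse call can give up at a deadline while keeping its progress, so the next call continues instead of starting over.

```python
import time


class ResumableParser:
    """Toy line-by-line parser that stops at a deadline and resumes later."""

    def __init__(self, lines, clock=time.monotonic):
        self.lines = lines
        self.clock = clock    # injectable so the demo can use a fake clock
        self.pos = 0          # next line to parse; survives a timeout
        self.nodes = []       # nodes produced so far

    def parse(self, timeout):
        """Return the node list, or None if the timeout expired first."""
        deadline = self.clock() + timeout
        while self.pos < len(self.lines):
            line = self.lines[self.pos]
            kind = "heading" if line.startswith("#") else "paragraph"
            self.nodes.append((kind, self.pos))
            self.pos += 1
            # Check the deadline only after making progress, so every call
            # parses at least one line and repeated calls always terminate.
            if self.pos < len(self.lines) and self.clock() >= deadline:
                return None   # hand control back to the editor
        return self.nodes


def make_tick_clock():
    """Fake clock for the demo: every reading advances time by one tick."""
    ticks = iter(range(1, 10**9))
    return lambda: next(ticks)


def parse_until_done(parser, timeout):
    """Call parse() until it finishes; return (nodes, number_of_calls)."""
    calls = 1
    nodes = parser.parse(timeout)
    while nodes is None:
        calls += 1
        nodes = parser.parse(timeout)
    return nodes, calls


doc = ["# Title", "intro", "## Part", "body", "more body"]
nodes, calls = parse_until_done(ResumableParser(doc, make_tick_clock()), timeout=3)
print(calls, len(nodes))
```

In an editor, the gap between two `parse` calls is where keystrokes would be handled, which is why typing stays responsive even though the tree (and the highlighting) may lag behind for a moment.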

theHamsta commented 2 years ago

If you want, you can experiment with https://github.com/theHamsta/neovim/tree/nvtx

You can set the parsing timeout here: https://github.com/theHamsta/neovim/blob/7d313d9395befb743aae2309633b78e160db8c68/src/nvim/lua/treesitter.c#L337

It will stop parsing (and also highlighting) when parsing takes too long. You will lose highlights from time to time, but typing is fast :smile:. It also solves the problem people have when opening files that are multiple MB large.

Let's see why markdown is slower.

theHamsta commented 2 years ago

Some findings:

it depends a lot where in the document you insert text
it is indeed the C function parser_parse of markdown (which invokes the parsing) which takes the major part of the range (this example repeats over all the timeline)
nvtx is really cool when you visualize the timings of injected languages (I haven't marked querying yet, but it should be the empty spots)

medwatt commented 2 years ago

@theHamsta, thanks for doing these tests. You mentioned parsing timeout, and from what I understood, it's something that is not active by default. I wonder then what causes treesitter to go mad sometimes when I start scrolling.

Here's a screenshot of a file being highlighted correctly.

Here's the same section when I start scrolling.

This doesn't always happen, so it's not easy to reproduce. It happens from time to time and my current solution is to restart neovim. Can you say what might be causing this issue?

theHamsta commented 2 years ago

This seems to be a different issue. I've also seen it when experimenting with the timeout. Some extmarks got out of sync, but that is something Neovim does wrong, not tree-sitter taking a long time parsing.

@medwatt by default tree-sitter has an unlimited amount of time to finish its parsing; to enable a timeout you have to edit the source code of Neovim. It could become a feature in the future to protect ourselves from slow parsers or very big files.

A flamegraph of one session where I pressed the same key within two different link regions (it goes really slow at the second)

It seems to spend some time in ts_stack_pop_count (is the stack very deep for that language?)

To reproduce, add a few ds to the first paragraph in the README within the link: [screenshot] @MDeiml any ideas?

medwatt commented 2 years ago

@theHamsta, I think, for a temporary solution, it would be a good idea to expose the option to set a custom timeout for slow parsers.

MDeiml commented 2 years ago

This is a really weird bug. I don't think it's a problem with my parser, since it does not appear when just parsing the document as a whole. It appears to only happen with incremental parsing. Also I noticed that if I stop holding down d (letting my computer catch up) and then start again it does not slow down again.

I also don't think it's a problem with tree-sitter, because then the surrounding document should not have any influence on parsing speed (only the stack at the current position). But if I delete all the other paragraphs I don't get any slowdown.

It's also not a problem with language injections (if I remove the injection queries, I still get the same effect).

Rather it's probably something to do with highlighting (after disabling tree-sitter highlighting the problem disappears).

But weirdly, if I disable all markdown highlight queries the problem still appears.

If I had to guess I would say it's a problem with neovim, but that's just a hunch.

theHamsta commented 2 years ago

@MDeiml no, it is not Neovim or the highlighting. It is tree-sitter doing the parsing. It consumes almost all of the time, with queries and injections negligible. [screenshot]

parser_parse is just invoking ts_parse; the small ranges are Scanner::scan (so I suppose that is when it's doing ts_parser_parse). I suppose that when Scanner::scan stops, it is doing ts_tree_get_changed_ranges (will verify that in a minute). [screenshot] It is not your external parser (it is only active in 6% of the traced ranges).

[screenshot]

But sure, Neovim could handle this in a better way.

Now with ts_parser_parse traced: [screenshot]

theHamsta commented 2 years ago

I think the way Atom handles this is that it lets the parsing time out while doing the parsing in a background thread that can be canceled by the foreground thread as soon as the foreground thread wants to parse again. I think every call to ts_parser_parse makes progress even when it times out.
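The background-thread scheme described above could look roughly like the following Python sketch. It is only an illustration of the cancellation pattern, not Atom's or Neovim's implementation: `BackgroundParse` and `Editor` are hypothetical names, and the "parse" merely counts lines in slices so that it has points at which it can notice a cancellation.

```python
import threading


class BackgroundParse:
    """One cancellable parse job on a worker thread (toy: counts lines)."""

    def __init__(self, text, slice_size=4096):
        self.text = text
        self.slice_size = slice_size
        self.cancelled = threading.Event()
        self.result = None  # stays None if the job is cancelled
        self.thread = threading.Thread(target=self._run)

    def _run(self):
        lines = 0
        for start in range(0, len(self.text), self.slice_size):
            if self.cancelled.is_set():
                return  # a newer edit made this work stale
            lines += self.text.count("\n", start, start + self.slice_size)
        self.result = lines

    def cancel(self):
        """Ask the worker to stop and wait until it has actually stopped."""
        self.cancelled.set()
        if self.thread.is_alive():
            self.thread.join()


class Editor:
    """Foreground side: every edit cancels the stale job and starts a new one."""

    def __init__(self):
        self.job = None

    def on_edit(self, text):
        if self.job is not None:
            self.job.cancel()
        self.job = BackgroundParse(text)
        self.job.thread.start()

    def wait_for_tree(self):
        self.job.thread.join()
        return self.job.result


editor = Editor()
for text in ["a\n", "a\nb\n", "a\nb\nc\n"]:
    editor.on_edit(text)  # three quick edits; only the last parse must finish
latest = editor.wait_for_tree()
print(latest)
```

Whether an earlier job finishes before it is cancelled depends on thread timing; the only guarantee in this sketch is that the result for the most recent text is the one that gets used.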

MDeiml commented 2 years ago

Still, the problem does not appear when highlighting is disabled, and does appear when highlighting is enabled. (For both tests I left TSPlayground open to verify that the document does actually get parsed).

Within ts_parse neovim passes a callback for reading new data: https://github.com/neovim/neovim/blob/9005ffbe7757eca8ad809c81db76aec930db8e68/src/nvim/lua/treesitter.c#L292-L323

Could this be the culprit?

theHamsta commented 2 years ago

The input_cb only makes up a small fraction. Without highlighting the tree doesn't get updated on every keystroke. The playground wasn't working for a long time without the triggers by the highlighter. I think that we now at least parse the tree once.

Time (%)  Total Time (ns)  Instances    Avg (ns)      Med (ns)    Min (ns)    Max (ns)    StdDev (ns)    Style              Range           
 --------  ---------------  ---------  ------------  ------------  ---------  -----------  ------------  -------  ---------------------------
     32,0   18.276.135.814        292  62.589.506,0  65.104.053,0        917  249.261.151  19.474.199,0  PushPop  LanguageTree:parse markdown
     28,0   16.266.869.258      7.150   2.275.086,0      27.403,0      3.295   69.144.939  11.178.170,0  PushPop  parser_parse               
     25,0   14.375.366.952      7.150   2.010.540,0      25.540,0      2.410   62.835.704   9.868.962,0  PushPop  ts_parser_parse            
      4,0    2.699.118.363  7.714.968         349,0         331,0        121      364.863         516,0  PushPop  markdown scan              
      3,0    1.866.829.630      7.150     261.095,0         245,0        116    7.792.537   1.310.739,0  PushPop  ts_tree_get_changed_ranges 
      1,0    1.059.199.678        275   3.851.635,0   3.121.339,0  2.742.390   99.952.165   6.289.228,0  PushPop  LanguageTree:parse lua     
      1,0      867.959.021        275   3.156.214,0   3.002.615,0  2.636.921   30.037.728   1.651.085,0  PushPop  _get_injections markdown   
      0,0      404.102.225      1.448     279.076,0      50.607,0      6.101  182.794.010   4.880.470,0  PushPop  on_line                    
      0,0      323.440.617        275   1.176.147,0     579.766,0    404.118   67.359.675   4.486.126,0  PushPop  _get_injections lua        
      0,0      205.169.011    523.333         392,0         338,0        126      128.838         483,0  PushPop  input_cb                   
      0,0      189.857.909        275     690.392,0     323.011,0    257.144   79.581.512   4.807.582,0  PushPop  LanguageTree:parse vim     
      0,0      135.103.159        275     491.284,0     411.432,0    329.545    9.388.716     603.138,0  PushPop  LanguageTree:parse html    
      0,0      122.241.394        275     444.514,0     122.033,0     80.412   73.084.122   4.411.825,0  PushPop  _get_injections vim        
      0,0       60.690.523        275     220.692,0     160.789,0    110.730    6.923.854     417.566,0  PushPop  _get_injections html       
      0,0       58.474.838         13   4.498.064,0   2.416.024,0     21.448   15.370.281   5.726.295,0  PushPop  tslua_parse_query

input_cb doesn't spend a lot of time, but I suppose that you think it causes the parsing to take longer than necessary based on the output it's producing. I saw that input_cb (and also read system calls) gets called more often in injections.

MDeiml commented 2 years ago

@theHamsta Could you maybe record a flamegraph of inserting some ds, letting neovim catch up with work, and then inserting some more? For me it doesn't hang the second time, so there should be some difference. (Sry for bothering, but you seem to have a nice profiling setup :))

theHamsta commented 2 years ago

My profiling setup is not so great at the moment. Intel VTune is crashing whenever it tries to finalize the result, and it would be the best tool for filtering certain time periods out of perf traces (I will try on a different machine). I couldn't see anything fundamentally different in the instances when it takes longer, which for me seems to depend mostly on document position.

theHamsta commented 2 years ago

@maxbrunsfeld do you have any advice on how to debug this? We have the problem that for https://github.com/MDeiml/tree-sitter-markdown incremental parsing can take a long time, 40ms-60ms (cold start parsing takes 30ms), which causes a lag in the editor as keys can be fed at a faster rate. The edits are in each case single letters, entered by just pressing one key in the README of our repo. Since the largest fraction of the time is spent in ts_parser_parse (see https://github.com/nvim-treesitter/nvim-treesitter/issues/2916#issuecomment-1120287849 for the timeline), it should also be reproducible using Atom (when no timeout is set for parsing). At the moment Neovim parses fully synchronously without any timeout set for the parser, and every keystroke triggers a parsing event.

theHamsta commented 2 years ago

Maybe it would be good to reproduce this programmatically using the tree-sitter Rust API: parsing the text once and then doing edits to understand what's going on (profiling or with a debugger attached).

ayushnix commented 2 years ago

I came across this issue after asking a query on /r/neovim.

It's the same issue described by @medwatt in the first comment. I've enabled filetype.lua using g.do_filetype_lua = 1 and disabled filetype.vim using g.did_load_filetypes = 0. I'm using the markdown treesitter parser by @MDeiml. Here's the --startuptime log file when a markdown file is opened

--startuptime log file


 times in msec
 clock   self+sourced   self:  sourced script
 clock   elapsed:              other lines

000.023  000.023: --- NVIM STARTING ---
000.553  000.530: locale set
001.155  000.602: inits 1
001.192  000.037: window checked
001.543  000.351: parsing arguments
005.880  004.337: expanding arguments
005.923  000.043: inits 2
006.889  000.966: init highlight
006.894  000.005: waiting for UI
009.039  002.145: done waiting for UI
009.093  000.054: init screen for UI
009.134  000.041: init default mappings
009.202  000.068: init default autocommands
012.028  000.224  000.224: sourcing /usr/share/nvim/runtime/ftplugin.vim
012.525  000.122  000.122: sourcing /usr/share/nvim/runtime/indent.vim
012.779  000.052  000.052: sourcing /usr/share/nvim/archlinux.vim
012.797  000.157  000.105: sourcing /etc/xdg/nvim/sysinit.vim
026.088  013.177  013.177: sourcing /home/user/.config/nvim/init.lua
026.125  003.244: sourcing vimrc file(s)
026.926  000.044  000.044: sourcing /home/user/.local/share/nvim/site/pack/packer/start/LuaSnip/ftdetect/snippets.vim
027.294  000.039  000.039: sourcing /usr/share/vim/vimfiles/ftdetect/PKGBUILD.vim
027.397  000.059  000.059: sourcing /usr/share/vim/vimfiles/ftdetect/meson.vim
027.489  000.048  000.048: sourcing /usr/share/vim/vimfiles/ftdetect/vagrantfile.vim
027.854  001.363  001.173: sourcing /usr/share/nvim/runtime/filetype.lua
027.968  000.048  000.048: sourcing /usr/share/nvim/runtime/filetype.vim
028.539  000.220  000.220: sourcing /usr/share/nvim/runtime/syntax/synload.vim
028.826  000.756  000.537: sourcing /usr/share/nvim/runtime/syntax/syntax.vim
030.913  000.047  000.047: sourcing /usr/share/nvim/runtime/plugin/gzip.vim
030.999  000.037  000.037: sourcing /usr/share/nvim/runtime/plugin/health.vim
031.138  000.091  000.091: sourcing /usr/share/nvim/runtime/plugin/man.vim
031.229  000.039  000.039: sourcing /usr/share/nvim/runtime/plugin/matchit.vim
031.600  000.325  000.325: sourcing /usr/share/nvim/runtime/plugin/matchparen.vim
031.701  000.049  000.049: sourcing /usr/share/nvim/runtime/plugin/netrwPlugin.vim
032.075  000.037  000.037: sourcing /home/user/.local/share/nvim/rplugin.vim
032.094  000.349  000.312: sourcing /usr/share/nvim/runtime/plugin/rplugin.vim
032.293  000.152  000.152: sourcing /usr/share/nvim/runtime/plugin/shada.vim
032.387  000.037  000.037: sourcing /usr/share/nvim/runtime/plugin/spellfile.vim
032.483  000.047  000.047: sourcing /usr/share/nvim/runtime/plugin/tarPlugin.vim
032.568  000.037  000.037: sourcing /usr/share/nvim/runtime/plugin/tohtml.vim
032.667  000.051  000.051: sourcing /usr/share/nvim/runtime/plugin/tutor.vim
032.766  000.048  000.048: sourcing /usr/share/nvim/runtime/plugin/zipPlugin.vim
033.070  000.050  000.050: sourcing /usr/share/vim/vimfiles/plugin/fzf.vim
033.270  000.147  000.147: sourcing /usr/share/vim/vimfiles/plugin/redact_pass.vim
063.140  010.130  010.130: sourcing /home/user/.local/share/nvim/site/pack/packer/start/onedark.nvim/colors/onedark.lua
108.306  074.803  064.673: sourcing /home/user/.config/nvim/plugin/packer_compiled.lua
109.021  004.420: loading rtp plugins
109.678  000.245  000.245: sourcing /home/user/.local/share/nvim/site/pack/packer/start/LuaSnip/plugin/luasnip.vim
110.296  000.385  000.385: sourcing /home/user/.local/share/nvim/site/pack/packer/start/indent-blankline.nvim/plugin/indent_blankline.vim
111.682  001.048  001.048: sourcing /home/user/.local/share/nvim/site/pack/packer/start/nvim-treesitter/plugin/nvim-treesitter.lua
112.090  000.168  000.168: sourcing /home/user/.local/share/nvim/site/pack/packer/start/vim-cool/plugin/cool.vim
112.269  001.401: loading packages
112.641  000.231  000.231: sourcing /home/user/.local/share/nvim/site/pack/packer/start/Comment.nvim/after/plugin/Comment.lua
112.650  000.151: loading after plugins
112.663  000.012: inits 3
117.340  004.678: reading ShaDa
124.668  000.452  000.452: sourcing /usr/share/nvim/runtime/autoload/htmlcomplete.vim
124.819  000.754  000.302: sourcing /usr/share/nvim/runtime/ftplugin/html.vim
125.160  001.425  000.671: sourcing /usr/share/nvim/runtime/ftplugin/markdown.vim
127.838  000.333  000.333: sourcing /usr/share/nvim/runtime/syntax/javascript.vim
130.206  002.177  002.177: sourcing /usr/share/nvim/runtime/syntax/vb.vim
136.619  006.283  006.283: sourcing /usr/share/nvim/runtime/syntax/css.vim
137.933  011.319  002.527: sourcing /usr/share/nvim/runtime/syntax/html.vim
138.344  011.844  000.524: sourcing /usr/share/nvim/runtime/syntax/markdown.vim
204.324  073.714: opening buffers
205.968  001.644: BufEnter autocommands
205.977  000.009: editing files in windows
206.806  000.829: VimEnter autocommands
206.814  000.008: UIEnter autocommands
207.221  000.287  000.287: sourcing /usr/share/nvim/runtime/autoload/provider/clipboard.vim
207.231  000.131: before starting main loop
271.764  064.532: first screen update
271.772  000.008: --- NVIM STARTED ---

Whenever I edit a markdown file with more than 300 or 500 lines with some code blocks, the input latency increases dramatically. When it's more than 1000 lines, I have to wait for almost a second for a keypress to show up on my screen. If I delete characters, the cursor disappears and text is deleted with a delay of almost a second.

I'm not sure how to disable syntax highlighting for fenced code blocks when using the markdown treesitter parser or if it'll help. If I disable the markdown treesitter parser, there's a significant improvement in input latency.

I've noticed from the startuptime logs that vimscript runtime syntax files are sourced for code blocks, including markdown.vim, even though I've installed treesitter parsers for all the languages mentioned in the log and I've also disabled vim regex syntax highlighting in my neovim config.

clason commented 2 years ago

I'm not sure how to disable syntax highlighting for fenced code blocks when using the markdown treesitter parser or if it'll help. If I disable the markdown treesitter parser, there's a significant improvement in input latency.

Remove the injections.scm from your runtime path.

I've noticed from the startuptime logs that vimscript runtime syntax files are sourced for code blocks, including markdown.vim, even though I've installed treesitter parsers for all the languages mentioned in the log and I've also disabled vim regex syntax highlighting in my neovim config.

Are you sure they're actually executed? They will show up even if they're skipped by finishing early (which is the usual mechanism for Vim to "skip" files).

ayushnix commented 2 years ago

Remove the injections.scm from your runtime path.

I moved the injections.scm file out of my runtime path and markdown files still highlight the fenced code blocks and have the same input latency as mentioned before.

I confirmed that the injections.scm file was not in my runtime path using

:lua print(vim.inspect(vim.api.nvim_get_runtime_file('queries/lua/*', true)))

Are you sure they're actually executed?

Sorry, I'm not. I assumed they were since they were adding non-negligible time in the startuptime log.

clason commented 2 years ago

:lua print(vim.inspect(vim.api.nvim_get_runtime_file('queries/lua/*', true)))

that's the wrong one, though -- you want the queries/markdown/injections.scm.

ayushnix commented 2 years ago

that's the wrong one, though -- you want the queries/markdown/injections.scm.

Ah, that helps, thanks!

The input latency is almost back to normal. If I delete characters using backspace, the cursor still disappears though. I've confirmed that this doesn't happen in other types of files.

https://user-images.githubusercontent.com/79408161/174270301-b75426b4-8ee6-49c6-8f44-b78a95adc738.mp4

The syntax highlighting for markdown also gets messed up in some regions but that's probably just markdown quirks though.

clason commented 2 years ago

Yeah, markdown is just hard to parse into a syntax tree. People are working on that, but it is highly non-trivial.

ayushnix commented 2 years ago

@clason that's okay, thanks for your help

This is an unrelated question but can you point me to a markup language for writing documents that is well supported in treesitter and doesn't have performance issues in neovim if the document is more than 1000 or 2000 lines long?

I'm considering writing my documents in another such markup language and then converting it back to markdown using pandoc before I push them to a git repo.

clason commented 2 years ago

Not to my knowledge; this is a fundamental limitation common to all "soft" markup languages (as opposed to structured ones like HTML or LaTeX).

You could give RST a try, though.

ayushnix commented 2 years ago

That's disappointing. It reminds me of this post on undeadly.org about markdown.

I'm not sure if neorg and its treesitter parser can handle large documents without introducing input latency in the terminal. If not, I'll probably switch to writing articles in HTML.

Thanks!

theHamsta commented 2 years ago

@ayushnix I'm sure the problems with markdown input latency can be solved by a timeout for tree-sitter parsing. It was working smoothly when I added the timeout (except that highlighting was sometimes lost because there is no code yet for handling the timeout, e.g. by switching to background parsing or reusing the previous parsing result). We're talking about max 42ms during incremental parsing, which is slow enough to build up a latency lag when you type multiple letters at once, but still manageable for an editor to provide the highlighting. In other words: it's too slow for "on every keystroke", but fast enough to catch up once it moves to a background parse after reaching the timeout. The problem we're experiencing here is that after a fast input of 5 letters, we experience 5 times the parsing latency, while with a timeout it would be possible to cancel the first 4 letters and finish at the last letter with a background thread. Usually, the 5 incremental parses should go really fast, as the parser state should not have changed much. But even when that does not work, the editor should protect itself against excessive parsing times.
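The arithmetic behind the "5 letters, 5 times the latency" argument can be checked with a small timeline model. This is a sketch under stated assumptions: the 42ms parse time is the figure quoted above, while the 10ms gap between keystrokes and both function names are hypothetical choices made for illustration.

```python
def synchronous_finish_times(arrivals_ms, parse_ms):
    """Every keystroke gets a full parse; parses queue up one after another."""
    finish = []
    free_at = 0
    for arrival in arrivals_ms:
        start = max(arrival, free_at)  # wait until the previous parse is done
        free_at = start + parse_ms
        finish.append(free_at)
    return finish


def cancellable_finish_times(arrivals_ms, parse_ms):
    """A parse is dropped if the next keystroke arrives before it finishes."""
    finish = []
    for i, arrival in enumerate(arrivals_ms):
        is_last = i == len(arrivals_ms) - 1
        if is_last or arrivals_ms[i + 1] >= arrival + parse_ms:
            finish.append(arrival + parse_ms)
    return finish


PARSE_MS = 42                      # worst case quoted in the thread
arrivals = [0, 10, 20, 30, 40]     # assumed: 5 keystrokes, 10 ms apart

sync = synchronous_finish_times(arrivals, PARSE_MS)
cancel = cancellable_finish_times(arrivals, PARSE_MS)

sync_lag = sync[-1] - arrivals[-1]      # delay after the last keystroke
cancel_lag = cancel[-1] - arrivals[-1]
print(sync, sync_lag)
print(cancel, cancel_lag)
```

Under these assumptions the synchronous editor finishes its fifth parse at 210ms (5 x 42ms), 170ms after the last key was pressed, while the cancelling editor completes a single parse 42ms after the last key.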

There is no fundamental reason why a Markdown parser should be slow. It's just that nested pairs of delimiters are difficult to express with tree-sitter and almost always require an external parser that can count the nesting state. You can test whether https://github.com/ikatyang/tree-sitter-markdown has the same limitation. It's also possible that the parser of @MDeiml has some properties that make the incremental parsing logic fail to build efficiently on the previous result.

theHamsta commented 2 years ago

@ayushnix can you provide some evidence that the injections have any effect at all? With https://github.com/neovim/neovim/pull/18761 you can visualize what fraction of the latency is caused by markdown parsing and what by the injections. In the document I tested I was experiencing latency purely from the markdown parser, with injections causing only a negligible fraction of the whole incremental parsing.

MDeiml commented 2 years ago

I'm actually experimenting at the moment to see if I can get this faster. This would include optionally only parsing inlines that are visible (parsing inlines only depends on all the inlines in the same block and not on other blocks) and a few changes around paragraphs, which are kinda important and really slow at the moment. But if I can get something faster to work it's gonna take a while, since it probably needs some features in upstream neovim.

MDeiml commented 2 years ago

But I'm quite confident that I should be able to get this at least somewhat fast, since parsing the block structure could probably be done pretty fast because it's well defined; it's mainly inlines like links and emphasis that make the parser slow. I should be able to split the two.

MDeiml commented 2 years ago

I think I found the cause of this issue. I think tree-sitter has problems with reusing trees that are very "flat", i.e. trees where most nodes have a lot of siblings. This is not a problem with usual programming languages, since they're often structured hierarchically, but with markdown (as it is now) most nodes are children of the root node.

When parsing a file after introducing some edits, all siblings of nodes that changed are also parsed again. I'm not sure why, maybe @maxbrunsfeld could give some insights?

I was able to solve this by just introducing more hierarchy artificially. More concretely I added a section node, which starts with a heading and stretches until the next heading. With this I can get syntax highlighting in a ~3000 line file without any noticeable delay.
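A toy cost model shows why the extra hierarchy could help. It rests on an assumption taken from the hypothesis above, not on how tree-sitter is actually implemented: after an edit, every sibling of a changed node is visited again at each level on the path to the root. All function names and the example document are hypothetical.

```python
def group_into_sections(blocks):
    """Group a flat list of blocks into sections that each start at a heading."""
    sections = []
    for block in blocks:
        if block.startswith("#") or not sections:
            sections.append([])  # a heading (or the very first block) opens a section
        sections[-1].append(block)
    return sections


def reparse_cost_flat(blocks):
    """Assumed model: all blocks are siblings under the root, so all are revisited."""
    return len(blocks)


def reparse_cost_sectioned(sections, edited_index):
    """Same model with sections: sibling sections plus blocks of the edited one."""
    first = 0
    for section in sections:
        if edited_index < first + len(section):
            return len(sections) + len(section)
        first += len(section)
    raise IndexError(edited_index)


# Hypothetical document: 60 headings, each followed by 49 paragraphs (3000 blocks).
blocks = []
for n in range(60):
    blocks.append(f"# Heading {n}")
    blocks.extend(f"paragraph {n}.{k}" for k in range(49))

sections = group_into_sections(blocks)
flat = reparse_cost_flat(blocks)
sectioned = reparse_cost_sectioned(sections, edited_index=1234)
print(flat, sectioned)
```

In this model an edit in the flat tree touches all 3000 blocks, while the sectioned tree touches 60 section nodes plus the 50 blocks of the edited section. The real speedup depends on how tree-sitter reuses subtrees, which this sketch does not attempt to reproduce.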

I might try later to get a quick fix in this way for the current version, but as I said I'm currently working on rewriting the parser so I'd rather work on that.

clason commented 2 years ago

A quick fix would be very much appreciated, since the rewrite sounds like something we can't just drop in in place of the current one (needing substantial infrastructure work to support such "split parsers").

Of course, I understand that this is much less interesting work ;)

theHamsta commented 2 years ago

A quick fix would probably be to let nvim time out long parses. We will always face the situation that parsing is slow when the file is too big (at least for initial parsing). Although a change in Neovim might not be that quick.

clason commented 2 years ago

We'll never know until someone puts a PR for it up for discussion...

MDeiml commented 2 years ago

I tried to implement the fix on the main branch, but I didn't get the same speedup. Not sure why.

theHamsta commented 2 years ago

> We'll never know until someone puts a PR for it up for discussion...

Well, I guess the "how" of the implementation is the thing that's taking some time... Maybe I'll find some time tomorrow for it. There are quite a few possible ways to deal with this, and even I won't know which is best until I've tried them out.

MDeiml commented 2 years ago

~I noticed something else while writing rust bindings for my parser. If I use a single tree-sitter parser object and ts_parser_set_language then parsing again after edits seems to happen almost instantly. If I use one parser for each language then parsing after edits takes equally as long as the first parse.~

~I don't know if this is specific to my use case, but maybe it would make sense to investigate something similar for neovim, as it seems it also uses one parser per language.~

Nvm there was a hidden error and I was getting garbage data.

MDeiml commented 2 years ago

https://github.com/tree-sitter/tree-sitter-haskell/issues/41#issuecomment-1004310271

It seems that my previous comment about hierarchical structure was the right hunch. Reducing conflicts should be the main priority for slow parsers, but "sectioning off" the conflicts seems to work as well. Unfortunately neither is possible for inline markdown elements like emphasis.

maxbrunsfeld commented 2 years ago

> I think I found the cause of this issue. I think tree-sitter has problems with reusing trees that are very "flat", i.e. trees where most nodes have a lot of siblings. This is not a problem with usual programming languages, since they're often structured hierarchically, but with markdown (as it is now) most nodes are children of the root node.

I think it must be something more specific than that; otherwise it would reproduce in, for example, a large C file with hundreds of small functions, since those functions would all be sibling nodes.

I'm curious what's going on, and I'll try to reproduce the slowness with the tree-sitter CLI, using the parse --edit command.

maxbrunsfeld commented 2 years ago

Ok, I can reproduce the problem from the command line. I believe the problem is a certain conflict in your grammar, between `_soft_line_break` and `_paragraph_end_newline`. It causes every paragraph to be considered "fragile", and not re-usable.

I determined this by creating a small markdown file, test.md with five two-word paragraphs:

a b

c d

e f

g h

i j
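To try the same experiment on a larger scale, a file of this shape can be generated rather than typed. The following is a minimal Python sketch; the helper names `word` and `make_paragraphs` are made up for illustration, and the trailing blank line after the last paragraph is an assumption inferred from the document node ending at row 10 in the tree printed in this comment.

```python
import string


def word(i: int) -> str:
    """Return a short distinct word for index i: a..z, then w26, w27, ..."""
    return string.ascii_lowercase[i] if i < 26 else f"w{i}"


def make_paragraphs(n: int) -> str:
    """Build n two-word paragraphs, each followed by one blank line."""
    return "".join(f"{word(2 * i)} {word(2 * i + 1)}\n\n" for i in range(n))


# Five paragraphs reproduce the small example file.
small = make_paragraphs(5)
print(small, end="")
```

Writing `make_paragraphs(2000)` to a file would give a much larger input with the same structure, which may make the cost of non-reusable paragraphs easier to observe.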

I then parsed this file from the command line with debug graphs enabled:

tree-sitter parse test.md -D

(document [0, 0] - [10, 0]
  (paragraph [0, 0] - [1, 0])
  (paragraph [2, 0] - [3, 0])
  (paragraph [4, 0] - [5, 0])
  (paragraph [6, 0] - [7, 0])
  (paragraph [8, 0] - [9, 0]))

This creates a long sequence of SVG graphs. In this graph, you can zoom in on a particular point, when the parser reaches the end of a paragraph, and see that the parse stack splits into two branches:


![Screen Shot 2022-06-21 at 1 13 26 PM](https://user-images.githubusercontent.com/326587/174889189-bf161dfd-0aea-4b36-b9ad-39ce8a422740.png)

Any node that is created in an ambiguous state like this is considered fragile - it cannot be reused during incremental parsing if any of its contents have changed. In this case, the ambiguity is still in effect while the paragraph and block nodes are created.

To observe the performance impact of this ☝️ more directly, you can perform an edit and an incremental re-parse at the command line, inserting a character on line 4 (the third paragraph).

tree-sitter parse test.md --edit '4,1 0 1'

It re-parses correctly, but if you run with -d (for terminal logging) or -D (to generate another SVG log), you can see that the parser decides not to reuse any block/paragraph nodes.

...
cant_reuse_node_is_fragile tree:_block
cant_reuse_node_is_fragile tree:paragraph
...
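For readers unfamiliar with the `--edit` argument used above, the sketch below reads `'4,1 0 1'` as three fields: a zero-based `row,column` position, a number of characters to delete, and the text to insert. That reading is an assumption based on the description "inserting a character on line 4 (the third paragraph)"; the `apply_edit` function is a hypothetical Python helper, not the CLI's own code, and it only handles ASCII text where characters and bytes coincide.

```python
def apply_edit(text: str, edit: str) -> str:
    """Apply an edit of the form 'ROW,COL DELETED_LENGTH INSERTED_TEXT'.

    ROW and COL are treated as zero-based, matching the positions shown
    in the printed syntax tree. This is an assumed reading of the format.
    """
    position, deleted_length, inserted = edit.split(" ", 2)
    row, col = (int(part) for part in position.split(","))
    lines = text.split("\n")
    # Convert (row, col) into an offset: each earlier line plus its newline.
    offset = sum(len(line) + 1 for line in lines[:row]) + col
    end = offset + int(deleted_length)
    return text[:offset] + inserted + text[end:]


text = "a b\n\nc d\n\ne f\n\ng h\n\ni j\n\n"
edited = apply_edit(text, "4,1 0 1")
print(edited.split("\n")[4])  # e1 f
```

Under this reading, the edit touches only the third paragraph, so ideally the other four paragraph nodes would be reused unchanged.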

@MDeiml Can you think of a way to not have this conflict with `_paragraph_end_newline`? Can you tell the difference between a paragraph ending and a "soft" line break by the number of newlines?

MDeiml commented 2 years ago

Thank you! I have a fix for this conflict in paragraphs where I parse ahead quite a bit to determine if a newline is a soft line break. This means that a lot of paragraphs can now be reused.

But a similar problem now appears with emphasis, which appears in a lot of paragraphs as a top-level inline node. I'm not sure it's possible to parse those without conflicts, as that would require potentially infinite lookahead. But maybe it's possible to create a "fast path" for the most common use case of no nested inlines.

maxbrunsfeld commented 2 years ago

I think it's probably ok for emphasis to have that conflict, since most (all?) top-level nodes in the document are not emphasis nodes.

MDeiml commented 2 years ago

That's true, but pretty much every top level node has children that cannot be reused, which means that parsing is still slow in very very large documents, though I can get it to very acceptable speeds for e.g. the README for this repo.

I have a question though, shouldn't it be possible to reuse fragile trees (whole trees not nodes) if all edits were outside their range set with ts_parser_set_included_range? I have to admit I don't really understand this concept of fragility so I might be wrong, but even with conflicts parsing should be deterministic.

I am currently working on a version of this parser where inline elements (emphasis) and block elements (paragraphs) are split into two grammars. This means that every inline range is parsed separately. I noticed that almost all of the inline ranges are not reused, which makes sense as most contain emphasis and are thus fragile. But all that needs to be done is to shift the node positions, so I'd be keen to just not reparse the unaffected trees.
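The "just shift the positions" idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not tree-sitter's API: inline ranges are plain `(start_byte, end_byte)` tuples with an exclusive end, the edit replaces bytes `[edit_start, old_end)` with `[edit_start, new_end)`, and `shift_ranges` is a hypothetical helper name. Ranges entirely before the edit are kept, ranges entirely after it are shifted, and only overlapping ranges are marked for reparsing.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # (start_byte, end_byte), end exclusive


def shift_ranges(
    ranges: List[Range], edit_start: int, old_end: int, new_end: int
) -> Tuple[List[Range], List[int]]:
    """Return the updated ranges and the indices that need reparsing."""
    delta = new_end - old_end
    updated: List[Range] = []
    dirty: List[int] = []
    for index, (start, end) in enumerate(ranges):
        if end <= edit_start:
            # Entirely before the edit: nothing changes.
            updated.append((start, end))
        elif start >= old_end:
            # Entirely after the edit: same content, shifted position.
            updated.append((start + delta, end + delta))
        else:
            # Overlaps the edit: grow or shrink it and mark it dirty.
            updated.append((min(start, edit_start), max(end, old_end) + delta))
            dirty.append(index)
    return updated, dirty


# Three inline ranges; one byte is inserted at offset 6, inside the second.
ranges = [(0, 3), (5, 8), (10, 13)]
updated, dirty = shift_ranges(ranges, edit_start=6, old_end=6, new_end=7)
print(updated)  # [(0, 3), (5, 9), (11, 14)]
print(dirty)    # [1]
```

In this model an insertion exactly at the start of a range counts as "before" that range, so the range is shifted rather than reparsed; a real implementation might reasonably choose the opposite.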
