Open kchibisov opened 2 years ago
Author of the markdown grammar here. It seems that at the moment language injection cannot be parsed incrementally. The markdown grammar uses injection a lot, so it becomes quite apparent with it. This is what I see at line 846 here:
https://github.com/helix-editor/helix/blob/8681fb6d9ee06610fac2baafb4ca34f93f9c7f4d/helix-core/src/syntax.rs#L841-L848
Notice how a layer can only be reused if its ranges did not change. But ranges often do change when editing a file, so the test here could probably be laxer.
I didn't look too deeply, but (from my knowledge of tree-sitter) any layer should be fine to reuse here, as long as none is reused twice. So something akin to that should work.
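A minimal sketch of that idea (hypothetical helper names, not Helix's actual API): keep candidate layers in a pool keyed by language, and hand each one out at most once, regardless of whether its old ranges still match exactly.

```rust
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq)]
pub struct Layer {
    pub language: String,
    pub ranges: Vec<(usize, usize)>, // byte ranges the layer covered before the edit
}

/// Take a reusable layer for `language` out of the pool. Because the layer
/// is removed when handed out, no layer can ever be reused twice, even if
/// its old ranges no longer match the new injection ranges exactly.
pub fn take_reusable_layer(
    pool: &mut HashMap<String, Vec<Layer>>,
    language: &str,
) -> Option<Layer> {
    pool.get_mut(language)?.pop()
}
```

The point is only the "taken at most once" invariant; matching layers by language instead of exact ranges is the relaxation being proposed.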
We actually follow the tree-sitter-cli/atom logic and update all the markers according to user edits beforehand: https://github.com/helix-editor/helix/blob/8681fb6d9ee06610fac2baafb4ca34f93f9c7f4d/helix-core/src/syntax.rs#L644-L726
So the injection layers are tracked reliably.
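That marker updating can be illustrated with a simplified sketch (byte offsets only; the real code in syntax.rs also handles row/column points and ranges that straddle the edit): positions at or past the old end of the edit shift by the edit's size delta, in the spirit of tree-sitter's InputEdit.

```rust
/// Shift one tracked injection range to account for a text edit.
/// Simplified: only byte offsets, and only positions at or after the
/// edit's old end are moved; positions before the edit stay put.
pub fn adjust_range(
    range: (usize, usize),
    old_end: usize, // end of the replaced text, pre-edit
    new_end: usize, // end of the replacement text, post-edit
) -> (usize, usize) {
    let delta = new_end as isize - old_end as isize;
    let shift = |p: usize| {
        if p >= old_end {
            (p as isize + delta) as usize
        } else {
            p
        }
    };
    (shift(range.0), shift(range.1))
}
```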
We also seem to only use HTML injections & code blocks: https://github.com/helix-editor/helix/blob/master/runtime/queries/markdown/injections.scm
Ah ok, I guess you are still using the old version of my parser then (the main branch). You could try updating to the split_parser branch. That one has some major performance improvements, but uses injection very extensively.
Ah, good point! I'll see if I can update us to the new branch 👍
On an 8.4MiB markdown file (I keep daily notes in a single big document), helix 22.05 takes >10s to open it and uses 1GiB of memory after doing so. I'm assuming this is tree-sitter-related and the same issue described here, but let me know if you think not.
The following script generates a ~5MiB file that takes >5 seconds to open with Helix. After opening, memory usage (reported via ps_mem) is ~660MiB.
#!/bin/sh
for i in $(seq 10000); do
cat <<EOF >> big_markdown.md
# Title
## Subtitle
Some notes about things.
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor
incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis
nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore
eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt
in culpa qui officia deserunt mollit anim id est laborum
* A list
* With a sub-item
* Item 2
* Item 3
* Item 4
* Item 5
* Item 5.1
EOF
done
@asb if you disable the grammar for markdown, does that still happen?
@kchibisov I'm not sure which option to change to disable the grammar (I was trying out helix for the first time). Certainly, if I mv big_markdown.md big_markdown.txt and open the version of the file with a .txt extension, it opens instantly and helix consumes a more reasonable 18MiB of RAM.
I think the simplest way to fix this is to only run the grammar for injected languages if they are visible (i.e. run it lazily).
Markdown is split into two grammars: block and inline. The block-level grammar is quite fast and produces the ranges on which the inline-level grammar is injected. So the computational cost mostly grows with the amount of inline content parsed.
I'd guess that the memory cost is also from the parsed inline-level trees, so that problem would also be solved in the same way, but I'd have to check that.
I'm not sure how to best implement something like this though. Currently the parsing is handled by the Document impl, but for lazy parsing we need information from the Views that point to that document.
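As a rough sketch of what "only parse visible injections" could look like (viewport and ranges as byte offsets; names are made up for illustration, not Helix's API):

```rust
/// Return only the injection ranges that overlap the visible viewport,
/// so the inline grammar is run lazily on just the content on screen.
/// Ranges and viewport are half-open (start, end) byte offsets.
pub fn visible_injections(
    ranges: &[(usize, usize)],
    viewport: (usize, usize),
) -> Vec<(usize, usize)> {
    ranges
        .iter()
        .copied()
        .filter(|&(start, end)| start < viewport.1 && end > viewport.0)
        .collect()
}
```

The tricky part flagged above is not this filter but plumbing the viewport from the Views into whatever owns the parse state.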
I'm having trouble opening rust.org.gz, a 3000-line .org file; it never finishes. Renaming it to abc.xyz to avoid syntax highlighting makes opening instant.
> I think the simplest way to fix this is to only run the grammar for injected languages if they are visible (i.e. run it lazily).
> Markdown is split into two grammars: block and inline. The block-level grammar is quite fast and produces the ranges on which the inline-level grammar is injected. So the computational cost mostly grows with the amount of inline content parsed.
> I'd guess that the memory cost is also from the parsed inline-level trees, so that problem would also be solved in the same way, but I'd have to check that.
I think that this could build upon the work in https://github.com/helix-editor/helix/pull/2857 to keep track of the nodes and lazily highlight injections.
Just wanted to mention #4146 here. Maybe that fixed some problems, but probably not all of them.
That indeed improved the situation, and editing markdown with tree-sitter is at least possible now.
Large files still blow up helix, leading to seconds of input lag. Not sure where that should be tracked.
With helix 22.12 (34be71fb) I am seeing about 1 second of delay per keystroke when doing either insert or search. The file I am editing is a 129MB mysql dump file with 150 very long lines.
The machine I am working on is a MacBook Pro M1 with 16GB of RAM.
Wondering if #6218 fixed this
It probably fixed or improved viewing the file, but insert mode will still be just as slow on large files.
> The file I am editing is a 129MB mysql dump file with 150 very long lines.
Yeah both of these are an issue. You're parsing a very large file but performance is also not optimized on long lines. Or at least it wasn't before but maybe @pascalkuthe's rewrite improved things slightly.
Right now it shouldn't make much of a difference. However, performance for huge lines will be improved in a future follow-up PR to #5420, where I will implement line chunking. This was omitted from the MVP, but the architecture lends itself well to this optimization (and was designed with it in mind; I even had a mostly finished implementation, but the changes were accidentally deleted by an overwrite from a second helix instance).
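Line chunking, as described, could look roughly like this (an illustrative sketch, not the actual follow-up PR): a huge line is split into fixed-size chunks so downstream passes only touch the chunks they need.

```rust
/// Split a line of `len` bytes into chunks of at most `chunk_size` bytes,
/// returned as half-open (start, end) ranges. Highlighting a very long
/// line can then be limited to the chunks that are actually on screen.
pub fn chunk_line(len: usize, chunk_size: usize) -> Vec<(usize, usize)> {
    (0..len)
        .step_by(chunk_size)
        .map(|start| (start, (start + chunk_size).min(len)))
        .collect()
}
```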
While my original issue with markdown is resolved, I think I'll leave a real-world example of a big file that is well formatted and not some dump from a logging utility, since I mentioned a C++ file in the original report but never showed it.
So this is the file I referred to originally wrt my C++ quote:
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/X86/X86ISelLowering.cpp
Running :set-language text makes for a normal editing experience (same perf as a file of any other size).
I use git commit --verbose, which causes the COMMIT_EDITMSG file to contain the full diff of what's being committed. At work, I just happened to commit changes weighing about ~400k lines of diff, all generated, and I experienced this slowness: each insert keystroke lagged by about a second. @kchibisov's suggestion of :set-language text works like a charm as a workaround, for now.
Still happening with 23.03, for what it's worth.
Summary
Certain markdown files are very slow to edit, in particular when they have a lot of lists. For example, the CHANGELOG.md file from the helix repo is enough to trigger it. The problem is likely in the grammar here, since neovim with tree-sitter works fine on it.
While this particular problem could be solved with a slightly optimized but less accurate grammar, the issue also manifests on any large file of any other type; it just requires a much larger file than ~600 LoC (I've used a 50K-LoC C++ file I have around).
To avoid delay and input lag in those scenarios, it would make sense to either run tree-sitter on only a part of the buffer (I'm not familiar with how it works) or update it after typing has ended. Right now you get a latency of around 2 seconds on a 50K-line C++ file, which is impossible to work with.
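"Update it after typing has ended" is essentially debouncing; a minimal sketch of the idea (timestamps as plain millisecond integers to keep it testable, names hypothetical):

```rust
/// Debounce re-parsing: only re-run tree-sitter once no keystroke has
/// arrived for `delay_ms`. A real implementation would use Instant and
/// a timer/event loop rather than bare millisecond counters.
pub struct Debouncer {
    last_edit_ms: Option<u64>,
    delay_ms: u64,
}

impl Debouncer {
    pub fn new(delay_ms: u64) -> Self {
        Self { last_edit_ms: None, delay_ms }
    }

    /// Record a keystroke/edit at `now_ms`.
    pub fn on_edit(&mut self, now_ms: u64) {
        self.last_edit_ms = Some(now_ms);
    }

    /// True once the buffer has been quiet for at least `delay_ms`.
    pub fn should_reparse(&self, now_ms: u64) -> bool {
        self.last_edit_ms
            .map_or(false, |t| now_ms.saturating_sub(t) >= self.delay_ms)
    }
}
```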
Reproduction Steps
Try to edit CHANGELOG.md from the helix repo or any large file.
Helix log
No response
Platform
Linux
Terminal Emulator
alacritty
Helix Version
helix 22.05 (3cced1e3)