Closed Harmos274 closed 2 years ago
I'm aware, haven't gotten around to debugging it yet!
Nice, thank you! Can't wait 😄
I was trying to read up on some potential performance bottlenecks. The docs mention the `tree_sitter_my_language_external_scanner_serialize` function, and particularly the state it needs to serialize.
This function: https://github.com/tree-sitter/tree-sitter-haskell/blob/master/src/scanner.cc#L1645-L1654
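For context, a minimal sketch (not the tree-sitter-haskell code) of what that serialize/deserialize pair has to do, for a hypothetical scanner whose only state is a stack of indentation levels. The function names follow the external scanner API; `Scanner` and `indents` are made up for the illustration:

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical scanner state: just a stack of indentation levels.
struct Scanner {
  std::vector<uint16_t> indents;
};

extern "C" {

// Copy the scanner state into tree-sitter's buffer (limited to
// TREE_SITTER_SERIALIZATION_BUFFER_SIZE bytes) and return how many bytes were written.
unsigned tree_sitter_my_language_external_scanner_serialize(void *payload, char *buffer) {
  auto *scanner = static_cast<Scanner *>(payload);
  unsigned size = scanner->indents.size() * sizeof(uint16_t);
  std::memcpy(buffer, scanner->indents.data(), size);
  return size;
}

// Restore the state from a buffer previously produced by serialize.
void tree_sitter_my_language_external_scanner_deserialize(void *payload, const char *buffer, unsigned length) {
  auto *scanner = static_cast<Scanner *>(payload);
  scanner->indents.resize(length / sizeof(uint16_t));
  std::memcpy(scanner->indents.data(), buffer, length);
}

}
```

Since this pair runs every time the parser leaves and re-enters the external scanner, a large or expensive-to-copy state would show up as overhead, which is presumably why the docs single it out.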
I was also trying to look into ways to benchmark tree-sitter, but couldn't find anything. If you have any tips on how to do it (or resources), I would love to aggregate some data to try and identify cost centers.
yeah I think the state couldn't really be any simpler :slightly_frowning_face: my suspicion is that the external scanner shouldn't be invoked on every token, but changing that might not be possible.
I also haven't managed to find any useful debugging tools.
Hmmmm, damn. Yeah, I assumed the state would be simple. It's hard for me to imagine that something like typescript (which has good performance) would have simpler state than haskell.
Also I found these:
Perhaps the haskell grammar is having a harder time taking advantage of caching? Although that seems really unlikely to me.
Also, if it's helpful, the performance degradation only happens once I enter insert mode.
> Also, if it's helpful, the performance degradation only happens once I enter insert mode.
I'm painfully aware :sweat_smile:
FWIW a simple reproducer is just a few hundred lines of `a = a`. At about 400 lines it starts to become noticeable for me.
I generated the s-expression for a ~300 line Haskell file. It seems large, but I don't have a frame of reference. You can preview the file here. Also, if you generate the graph locally (you need graphviz), the tree looks unbalanced and very deep; I couldn't upload the file because it was 200 MB, which is above GitHub's limit. The command to generate the debug graph is `tree-sitter parse --debug-graph profile.hs`.
thanks for looking into this!
> FWIW a simple reproducer is just a few hundred lines of `a = a`. At about 400 lines it starts to become noticeable for me.
Interestingly, for a program like this the size of the state grows linearly with the number of expressions: an `a = a` expression generates a 4-line s-expression, so 400 of these create a 1600-line graph, which is about as performant as you can expect without doing something very clever. See the full results here. Perhaps the nvim-treesitter client (the client I am using) needs to query/update the state less frequently? I imagine you could delay updating/reading from state until the user has exited insert mode? This is just a hypothesis.
It must be something other than that; there are other grammars which produce 1600-line s-expressions with no impact on editor performance.
I wish I was more familiar with how tree-sitter parsers work (maybe I will investigate that next), but perhaps there is something specific to Haskell that triggers re-computation of massive parts of the state space? A simple hypothesis would be that, based on the current grammar, tree-sitter can't localize the changes to the expression where they occur and therefore has to recompute the whole graph? Or maybe the grammar is bigger than other languages', so the parser has to do more conditional checks?
the hypothesis with the recomputation on change sounds very plausible
My current working hypothesis leads me to believe that the issue is with `nvim-treesitter`, then. I think the changes that are required to make this plugin usable are:
I'm surprised that the third point isn't mentioned – isn't that the biggest selling point of TS?
Yeah, I thought so!
Also seeing lots of lag in the `helix` editor.
The helix editor is written in Rust, and has tree-sitter support built in, so there's very little overhead.
It also seems to be running tree-sitter synchronously with every inserted character, the result of which is that I can type faster than the editor updates.
I've uploaded a flamegraph of running `helix` and typing into a Haskell buffer for a few seconds. Maybe that'll provide some insight? It seems to be spending rather a lot of time in `tree_sitter_haskell_external_scanner_scan`.
that is helpful, thanks!
since the graph showed that a lot of time is spent in `logic::symop`, I tried replacing the parser combinators in there with a `switch`, since it was doing 10 checks on a single character in a row. alas, didn't change much.
one thing I noticed in nvim is that when holding a key in insert mode, the lag increases with the number of characters inserted. wonder if it would be possible to abort the current tree edit if it hasn't completed by the time the next character arrives, or at least batch characters for the next edit
edit: looks like that only happens when the popupmenu is visible
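On the abort/batch idea: the tree-sitter runtime API does expose a parse timeout and a cancellation flag (`ts_parser_set_timeout_micros` and `ts_parser_set_cancellation_flag` in `tree_sitter/api.h`), so a client could in principle give up on a slow re-parse and retry with the accumulated edits on the next keystroke. Whether nvim-treesitter uses or exposes this is a separate question; a rough sketch of the idea, with made-up function and variable names:

```cpp
#include <tree_sitter/api.h>

// Rough sketch: bound the time spent re-parsing per keystroke. Assumes the
// parser already has the Haskell language set and old_tree is the previous parse.
TSTree *reparse_with_budget(TSParser *parser, TSTree *old_tree,
                            const char *source, uint32_t length) {
  // Give up after 5ms; ts_parser_parse_string returns NULL when the timeout fires.
  ts_parser_set_timeout_micros(parser, 5000);
  TSTree *new_tree = ts_parser_parse_string(parser, old_tree, source, length);
  // On timeout the client keeps using old_tree (stale highlights) and tries
  // again on the next keystroke, which effectively batches the edits.
  return new_tree ? new_tree : old_tree;
}
```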
at some point I'll have to accept that writing an ad-hoc functional parser in c++ might not have been a suitable choice for the scanner
I added timing output to the example-project parsing script, and that change in `symop` did in fact have an impact on performance: about 5-10% faster on `semantic` (45 vs 41 seconds)
Is there a reason to have an external scanner? Are external scanners faster?
no, the grammar doesn't handle some things like indentation well
but feel free to try to move parts of the scanner to the grammar :smile:
lol, yeah, I am terrible at C++, but am trying to think of any small contributions I could make that might improve performance.
In `symop`, if I'm interpreting the graph correctly, half the time is spent creating a closure for the scanner (`operator+`), then half the time is spent running the scanner. The creation step involves a lot of calls to `malloc`.
In the past, I've had excellent results using `re2c` to generate really fast C scanners, but creating a Haskell scanner sounds like a big project...
@414owen can you create a new graph with current HEAD?
and could you figure out a way to create that graph for a run of `script/parse-example` and teach me? :smile:
> In `symop`, if I'm interpreting the graph correctly, half the time is spent creating a closure for the scanner (`operator+`), then half the time is spent running the scanner. The creation step involves a lot of calls to `malloc`.
so we'll have to figure out how to inline those functions properly
Sure, I'll create one now. Re: mallocs/closures. Is it possible to create these closures in advance, so they're not happening on every keypress?
```cpp
Parser operator+(Parser fa, Parser fb) {
  return [=](State & state) {
    auto res = fa(state);
    return res.finished ? res : fb(state);
  };
}
```
if someone who's good at C++ can weigh in on the performance characteristics of this…
funny, using `const Parser &` worsens the runtime by 25% :sweat_smile:
and using `[&]` causes it to crash. there's got to be some combination of qualifiers that prevents allocation
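For what it's worth, a sketch of one way to sidestep the allocation entirely: keep the concrete closure types instead of erasing them into `std::function`, so combining two parsers just builds a small object on the stack and the calls can be inlined. It's not a drop-in patch, because `Parser` stops being a single type and recursive parsers would still need some erasure somewhere; `alt` is a made-up name and `State` is the scanner's existing state type, mirroring the `operator+` above:

```cpp
#include <utility>

// Sketch only: alternation without std::function. FA and FB keep their
// concrete closure types, so there is no type erasure and no heap allocation
// when two parsers are combined.
template <typename FA, typename FB>
auto alt(FA fa, FB fb) {
  return [fa = std::move(fa), fb = std::move(fb)](State &state) {
    auto res = fa(state);
    return res.finished ? res : fb(state);
  };
}
```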
~~Okay, here's the flamegraph for `parse-example`.~~
These are the instructions I followed to create it: https://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html
For the previous one I just used `cargo-flamegraph`.
On NixOS everything you need is in the `linuxPackages.perf` and `flamegraph` packages.
edit: flamegraph didn't come out right...
thanks!
Um, I don't seem to have anything in my `examples` folder, so `parse-example` wasn't doing much. Is there an example usage?
I think it's best to flamegraph a binary, rather than a shell script, too.
yeah, you'll first have to run `script/parse-examples`, which clones those repos. I'm currently writing a separate script for creating a flamegraph that runs `perf` on `tree-sitter parse` directly
Exactly how pure are these parsers? For, e.g., `Parser symop(Symbolic type);`, could its `Parser` result be memoized for each `Symbolic` type? Or even just created once for each `Symbolic` type at the start of the program? No idea whether that makes sense in C++!
`Parser` is just an alias for `function<Result(State&)>`
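To make the memoization question concrete, something along these lines; purely a sketch, where `symop_impl` is a made-up stand-in for the existing construction logic, and it assumes `Symbolic` is usable as a map key (e.g. an enum) and that the built parsers hold no per-parse mutable state:

```cpp
#include <map>

// Sketch only: build the Parser for each Symbolic value once and reuse it,
// instead of reconstructing the closure on every scan.
Parser symop(Symbolic type) {
  static std::map<Symbolic, Parser> cache;
  auto it = cache.find(type);
  if (it == cache.end())
    it = cache.emplace(type, symop_impl(type)).first;
  return it->second;
}
```

Note that returning the cached `Parser` by value still copies the `std::function`, so this only removes the construction cost, not the copying.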
interesting
when I run flamegraph I mostly get `[unknown]` entries. any idea how to instruct tree-sitter to compile the parser with debug symbols?
my impression so far is that since `std::function` is an object that stores all of its closure's captured variables, and most of those variables are again functions, and all of those functions are stack-allocated in other parser objects, there's just a lot of copying and allocations going on, especially when, as you noted, the parsers have value parameters like `Symbolic::type` and the current indent. `std::function` is probably not all that suited for functional programming
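A standalone illustration of that point (not scanner code): most `std::function` implementations only have a small inline buffer, so any callable whose captures don't fit is heap-allocated when the `std::function` is constructed, and copying the `std::function` copies those captures again:

```cpp
#include <cstdio>
#include <functional>

// A deliberately large capture; the exact small-buffer threshold is
// implementation-defined, but 64 bytes exceeds it on common standard libraries.
struct BigCapture {
  char data[64] = {0};
};

int main() {
  BigCapture big;
  auto lambda = [big] { return static_cast<int>(big.data[0]); };
  std::function<int()> erased = lambda;  // capture copied, most likely onto the heap
  std::function<int()> copy = erased;    // copied (and likely allocated) again
  std::printf("%d\n", copy());
  return 0;
}
```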
> any idea how to instruct tree-sitter to compile the parser with debug symbols?
Doesn't tree-sitter just generate `.c` files, which you can then compile with `cc -g`?
This is how `helix` is compiling tree-sitter grammars: https://github.com/helix-editor/helix/blob/a4641a8613bcbe4ad01d28d3d2a6f4509fef96a9/helix-syntax/build.rs#L91-L100
pretty similar to the command line that `tree-sitter parse` uses :disappointed:
Here's a flamegraph for my memoized branch. It really didn't help that much.
we really need a C++ pro to assess what the right way to use `std::function` is
on Stack Overflow people say those should be optimized away, but maybe not in the way I've used them
Yeah, I stepped through a scan cycle in gdb, and pretty much everything seems to be in `std::function`...
> at some point I'll have to accept that writing an ad-hoc functional parser in c++ might not have been a suitable choice for the scanner
I think this might be right on the money, unfortunately.
I have asked my friend @avery-laird to take a look; he has lots of experience with C++. @tek What compiler are you using to compile the project? I imagine `gcc`.
Good morning,
I like tree-sitter-haskell very much, but it seems to slow down considerably once a file passes a certain number of characters. I don't actually know whether the cause is the file's pattern complexity or something else, but this is very penalizing...
Here's an example of a slow file if you want to reproduce it:
Configuration:
Thank you for your help!