Closed: sharkdp closed this issue 2 years ago
Adding some profiling data from macOS Mojave (with an SSD):
Note: The numbers are going to be off (i.e. about 2x worse) due to the added overhead from running a profiler.
Build: RUSTFLAGS=-g cargo build --release
Command: target/release/bat --no-config --paging=never empty.cs
[Profiler screenshots not preserved: Total, Syntax Loading, Disk Latency, Disk Usage, Allocations]
It looks like around 31% of the time spent starting up was loading the executable image and initializing it (on macOS), and around 63% of the total time (95% of main()) was parsing the syntax set, with 42% (60% of main()) being spent deserializing data using serde. I don't quite know where the extra 26% went though, unfortunately.
Meanwhile (based on the disk usage graph), only 2 milliseconds were spent paging in the executable.
I'm also inclined to say that the size of the executable/time spent paging isn't too much of an issue for us currently. 2 milliseconds loading the serialized asset data is significantly less than the 108 milliseconds spent deserializing it.
Additional things to note:
Although I passed --no-config as a command line flag, bat still opened the config file. Was that intended?
I have made some progress on improving bat startup time through some prototyping work, and I would like to share that prototype here. I would love to get some feedback on it. Especially criticism!
The code for the prototype is here: https://github.com/Enselic/bat/compare/6ef2bb3283e1ba5f41316...Enselic:startup-speed-prototype-v1?expand=1
First, allow me to present some performance numbers from my (low-end) machine:
File under ./tests/syntax-tests/source | bat master | my prototype |
---|---|---|
example.zig | 108.5 ms | 25.8 ms |
example.xml | 102.1 ms | 45.1 ms |
example.md | 123.1 ms | 57.5 ms |
One major limitation of the prototype is that I only made it work for --language, and not for file extension or first-line. So for example, I use this command to benchmark .xml:
hyperfine 'bat --no-config --pager=never --color=always ./tests/syntax-tests/source/XML/example.xml --language xml'
It should be pretty straightforward to make it work for e.g. file extensions too, but I don't want to spend time on it until I have gotten an external sanity check on my overall approach. For now, I simply fall back to loading the full SyntaxSet in these cases.
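To make the fallback rule concrete, here is a minimal sketch of the decision only. It is not taken from the prototype: `SyntaxSource`, `choose_source` and the `minimal_sets` parameter are hypothetical names, and it assumes the language was passed explicitly via --language.

```rust
/// Which serialized data a lookup should deserialize.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq)]
enum SyntaxSource {
    /// A small, independent set that covers the requested language.
    Minimal(String),
    /// The complete SyntaxSet, as on master.
    Full,
}

/// Use a minimal set only when a language was given explicitly and a
/// minimal set is known for it; otherwise fall back to the full set.
fn choose_source(language_flag: Option<&str>, minimal_sets: &[&str]) -> SyntaxSource {
    match language_flag {
        Some(lang) if minimal_sets.iter().any(|s| s.eq_ignore_ascii_case(lang)) => {
            SyntaxSource::Minimal(lang.to_ascii_lowercase())
        }
        // No --language, or no minimal set for it: load everything.
        _ => SyntaxSource::Full,
    }
}
```

With this shape, detection by file extension or first line could later be added as extra match arms without touching the fallback path.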
We can see from the benchmark that a small syntax set such as Zig, which only contains the Zig syntax definition, is pretty fast. A medium-sized SyntaxSet such as XML (contains ["xml", "xsd", "xslt", "tld", "dtml", "rng", "rss", "opml", "svg"]) is slower, and even larger ones such as Markdown are slower still. But still a nice improvement I would say!
Some notable positive properties of the prototype:
cargo test tests pass.
Some notable negative properties:
So, how does the prototype work? Roughly like this:
independent_syntax_sets.bin contains concatenated binary representations of independent syntax sets, and independent_syntax_sets_map.bin contains a small lookup data structure so they can be found again.
(I realize I should write a lot more details of how it works, but I unfortunately don't have the time for that right now. The code is there for anyone to poke around with though :))
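The two-file layout described above can be sketched roughly as follows. This is an illustration under assumptions, not the prototype's actual format: `PackedSyntaxSets`, `add` and `bytes_for` are hypothetical names, the map is kept in memory as a `HashMap` of byte ranges, and the serialized sets are treated as opaque bytes.

```rust
use std::collections::HashMap;
use std::ops::Range;

/// Hypothetical stand-in for the concatenated sets plus their lookup map.
struct PackedSyntaxSets {
    /// All serialized syntax sets, stored back to back.
    blob: Vec<u8>,
    /// Language name -> byte range of the set that contains it.
    index: HashMap<String, Range<usize>>,
}

impl PackedSyntaxSets {
    fn new() -> Self {
        PackedSyntaxSets { blob: Vec::new(), index: HashMap::new() }
    }

    /// Append one serialized set and record its range for every
    /// language name that the set serves.
    fn add(&mut self, names: &[&str], bytes: &[u8]) {
        let start = self.blob.len();
        self.blob.extend_from_slice(bytes);
        let range = start..self.blob.len();
        for name in names {
            self.index.insert(name.to_string(), range.clone());
        }
    }

    /// Return only the bytes that would need deserializing for `language`.
    fn bytes_for(&self, language: &str) -> Option<&[u8]> {
        let range = self.index.get(language)?;
        Some(&self.blob[range.clone()])
    }
}
```

The point of the layout is that a lookup for a small language only hands a small slice to the deserializer, while languages that depend on each other share one range.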
My current plan forward is to take the code from the prototype and turn it into several small and independent PRs that are easy to review and understand one by one, to the extent that is possible and makes sense.
Just wanted to elaborate a bit on the next steps I intend to take on the prototype. I will keep updating this comment.
I plan on doing the following sequential steps:
--language
SyntaxReference. We probably need to find a way to reduce the growth of the binary size first, though.
include_integrated_assets
syntaxes.bin when minimal_syntaxes.bin reliably works for all syntaxes.
The startup speed of bat is currently (v0.15) around 50 ms, which is on (or even past) the edge of being noticeable by humans.
It would be great if this situation could be improved.
The startup speed of bat can be measured with hyperfine:
On my laptop, this results in:
bat --no-config --color=always
If we use perf to get a profile, we can see that most of the time is spent deserializing the stored syntaxes and themes via bat::assets::assets_from_cache_or_binary.
There are several ideas that come to mind which could improve the situation:
We should also validate that we are indeed CPU bound here. The bat binary is quite large, due to the fact that all syntaxes and themes are included within the binary. It might take some time to simply load that from disk (or cache).
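One rough way to check the CPU-bound assumption is to time the read and the decode step separately. The helper below is a sketch using only the standard library. `time_load_and_decode` is a hypothetical name, and it reads a standalone file rather than assets embedded in the binary, so it only approximates what bat does at startup.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::{Duration, Instant};

/// Time reading `path` separately from running `decode` on its bytes.
/// Returns (read time, decode time, decoded value).
fn time_load_and_decode<T>(
    path: &Path,
    decode: impl FnOnce(&[u8]) -> T,
) -> io::Result<(Duration, Duration, T)> {
    // I/O part: pull the whole file into memory.
    let start = Instant::now();
    let bytes = fs::read(path)?;
    let read_time = start.elapsed();

    // CPU part: whatever turns the raw bytes into usable data.
    let start = Instant::now();
    let value = decode(&bytes);
    let decode_time = start.elapsed();

    Ok((read_time, decode_time, value))
}
```

If the decode time dominates even on a cold page cache, the work is CPU bound and shrinking the binary alone is unlikely to help much. Note that repeated runs will usually hit a warm cache, which makes the read look cheaper than a first launch would.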