sharkdp / bat

A cat(1) clone with wings.
Apache License 2.0

Improve bat startup speed #951

Closed: sharkdp closed this issue 2 years ago

sharkdp commented 4 years ago

The startup speed of bat is currently (v0.15) around 50 ms, which is on (or even past) the edge of being noticeable by humans.

It would be great if this situation could be improved.

The startup speed of bat can be measured with hyperfine:

```bash
hyperfine \
  --warmup 10 \
  --export-markdown bat_startup.md \
  'bat --no-config --color=always'
```

On my laptop, this results in:

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `bat --no-config --color=always` | 45.3 ± 0.7 | 44.0 | 47.4 | 1.00 |

If we use perf to get a profile

```bash
perf record --call-graph dwarf bat --no-config --color=always < /dev/null
```


we can see that most of the time is spent deserializing the stored syntaxes and themes via bat::assets::assets_from_cache_or_binary.

There are several ideas that come to mind which could improve the situation:

We should also validate that we are indeed CPU bound here. The bat binary is quite large because all syntaxes and themes are included in it, so simply loading it from disk (or from the cache) might take a noticeable amount of time.
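A quick way to check the disk side of this is to time a plain read of the binary. The Rust sketch below is hypothetical and not part of bat; `time_read` is an illustrative name. It is only a rough proxy: the OS loader pages an executable in on demand rather than reading it in one go, and on a warm page cache the read mostly measures memory copies, so cold-cache numbers require dropping the cache first.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::time::{Duration, Instant};

/// Hypothetical helper: reads the whole file at `path` and reports how many
/// bytes were read and how long the read took (wall-clock time).
fn time_read(path: &Path) -> io::Result<(usize, Duration)> {
    let start = Instant::now();
    let bytes = fs::read(path)?;
    Ok((bytes.len(), start.elapsed()))
}

fn main() -> io::Result<()> {
    // Pass a path (e.g. the bat binary) as the first argument; without one,
    // this program times a read of its own executable.
    let path = match std::env::args_os().nth(1) {
        Some(p) => PathBuf::from(p),
        None => std::env::current_exe()?,
    };
    let (len, elapsed) = time_read(&path)?;
    println!("read {} bytes from {} in {:?}", len, path.display(), elapsed);
    Ok(())
}
```

If the read takes a few milliseconds while startup takes ~45 ms, the bottleneck is almost certainly the CPU work after loading rather than I/O.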

eth-p commented 4 years ago

Adding some profiling data from macOS Mojave (with an SSD):

Note: The absolute numbers are inflated (roughly 2x) by the overhead of running under a profiler.

Build: `RUSTFLAGS=-g cargo build --release`
Command: `target/release/bat --no-config --paging=never empty.cs`

(Profiler screenshots not reproduced here: Total, Syntax Loading, Disk Latency, Disk Usage, Allocations.)

It looks like around 31% of the startup time was spent loading and initializing the executable image (on macOS), and around 63% of the total time (95% of main()) was spent parsing the syntax set, with 42% (60% of main()) spent deserializing data with serde. Unfortunately, I don't quite know where the extra 26% went.

Meanwhile (based on the disk usage graph), only 2 milliseconds were spent paging in the executable.

So I'm inclined to say that the size of the executable (and the time spent paging it in) isn't much of an issue for us right now: 2 milliseconds loading the serialized asset data is far less than the 108 milliseconds spent deserializing it.

One more thing to note: although I passed --no-config on the command line, bat still opened the config file. Was that intended?

Enselic commented 3 years ago

I have made some progress on improving bat's startup time through some prototyping work, and I would like to share that prototype here. I would love to get some feedback on it, especially criticism!

The code for the prototype is here: https://github.com/Enselic/bat/compare/6ef2bb3283e1ba5f41316...Enselic:startup-speed-prototype-v1?expand=1

First, allow me to present some performance numbers from my (low-end) machine:

| File under `./tests/syntax-tests/source` | bat master | my prototype |
|:---|---:|---:|
| example.zig | 108.5 ms | 25.8 ms |
| example.xml | 102.1 ms | 45.1 ms |
| example.md | 123.1 ms | 57.5 ms |
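To put the table above in relative terms, the speedup for each file is the master time divided by the prototype time. This small Rust sketch just does that arithmetic on the three measurements; `speedup` and `TIMINGS` are illustrative names.

```rust
/// Startup times in ms copied from the table: (file, bat master, prototype).
const TIMINGS: [(&str, f64, f64); 3] = [
    ("example.zig", 108.5, 25.8),
    ("example.xml", 102.1, 45.1),
    ("example.md", 123.1, 57.5),
];

/// How many times faster the prototype is than master.
fn speedup(master_ms: f64, prototype_ms: f64) -> f64 {
    master_ms / prototype_ms
}

fn main() {
    for (file, master, proto) in TIMINGS {
        println!("{file}: {:.1}x faster", speedup(master, proto));
    }
}
```

This works out to roughly 4.2x for Zig, 2.3x for XML and 2.1x for Markdown, so the gain shrinks as the SyntaxSet that has to be deserialized grows.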

One major limitation of the prototype is that I only made it work for --language, and not for detection by file extension or first line. So, for example, I use this command to benchmark .xml:

```bash
hyperfine 'bat --no-config --pager=never --color=always ./tests/syntax-tests/source/XML/example.xml --language xml'
```

It should be fairly straightforward to make it work for file extensions too, but I don't want to spend time on that until I have gotten an external sanity check on my overall approach. For now, I simply fall back to loading the full SyntaxSet in those cases.
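The behaviour described above (deserialize only a small per-language set when --language is given, otherwise fall back to the full set) can be sketched in Rust as follows. Everything here is hypothetical: `LazyAssets`, `syntax_set_for` and the toy `deserialize` (one syntax name per line) stand in for bat's real serialized syntect data, and this is not the prototype's actual code, which is at the link above.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a deserialized syntect SyntaxSet:
/// just the names of the syntaxes it contains.
#[derive(Debug)]
struct SyntaxSet {
    names: Vec<String>,
}

/// Toy "deserializer": each line of the blob is one syntax name.
/// In bat, this step is the expensive serde deserialization.
fn deserialize(blob: &str) -> SyntaxSet {
    SyntaxSet { names: blob.lines().map(str::to_owned).collect() }
}

/// Hypothetical asset store: one small serialized blob per language,
/// plus one blob containing every syntax.
struct LazyAssets {
    per_language: HashMap<String, String>,
    full: String,
}

impl LazyAssets {
    /// With a known `--language`, deserialize only that language's small set.
    /// Otherwise (extension/first-line detection, or an unknown name),
    /// fall back to deserializing the full set.
    fn syntax_set_for(&self, language: Option<&str>) -> SyntaxSet {
        match language.and_then(|lang| self.per_language.get(lang)) {
            Some(blob) => deserialize(blob),
            None => deserialize(&self.full),
        }
    }
}

fn main() {
    let mut per_language = HashMap::new();
    per_language.insert("zig".to_owned(), "Zig".to_owned());
    let assets = LazyAssets {
        per_language,
        full: "Zig\nXML\nMarkdown".to_owned(),
    };
    println!("{:?}", assets.syntax_set_for(Some("zig")).names); // ["Zig"]
    println!("{}", assets.syntax_set_for(None).names.len()); // 3
}
```

The fallback keeps behaviour correct for every input; only the --language path gets faster until the other detection paths learn to pick a small set too.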

We can see from the benchmark that small syntaxes such as Zig, whose SyntaxSet contains only the Zig syntax definition, are fast. A medium-sized SyntaxSet such as the one for XML (which contains ["xml", "xsd", "xslt", "tld", "dtml", "rng", "rss", "opml", "svg"]) is slower, and larger ones such as Markdown's are slower still. But it is still a nice improvement, I would say!
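A per-language SyntaxSet has to include every syntax the language references, which is why some sets are bigger than others. One plausible way to compute such a set, assuming each syntax's direct references are known, is a transitive closure over those references. The Rust sketch below is an illustration, not necessarily what the prototype does; `minimal_set` is a hypothetical name and the dependency edges in `main` are invented.

```rust
use std::collections::{BTreeSet, HashMap};

/// Returns `root` plus every syntax reachable from it through `deps`
/// (an iterative depth-first transitive closure). `deps` maps a syntax
/// name to the syntaxes it directly references.
fn minimal_set<'a>(root: &'a str, deps: &HashMap<&'a str, Vec<&'a str>>) -> BTreeSet<&'a str> {
    let mut seen = BTreeSet::new();
    let mut stack = vec![root];
    while let Some(name) = stack.pop() {
        // `insert` returns false for already-visited names, which also
        // stops the walk from looping on cyclic references.
        if seen.insert(name) {
            if let Some(children) = deps.get(&name) {
                stack.extend(children.iter().copied());
            }
        }
    }
    seen
}

fn main() {
    // Invented dependency edges, for illustration only.
    let mut deps: HashMap<&str, Vec<&str>> = HashMap::new();
    deps.insert("Markdown", vec!["HTML", "YAML"]);
    deps.insert("HTML", vec!["CSS", "JavaScript"]);
    deps.insert("JavaScript", vec!["HTML"]); // a cycle
    println!("{:?}", minimal_set("Markdown", &deps)); // {"CSS", "HTML", "JavaScript", "Markdown", "YAML"}
    println!("{:?}", minimal_set("Zig", &deps)); // {"Zig"}
}
```

A syntax with no references yields a set of one, matching the fast Zig case, while a syntax that embeds many others drags all of them into its set.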

Some notable positive properties of the prototype:

Some notable negative properties:

So, how does the prototype work? Roughly like this:

(I realize I should write up a lot more detail on how it works, but unfortunately I don't have time for that right now. The code is there for anyone to poke around in, though :))

My current plan forward is to take the code from the prototype and turn it into several small and independent PRs that are easy to review and understand one by one, to the extent that is possible and makes sense.

Enselic commented 3 years ago

Just wanted to elaborate a bit on the next steps I intend to take on the prototype. I will keep updating this comment.

I plan on doing the following sequential steps: