Closed maximbaz closed 5 years ago
kak -n makes Kakoune skip loading all the included defaults, configuration and syntax highlighting, which is a lot of stuff. We were actually talking about this in #kakoune the other day; it's hard to pinpoint any specific thing that's slow, since Kakoune and its scripts do so many things at startup.
One practical thing you can do:
Find Kakoune's default autoload directory; run :echo %val{runtime}/autoload<ret> to find out.
Create ~/.config/kak/autoload/, which will prevent Kakoune from automatically loading anything at startup.
Link only the scripts you actually use into that autoload directory, so that the ones you don't use won't be read at startup and won't slow things down.
You could try to measure startup time like this:
time -p kak -e 'quit'
real 0.48
user 0.21
sys 0.31
It takes about half a second to launch in my case, and I have a shitload of plugins. I wonder how long it takes in your case.
Edit: I tested how startup time could be optimized by merging the autoload scripts into a single file:
# kakoune default config
time -p kak -e 'quit'
real 0.49
user 0.22
sys 0.32
# config merged: cat {base,core,extra}/**.kak > mega.kak
time -p kak -n -e 'source ./mega.kak;quit'
real 0.19
user 0.09
sys 0.11
It requires a proper benchmark, but at first glance it might be a good way to speed up launch time.
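The merge experiment above can be repeated with a small helper. This is only a sketch: the function name merge_kak_files is made up for illustration, and it assumes that the output file lives outside the directory being merged and that the scripts do not depend on a particular load order.

```shell
#!/bin/sh
# Concatenate every *.kak file under a directory tree into one file.
# Paths are sorted so that the merge order is reproducible.
# Assumption: out_file is NOT inside src_dir, otherwise find could pick it up.
merge_kak_files() {
    src_dir=$1
    out_file=$2
    find "$src_dir" -type f -name '*.kak' | sort | while IFS= read -r f; do
        cat "$f"
        echo    # guard against files that lack a trailing newline
    done > "$out_file"
}
```

Pointing it at the directory reported by :echo %val{runtime}/autoload and then running kak -n -e 'source ./mega.kak;quit' should reproduce the second measurement.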
Manually sourcing each file from kakrc is faster than loading the same files from autoload: manually loading all the default kak files takes ~115ms, while putting them in autoload takes ~170ms. Why is that?
I did some additional benchmarking:
nvim with no config: 25ms
❯ time (repeat 100 nvim -u NONE -c 'quit')
1.33s user 0.87s system 88% cpu 2.490 total
nvim with huge config and lots of plugins: 114ms
❯ time (repeat 100 nvim -c 'quit')
9.20s user 1.61s system 94% cpu 11.390 total
kak -n: 7ms (awesome result!)
❯ time (repeat 100 kak -n -e 'quit')
0.35s user 0.35s system 99% cpu 0.703 total
kak with no custom config (except whatever the Arch Linux package ships): 400ms
❯ time (repeat 100 kak -e 'quit')
31.48s user 12.14s system 108% cpu 40.057 total
So before I even start configuring kak, it already takes almost 4 times longer to load than my fully customized nvim. Let's try to get this number down 😉
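The per-launch figures above are just the total wall-clock time divided by the number of runs. A minimal sketch of that arithmetic, where avg_ms is a hypothetical helper name and the inputs are the "total" values from the zsh time output quoted above:

```shell
#!/bin/sh
# Convert a total wall-clock time (in seconds) for N runs into ms per run.
avg_ms() {
    total_seconds=$1
    runs=$2
    awk -v t="$total_seconds" -v n="$runs" 'BEGIN { printf "%.1f\n", t * 1000 / n }'
}

avg_ms 2.490 100     # nvim -u NONE
avg_ms 11.390 100    # nvim with full config
avg_ms 0.703 100     # kak -n
avg_ms 40.057 100    # kak with packaged defaults
```

Note that this uses wall-clock time, so it also includes process startup and whatever else the machine was doing during the loop.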
:echo %val{runtime}/autoload<ret> outputs: /usr/share/kak /autoload
I don't have the folder /autoload, but I do have the folder /usr/share/kak present.
Creating a folder ~/.config/kak/autoload does make kak load much faster, as if I provided the -n argument. However, I do want to have all those files loaded 🙂
Neovim is probably fast because it can load configs based on file type; can we have the same in kakoune? For example, I probably don't really need the file rc/base/haskell.kak loaded until I actually open a haskell file.
Merging the files together as @TeddyDD suggested does provide a huge benefit: my startup time decreased from 400ms down to 176ms (although nvim is still faster even with all custom configs and plugins, again probably because it doesn't have to load everything on startup).
❯ time (repeat 100 kak -n -e 'source /usr/share/kak/rc/mega.kak;quit')
14.99s user 4.76s system 112% cpu 17.583 total
Well, Kakoune doesn't have to load everything either. I can imagine we could put source commands into hooks so Kakoune would load language support only when a file with a specific extension is opened:
hook global BufCreate .*\.go$ %{
source /usr/share/kak/go.kak
source /usr/share/kak/go-tools.kak
}
Of course this hook won't work in Kakoune in its current form. This solution would require quite a lot of changes in scripts. First, we need a way to check if a file was already sourced. Second, language initialization should be handled after sourcing (usually a language script does it in a BufCreate hook, but in this case the buffer already exists).
I don't think it's hard but it's a lot of work.
Merging everything into one file and sourcing it on every start is faster than normal autoloading. This seems like some redundancy in the autoloading and sourcing mechanism.
I think the reason why the merged config loads faster is just fewer file seeks (107 file reads vs 1), but I might be wrong.
I'm talking about performing the merge on every startup.
Oh, I see. You're right, the merge operation is blazing fast while autoloading is way slower. Interesting.
Looks like autoload is just kak/sh script here: https://github.com/mawww/kakoune/blob/0d838f80a0cc920e64c6a5a969861f83d96967a6/share/kak/kakrc
source command: https://github.com/mawww/kakoune/blob/665d3fa196f4df905ebd682965adf78d80eaf8a8/src/commands.cc#L1234
Kakoune can help measure what takes time if you add a set global debug profile command at the top of the share/kak/kakrc file (not the user config one, the bundled one). This will output various timings in the debug buffer.
I agree this is a performance bug. Kakoune should be able to start faster than that, and I don't see a really good reason for a massive performance difference between loading lots of small files and loading a single big file, unless we are already IO bound, which I doubt.
You can use dash instead of bash as /bin/sh, https://wiki.archlinux.org/index.php/Dash.
Also, I think kakoune needs a lazy loading mechanism. There is no point in loading go.kak while I'm not working in go at all. Maybe something as simple as fish's functions/ folder, where a function named a resides in functions/a.fish and gets loaded the first time it is used.
With latest code, you should be able to profile startup by running
kak -debug 'profile|shell' -e 'b *debug*; w debug; q'
This will write the content of the debug buffer post startup filled with timing information and what shell scripts are executed.
I would add the -n flag to the above command, as some of us (erm) have a rather substantial amount of scripts that are loaded at startup, which could falsify the results reported here.
Never mind, we need to load scripts in any case… except no two people have the same configuration.
The whole point of this is to profile script loading; starting with -n needs to be profiled differently (I use perf here).
We don't have the same performance either: on my laptop Kakoune almost always loads in less than 200ms, while on a more beefy desktop it was taking up to 400ms (probably due to bash). Hopefully we can pinpoint potential improvements to the startup time. I suspect making /bin/sh a symlink to dash is a pretty good start.
It did get better after #2196 and 7ed5d53, right now I'm seeing something like 230ms on average. However, neovim can still do it twice as fast 😛
To get comparable results we should just skip the user config, i.e. mv ~/.config/kak ~/.config/kak-tmp, and then only the default configs will be taken.
My top 5 results for kak -debug 'profile|shell' -e 'b *debug*; w debug; q':
[...] c-family.kak [...]
[...] ocaml.kak (never used it in my life) [...]
To summarize, setting aside the kakrc, where the find seems to already be pretty optimized: if we can postpone language-specific things until a file of that type is actually opened, this would greatly help the loading time.
I imagine the sourcing of c-family.kak takes so long because there are evaluate-commands %sh{ blocks that are executed during sourcing, but can't we just wrap those in some kind of hook global WinSetOption filetype=(?!c)(?!cpp)(?!objc).* %[ ?
sourcing '/usr/share/kak/kakrc' took 171574 us (this alone is already slower than entire nvim loading)
This is the whole loading, all the other sourcing of scripts is nested inside this one.
What shell is at /bin/sh on your system ? What difference do you get if you make it point to a lightweight shell (dash) ?
This is the whole loading, all the other sourcing of scripts is nested inside this one.
Ah, I see, so in reality c-family.kak is the slowest script.
What shell is at /bin/sh on your system ? What difference do you get if you make it point to a lightweight shell (dash) ?
It was bash, and dash is giving me a very good perf improvement, thanks for the idea @co-dh!
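To check what /bin/sh actually resolves to before and after such a change, something like the following may help. It is a sketch that assumes a readlink supporting -f (GNU coreutils and BusyBox do); resolve_shell is a hypothetical helper name.

```shell
#!/bin/sh
# Print the file that a path ultimately resolves to, following symlinks.
# With no argument it inspects /bin/sh.
# Assumption: readlink supports -f on this system.
resolve_shell() {
    readlink -f "${1:-/bin/sh}"
}

# Example: resolve_shell /bin/sh
```

If the result ends in bash, the %sh{} blocks in the bundled scripts are being run by bash; if it ends in dash, they are run by dash.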
Time for some perf runs! The results are averaged over 100 executions of kak -e 'quit'.
sh -> bash: 242ms
sh -> bash, removed c-family.kak: 179ms
sh -> dash: 98ms
sh -> dash, removed c-family.kak: 88ms!!!
Woohoo, we are finally beating neovim!
Can we still consider lazy-loading language-specific configs? I'm still not sure whether this could be as simple as wrapping all top-level evaluate-commands calls in a WinSetOption filetype hook, or whether there's more to it.
For a final boost, you can try to replace the sed you added to autoload_directory with just xargs cat. There might be some slight performance improvements in the latest commits (a5f53dccb7cab0bf4d5292dbad5e624690bb4a3b).
Just tried your idea, there's no visible improvement between the two. The commit a5f53dc has some tiny improvement, around 3ms, but at least it seems to be consistent and not a statistical fluke.
Having c-family.kak lazy loaded would provide much better value 😛
Having dash is cool, but not everyone can afford changing their shell, and I'm dreading the time when dash 0.5.10 gets released for Arch Linux and stops being compatible with kakoune (https://github.com/mawww/kakoune/issues/2242)
I don't think most people need bash for their login shell. You can use dash for that and whatever shell you like for interactive use.
Also note that the login shell is independent from /bin/sh, you could set bash directly if you wanted to.
The biggest breakage you might experience with symlinking /bin/sh to dash is with scripts which use bashisms unknowingly, but even that's an easy fix.
In theory - yes, in practice there could be problems. Arch Linux for example comes with bash as /bin/sh, until recently it was not recommended to change /bin/sh as it was breaking a bunch of system scripts (or at least I was told this story).
Regardless, all I'm saying is that dash does provide a perf improvement and it's great to have it, but kakoune shouldn't be forcing people to switch to dash. Even with bash it should start at least as fast as neovim. Right now that isn't the case yet, and the lack of lazy loading is the main contributing factor.
I'm not sure why everyone is so carefully avoiding the topic of lazy-loading; it gives a great perf boost and is a very simple thing to do.
I made an example for c-family.kak: https://github.com/mawww/kakoune/pull/2255/files?w=1
Do you see any issues with this approach?
In my view, lazy loading is the last resort, I'd much rather ensure that we are quick to load even without.
That said, I have a plan for how lazy loading would work, which is related to the "modules" system that would solve the problem of dependencies between scripts.
The idea for the "modules" system is to introduce 2 new commands:
provides (or module, or similar), which takes a single parameter: an arbitrary string identifying a module. I expect most bundled .kak files to start with that. That command will stop sourcing the current file if its parameter has already been given to a preceding provides command.
# File c-family.kak
provides c-family
# following commands will only be executed once, even if this file is sourced many times
requires (or similar), which takes a name as a parameter and loads a file matching that name from the script path (the script path being a list of directories in which we might have scripts, like /usr/share/kak/rc and ~/.config/kak/rc).
# File blah.kak
requires c-family.kak # sources /usr/share/kak/rc/c-family.kak, which will stop at the provides line if it was already loaded
# Now we know commands provided by c-family.kak are available
The link with lazy loading is that we can have
hook global BufSetOption filetype=(.*) %{ requires "%val{hook_param_capture_1}.kak" }
There are still problems to solve (for example, sourcing python.kak would add hooks that want to run on BufSetOption filetype=python, and those would currently not run on the file that triggers the loading of python.kak), but that's the general direction I had in mind.
Comments ?
I would like to have a dictionary of loaded modules saved somewhere, so that requires c-family.kak checks whether c-family.kak is already loaded and does not need to read c-family.kak again, the same as Python's import. In this case, you actually don't need the provides.
I think you are over-complicating this @mawww. My tiny PR just adds 6 lines and it already works: all hooks are running, there are no issues to solve as far as I can see, and I experience the full speed-up as expected.
Is there a real example when one module depends on another module? If yes, tell me the name and I'll try my proposed approach on that module. If no, let's not spend time solving a non-existing problem.
The benefit of modules is also to be able to share code between various scripts. Right now a lot of language scripts have copy-pasted logic to handle auto-indentation and auto-insertion of delimiters, it's rather fragile and I'm pretty sure some obscure language scripts are just outdated.
Gotcha, thanks for explaining this bit. But I don't want to mix refactoring scripts into modules with what I'm proposing here, modularization is a much bigger change with different goals.
Let me perhaps ask a different question: do you see any issues with this change? Would you merge it? It is 6 new lines that don't break anything and improve loading time by 60ms on bash and 10ms on dash.
I am not really keen on merging this change. It does improve startup time, which is good, but I don't see any reason to give c-family special treatment, and I am not really convinced we want to have this pattern appear in all language support files.
Regarding the provides command, the reason behind it (rather than just making require use a set of already required modules) was to play well with autoload/sourcing, so that a file would not get sourced multiple times (once because it was in autoload, and a second time when it's "required" for the first time).
but I don't see any reason to give c-family special treatment, and I am not really convinced we want to have this pattern appear in all language support files.
That would have been my second proposal, I can do this same change for all other files. Why do you not like this pattern, do you see some issue with it?
Not with the pattern itself, but with repeating the same thing in each of the 92 .kak files we have. This is why I suggested a "modules" solution, as it allows us to move files out of autoload and have a single hook that tries to autoload those files based on filetype.
Another (minor) concern is that by wrapping all that in a command, we keep the full text content of the command in memory, for something we ultimately want to run only once. It is solvable by adding a way to undefine a command, and doing that as well after removing the hook, but it's not exactly elegant.
I see, and just to be clear, I don't disagree with the modular approach, especially because it brings other benefits as @occivink has mentioned — I would love to see the "modules" solution! But it is a complicated change that you leave as the last resort and don't plan to do right now, while what I'm proposing is a simple thing for which I can make a PR today and close this ticket for good.
I think we can improve iteratively in this case: once we get to implementing the modular approach, deleting 6 lines per file is not the most difficult cleanup to do. On the positive side, we will be able to measure the perf impact of the "modules" solution; our goal would be not to decrease performance compared to "lazy loading" in its currently proposed form.
I'll close the "showcase" PR, if you agree that we should implement this pattern in all files (at least until you get time to work on modular approach), let me know and I will gladly prepare a PR for all files.
@mawww given that we now have hook -once, how do you feel about having the following pattern in all rc files? I can still make a PR if you have no objections to merging this 🙂
diff --git a/rc/core/c-family.kak b/rc/core/c-family.kak
index def12daa..01f65a6b 100644
--- a/rc/core/c-family.kak
+++ b/rc/core/c-family.kak
@@ -126,6 +126,8 @@ define-command -hidden c-family-insert-on-newline %[ evaluate-commands -itersel
]
] ]
+hook -once global WinSetOption filetype=(c|cpp|objc) %[
+
# Regions definition are the same between c++ and objective-c
evaluate-commands %sh{
for ft in c cpp objc; do
@@ -271,6 +273,8 @@ evaluate-commands %sh{
"
}
+]
+
hook global WinSetOption filetype=(c|cpp|objc) %[
try %{ # we might be switching from one c-family language to another
remove-hooks window c-family-hooks
I am still unsure I would merge it, but if you want to give it a go, I'd love to get some timings and see how much startup time this would save.
I can convert a few heaviest files to get some estimates, I'm a bit hesitant to convert everything because it will take some time (e.g. I would want to manually test every syntax), and because of whitespace diffs it will quickly become outdated if other PRs will touch the same files.
Can we maybe discuss your concerns first?
Previously you mentioned two things: repeating the same thing in 92 files and creating many new commands that will remain in memory. Now commands are no longer being created, and all these 92 files already have WinSetOption filetype hooks anyway, so I would argue this change will not add any new repetition.
Are you maybe concerned that this change will break a particular syntax? I can start by sending a PR just for that one (or for those few) syntaxes, and then you and I can do some more thorough testing.
I don't want to rush you, I want you too to be happy and confident with the change, but if we agree to do this, I would ask you to review & merge it quickly to avoid wasted effort and possible conflicts 😛
I'm seeing this on the native Android/AArch64 build in Termux, fairly extreme:
$ time kak -n -e 'quit'
real 0m0.041s
$ time kak -e 'quit'
real 0m1.312s
I'll see if I can track down where the biggest slowdown is.
Make sure to try setting /bin/sh to point to dash (instead of bash), and enabling lazy-loading on at least some rc files like I showed above. Doing that on c-family.kak alone wins me 60ms on bash and 10ms on dash.
Running the command @mawww gave above in Termux for Android/AArch64, I got these worst offenders:
sourcing '/data/data/com.termux/files/usr/share/kak/autoload/base/x11.kak' took 15288 us
sourcing '/data/data/com.termux/files/usr/share/kak/autoload/extra/iterm.kak' took 16960 us
sourcing '/data/data/com.termux/files/usr/share/kak/autoload/extra/tmux-repl.kak' took 17861 us
sourcing '/data/data/com.termux/files/usr/share/kak/autoload/extra/go-tools.kak' took 18453 us
sourcing '/data/data/com.termux/files/usr/share/kak/autoload/extra/kitty.kak' took 19237 us
sourcing '/data/data/com.termux/files/usr/share/kak/autoload/base/tmux.kak' took 19962 us
sourcing '/data/data/com.termux/files/usr/share/kak/autoload/extra/git-tools.kak' took 20860 us
sourcing '/data/data/com.termux/files/usr/share/kak/autoload/core/sh.kak' took 22383 us
sourcing '/data/data/com.termux/files/usr/share/kak/autoload/base/screen.kak' took 23520 us
sourcing '/data/data/com.termux/files/usr/share/kak/autoload/base/restructuredtext.kak' took 23562 us
sourcing '/data/data/com.termux/files/usr/share/kak/autoload/core/kakrc.kak' took 24588 us
sourcing '/data/data/com.termux/files/usr/share/kak/autoload/core/makefile.kak' took 27656 us
sourcing '/data/data/com.termux/files/usr/share/kak/autoload/base/markdown.kak' took 28042 us
sourcing '/data/data/com.termux/files/usr/share/kak/autoload/core/python.kak' took 30928 us
sourcing '/data/data/com.termux/files/usr/share/kak/autoload/extra/nim.kak' took 32209 us
sourcing '/data/data/com.termux/files/usr/share/kak/autoload/base/perl.kak' took 32756 us
sourcing '/data/data/com.termux/files/usr/share/kak/autoload/extra/protobuf.kak' took 34743 us
sourcing '/data/data/com.termux/files/usr/share/kak/autoload/base/d.kak' took 34916 us
sourcing '/data/data/com.termux/files/usr/share/kak/autoload/extra/pony.kak' took 36093 us
sourcing '/data/data/com.termux/files/usr/share/kak/autoload/extra/dart.kak' took 36550 us
sourcing '/data/data/com.termux/files/usr/share/kak/autoload/base/go.kak' took 36580 us
sourcing '/data/data/com.termux/files/usr/share/kak/autoload/extra/dockerfile.kak' took 37730 us
sourcing '/data/data/com.termux/files/usr/share/kak/autoload/base/ruby.kak' took 42599 us
sourcing '/data/data/com.termux/files/usr/share/kak/autoload/base/ocaml.kak' took 43719 us
sourcing '/data/data/com.termux/files/usr/share/kak/autoload/base/sql.kak' took 46452 us
hook 'BufCreate(*scratch*)' took 49988 us
sourcing '/data/data/com.termux/files/usr/share/kak/autoload/base/clojure.kak' took 113046 us
sourcing '/data/data/com.termux/files/usr/share/kak/autoload/core/c-family.kak' took 133646 us
sourcing '/data/data/com.termux/files/usr/share/kak/autoload/extra/scheme.kak' took 146976 us
sourcing '/data/data/com.termux/files/usr/share/kak/kakrc' took 1234156 us
Shell execution takes most of the time for all of these; those timings are omitted from the log I pasted above. I took out 19 of those .kak files for languages I don't use, and cut startup time in half to 0.63 s, which is still noticeable.
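Picking the worst offenders out of a long profile log by eye is tedious. A small filter can rank the entries; this sketch assumes the log lines look exactly like the "sourcing '<path>' took <N> us" lines pasted above, and slowest_sources is a hypothetical helper name.

```shell
#!/bin/sh
# Print the N slowest "sourcing" entries from a profile log, slowest first,
# as "<microseconds> <path>". Other lines (hooks, shell timings) are ignored.
slowest_sources() {
    log_file=$1
    count=${2:-5}
    sed -n "s/^sourcing '\(.*\)' took \([0-9][0-9]*\) us.*/\2 \1/p" "$log_file" |
        sort -rn |
        head -n "$count"
}

# Example: slowest_sources ./debug 10
```

Keep in mind that the entry for the bundled kakrc covers the whole loading, with all other scripts nested inside it, so it will always come out on top.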
I thought I'd compare to kakoune running on other hardware. In a linux/x64 VPS with a fast SSD, I got a best startup time of 64 ms, 6ms with kakrc disabled. I then tried it in the Anbox Android/x64 container on the exact same hardware, and got a best startup time of 302 ms, 10 ms with kakrc disabled.
This implies there's something slow about Android or Termux itself, with the linux/x64 startup slowdown becoming much more noticeable in Termux. It would be nice if kakoune were optimized further for this issue, rather than just assuming everybody's running on a fast SSD and loading all the slower kak language files.
Maybe check the file extension and load the language accordingly?
@maximbaz, dash is already the default in Termux; bash slows kakoune startup down by another 130 ms, after ditching those couple dozen kak files.
File type is the source of truth, not file extension, but in principle you are right and this is exactly what I'm proposing to implement as well.
A problem that I encountered when trying to implement my idea is that some highlighters reference highlighters of other filetypes (such as kakrc using sh highlighting). The filetype of sh never gets set, so the hooks to load them wouldn't get triggered, and if you manually triggered a load, you would need to store that somewhere so that it doesn't attempt to load sh again later if you open a shell file.
~If we consider my approach, we could go with only wrapping top-level evaluate-commands %sh{ blocks in hook -once; my observation shows that those are the slowest pieces of code to execute anyway. Having top-level add-highlighters not being lazy loaded is acceptable in my mind, and solves this dependency problem that you mention.~
Realized just after posting that I'm wrong, the issue is totally not solved.
This would work for most of the languages, but c-family.kak has a lot of its highlighters added within those top-level evaluate-commands %sh{} blocks.
Different idea (untested), what if we make the syntax file explicitly initialize its dependencies in this very brutal way (example for kak syntax):
hook global BufCreate (.*/)?(kakrc|.*\.kak) %{
set-option buffer filetype sh # set shell syntax for a moment to force load it
set-option buffer filetype kak # finally set the correct syntax file
}
Solving dependencies is an interesting problem. @mawww, you also brought it up earlier when describing the modules system and I completely missed the point because I didn't realize we had such dependencies in the first place. Now that I think about it, of course it's obvious: even kakrc depends on sh 😕
Not sure how you feel about the approach above with setting filetype multiple times. It looks very hacky, but kinda offloads the job of running "-once" hooks to kakoune; any other approach will probably be more intrusive and require us to write new code to conditionally source files, like @laelath did in his PR.
One option could be to change the behavior of source to be more like Python's import, i.e. have kakoune check if the file has already been sourced. I agree that the filetype changing, while it would work, is pretty hacky, and runs the risk of triggering user hooks unexpectedly.
@maximbaz @laelath There are different approaches I have in mind for that dependency problem:
1. source -require file.kak, which would source that file only if it was not sourced yet. It would probably need to look for the file in some set of paths, as we cannot really guarantee where the script is going to be, and we likely want to permit overriding a bundled script (by sourcing it in advance).
2. Some similar source -special-switch mechanism that does only the path lookup part of that previous command, combined with a module <identifier> command that is expected to be put at the beginning of a file and stops sourcing if that identifier was already seen.
Option 2 requires us to read the script again on each sourcing, whereas option 1 does not, but option 2 makes it easier to override a script without needing files to match their name (so we can even override a module directly in the user kakrc).
In both cases, a script that requires another script can just make that special source command call to ensure that script is loaded, so it's mostly in how we prevent multiple loading that they differ.
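The bookkeeping behind option 1 (source a file only if it has not been sourced yet) can be illustrated outside Kakoune. What follows is a plain shell analogue, not Kakoune code: source_once and loaded_files are hypothetical names, and it assumes file paths contain a slash and no spaces.

```shell
#!/bin/sh
# Shell analogue of "source only once": remember which files were loaded.
# Assumption: paths contain no spaces, since the list is space-separated.
loaded_files=""

source_once() {
    case " $loaded_files " in
        *" $1 "*) return 0 ;;    # already sourced, nothing to do
    esac
    # Record the file before sourcing it, so that two files requiring
    # each other do not loop forever.
    loaded_files="$loaded_files $1"
    . "$1"
}
```

Calling source_once twice with the same path runs the file's contents once, which is the property both options are trying to provide; they differ in whether the file has to be opened again to find that out.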
Option 2 seems a bit more kak-y to me, and the extra cost is most likely negligible. However there's still the question of the detection hooks for the language files, which need to be auto-defined so they can trigger the loading of the language highlighting and hooks. Maybe the module command could have behavior like this?
hook global BufCreate (.*/)?(kakrc|.*\.kak) %{
load-module kakrc
set-option buffer filetype kak
}
module kakrc %{
load-module sh
... definitions ...
}
This is more of a solution for lazy-loading without a central coordinating file than the dependency issue.
This is a surprisingly complex feature to add.
Edit: Thinking about it, it would probably just be better to have languages define two files, one for hooks that gets autoloaded, and one for definitions that doesn't, and use the sourcing method above.
Install kakoune on Arch Linux from the official repo.
Run kak -n: kakoune starts instantly (perfect).
Run kak: kakoune starts noticeably slowly, even though I haven't even created the ~/.config/kak/ folder yet. I use neovim with lots of plugins and a huge config file, and it starts faster than kak.
Given that I open and close the editor many times during the day, it is essential to have the editor start as fast as possible.
I'm not sure how I can provide more information to narrow down the cause of the slowness. What does kak do that kak -n doesn't?