Closed groves closed 2 years ago
How are you getting 100ms and 50ms that does not sound correct at all. For instance running
time kitty +kitten ssh -h
gives
real 0.064 user 0.054 sys 0.010
aka 64 ms for running main and executing ssh itself. I doubt my laptop is really twice as fast as your machine. You can try running with PYTHONPROFILEIMPORTTIME=1 to see whats causing the delays.
Adding a return to line 702 so that ssh is not actually called but all processing is done gives me time kitty +kitten ssh localhost real 0.093 user 0.075 sys 0.017
Finally: time kitty +kitten ssh localhost true
real 0.132 user 0.084 sys 0.013
And python 3.11 should reduce these times by about 20%
How are you getting 100ms and 50ms that does not sound correct at all.
Indeed it wasn't correct. Thanks for catching that.
I was running with KITTY_DEVELOP_FROM set to let me change the Python code. If I don't have it set, I get more like 80 millis for a no-op ssh invocation:
groves@baladi ~ [255]> time kitty +kitten ssh -V
usage: ssh [-46AaCfGgKkMNnqsTtVvXxYy] [-B bind_interface]
[-b bind_address] [-c cipher_spec] [-D [bind_address:]port]
[-E log_file] [-e escape_char] [-F configfile] [-I pkcs11]
[-i identity_file] [-J [user@]host[:port]] [-L address]
[-l login_name] [-m mac_spec] [-O ctl_cmd] [-o option] [-p port]
[-Q query_option] [-R address] [-S ctl_path] [-W host:port]
[-w local_tun[:remote_tun]] destination [command]
________________________________________________________
Executed in 86.01 millis fish external
usr time 67.78 millis 0.37 millis 67.42 millis
sys time 14.49 millis 1.08 millis 13.41 millis
I'm guessing the frozen stuff from bypy is making a big difference there?
Without the KITTY_DEVELOP_FROM penalty, I still see a noticeable slowdown from the ssh kitten as opposed to direct ssh. Here's the kitten's timing, which is with the shared session established and is representative from several runs:
groves@baladi ~> time kitty +kitten ssh localhost true
________________________________________________________
Executed in 258.17 millis fish external
usr time 97.41 millis 0.39 millis 97.02 millis
sys time 18.91 millis 1.16 millis 17.75 millis
And here's direct ssh timing:
groves@baladi ~/.ssh> time ssh localhost true
________________________________________________________
Executed in 42.21 millis fish external
usr time 3.12 millis 365.00 micros 2.76 millis
sys time 4.12 millis 885.00 micros 3.23 millis
I'm doing things like popping a file fuzzy finder on the remote host from a kitty keybinding, so the slowness is particularly noticeable as it's in my editing flow. I'm still motivated to speed this up as a result, either via the prewarming or some other mechanism.
Well, it is always going to be slower than plain ssh+controlmaster since it has to do a lot more work both before invoking ssh and after invoking but before the remote command is executed.
You could possibly get rid of the 60ms python startup+import time by keeping a process waiting around but this seems like overkill. 250ms isnt very long when connecting to a remote host and if it is then shaving 60ms of it isnt going to help much. For your use case I would just make a dedicated mapping that runs either plain ssh or a kitten to get the controlmaster port and then run plain ssh.
Hrmm, but if we kept an ssh connection waiting around, wouldn't that eliminate the entire 250ms? That's what makes it seem worthwhile to me. I agree that it's overkill for 60ms.
On Thu, Jun 02, 2022 at 10:02:17AM -0700, Charlie Groves wrote:
Hrmm, but if we kept an ssh connection waiting around, wouldn't that eliminate the entire 250ms? That's what makes it seem worthwhile to me. I agree that it's overkill for 60ms.
You cant keep an ssh connection waiting around unless you are willing to eliminate following cwd. The best you can do is have the kitten wait just before building the tarball to send.
Thinking about it some more there is a prewarm I would be willing to add to kitty, but one that is generic for all kittens not specific to the ssh kitten. Basically, the idea would be to keep a background process that finishes all python initialization and imports most commonly used modules and then waits. When running kittens kitty will then pass the kitten envvars, cli, cwd and stdin to this waiting process and it will run the kitten by forking. This avoids the entire python initializationa nd import time overhead for all kittens.
The slightly tricky part would be to successfully replace the stdio handles in the already initialized python. EDIT: Thankfully, os.dup2() works fine for this.
If you want to help with this let me know.
Sure, I'd be interested. Would you like me to take a swing given your general idea of swapping in a prewarmed python with dup2, or would you like to see more details before I throw some code together?
No need I have already mostly implemented it in master. Feel free to kick the tires once its done.
prewarming for all kittens is now fully implemented in the prewarm branch. It works when runnign any kitten via keybindings or remote control. Note the very first time you run a kitten in a given kitty instance there wont be any benefit, after that prewarming will kick in.
And I have now removed the limitation that the first kitten invocation is not prewarmed. It's now merged into master and will be in the next release.
I piped kitty --debug-keyboard --dump-commands
into a script that prepends each line with the process time in nanos. In the spawned kitty, I hit the "new window" keyboard shortcut in a window running the ssh kitten. I then grabbed the time for handled as shortcut
being printed and the time for the final draw command for my prompt on the remote host.
That took 166 millis on average with the kitty nightly and 202 millis in the released version. 36 millis faster, so seems like the prewarming is working to me. Thanks for implementing it!
Do I understand correctly that for my custom kittens, they'll use a prewarmed process if launched via shortcut or remote control but that it won't have their custom code imported? They seem to be working fine, just making sure that they're actually affected and worth testing as part of this.
Anything else you'd like me to try?
On Tue, Jun 14, 2022 at 02:25:13PM -0700, Charlie Groves wrote:
I piped
kitty --debug-keyboard --dump-commands
into a script that prepends each line with the process time in nanos. In the spawned kitty, I hit the "new window" keyboard shortcut in a window running the ssh kitten. I then grabbed the time forhandled as shortcut
being printed and the time for the final draw command for my prompt on the remote host.That took 166 millis on average with the kitty nightly and 202 millis in the released version. 36 millis faster, so seems like the prewarming is working to me. Thanks for implementing it!
You're welcome, it was a fun technical challenge.
Do I understand correctly that for my custom kittens, they'll use a prewarmed process if launched via shortcut or remote control but that it won't have their custom code imported? They seem to be working fine, just making sure that they're actually affected and worth testing as part of this.
Yes, that's correct but importing one or two python modules is not slow, should be sub millisecond.
Anything else you'd like me to try?
Well I am working on an implementation of prewarming that also works for kittens launched at the shell inside kitty (think of icat or the diff kitten). If that works out, I will post here.
The prewarm implementation for kittens at the shell is now in master.
@kovidgoyal I gave this a quick test.
kitty @ ls | less
qq
and (END)qq
is displayed.fish: Unknown command: q
after a carriage return, so it looks like less has exited on the first q.How to fix this?
Would you mind adding KITTY_PREWARM_SOCKET_REAL_TTY
to glossary.rst
?
Should be fixed by: f637bf2
... In theory we might need to check STDIN as well but that is very unusual.
I noticed that the beginning of the remote control tutorial mentions ls | kitty @ send-text --stdin
, and the following does not work properly.
ls | fzf | kitty @ send-text --stdin --match ...
I doubt that would work without socket prewarming either. If it does it would be very accidental. In general you cannot run multiple programs in a pipeline that expect to do terminal IO. Here both fzf and kitty @ are expecting to do terminal IO, simultaneously. Indeed, your initial example of kitty @ ls | less only works because less doesn't do much until it starts receiving data and kitty @ ls only sends data after it has finished doing terminal IO. In such cases you can always do
whatever | KITTY_PREWARM_SOCKET= kitty @ send-text
And with this commit: https://github.com/kovidgoyal/kitty/commit/609d42e2bca742613ccdccbd2152e835cd8a19ee on my linux box at least even using fzf works, but I should emphasize, this is very much a hack, you cant really have multiple programs thinking they own the tty at the same time.
... you cant really have multiple programs thinking they own the tty at the same time.
Thanks for the explanation, these two programs will work simultaneously and will cause problems.
In this case, fzf is not aware of the existence of kitty @
.
If a program that owns tty wants to call another program that does some terminal io, it should be handled properly.
For fzf, I guess the following will work.
ls | fzf --bind 'ctrl-y:execute(printf {} | kitty @ send-text --match recent:1 --stdin)'
@kovidgoyal
I noticed that when kitty.app is launched by OS (e.g. Launchpad), kitty +list-fonts
crashes.
kitty.app is built with debug and sanitize arguments.
Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0 libsystem_kernel.dylib 0x7ff81600cdba __abort_with_payload + 10
1 libsystem_kernel.dylib 0x7ff81600e877 abort_with_payload_wrapper_internal + 80
2 libsystem_kernel.dylib 0x7ff81600e827 abort_with_reason + 19
3 libobjc.A.dylib 0x7ff815ed9aee _objc_fatalv(unsigned long long, unsigned long long, char const*, __va_list_tag*) + 114
4 libobjc.A.dylib 0x7ff815ed9a7c _objc_fatal(char const*, ...) + 138
5 libobjc.A.dylib 0x7ff815ecffa0 performForkChildInitialize(objc_class*, objc_class*) + 299
6 libobjc.A.dylib 0x7ff815ebb177 initializeNonMetaClass + 617
7 libobjc.A.dylib 0x7ff815ebaf6a initializeNonMetaClass + 92
8 libobjc.A.dylib 0x7ff815ebac18 initializeAndMaybeRelock(objc_class*, objc_object*, mutex_tt<false>&, bool) + 232
9 libobjc.A.dylib 0x7ff815eba995 lookUpImpOrForward + 1087
10 libobjc.A.dylib 0x7ff815ebd53d object_setClass + 88
11 CoreFoundation 0x7ff816070d78 _CFRuntimeCreateInstance + 544
12 CoreText 0x7ff8177a67a0 TCFRef<CTFontDescriptor*> TCFBase_NEW<CTFontDescriptor, __CFDictionary const*&>(__CFDictionary const*&) + 26
13 CoreText 0x7ff8177a674e CTFontDescriptorCreateWithAttributes + 48
14 CoreText 0x7ff8177d00d0 CTFontCollectionCreateFromAvailableFonts + 91
15 fast_data_types.so 0x1125c9675 all_fonts_collection + 21 (core_text.m:159)
16 fast_data_types.so 0x1125ca026 coretext_all_fonts + 22 (core_text.m:165)
17 Python 0x11016f446 cfunction_vectorcall_NOARGS + 101
18 Python 0x11021bc75 call_function + 158
...
84 Python 0x110289921 pymain_run_module + 212
85 Python 0x1102894a9 Py_RunMain + 974
86 kitty 0x10fc1a6db run_embedded + 3627 (main.c:201)
87 kitty 0x10fc19412 main + 1954 (main.c:292)
88 dyld 0x11902352e start + 462
kitty(8137,0x10f439600) malloc: nano zone abandoned due to inability to preallocate reserved vm space.
objc[8139]: +[UIFontDescriptor initialize] may have been in progress in another thread when fork() was called.
objc[8139]: +[UIFontDescriptor initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
zsh: abort kitty +list-fonts
Also, due to recent changes, make
again after kitty has been built will prompt an error.
...
File ".../kitty/setup.py", line 1257, in create_minimal_macos_bundle
os.symlink(
FileExistsError: [Errno 17] File exists: 'kitty.app/Contents/MacOS/kitty' -> 'kitty/launcher/kitty'
That crash implies that the forking of the prewarm process is happening after cocoa starts initializing, but if you see main.py the fork is done before glfw (and therefore any cocoa frameworks) is even loaded. Very weird. Does launch services somehow preload cocoa when running applications?
This should fix it: ...
I compared show_kitty_env_vars
and forgot that LANG
has been modified under macOS, just as PATH
has been. That's why I couldn't find the difference, before digging into main.py
.
Running the command kitty/launcher/kitty +kitten/+runpy/...
in the development directory shows that the code executed is from the current running kitty terminal. The reason for this is perhaps not so obvious for developers unfamiliar with kitty. Is it necessary to document in the dev page that KITTY_PREWARM_SOCKET
should be unset (set to empty)? Hopefully that will reduce the number of issues related to this.
On Sun, Aug 14, 2022 at 10:49:46PM -0700, page-down wrote:
This should fix it: ...
I compared
show_kitty_env_vars
and forgot thatLANG
has been modified under macOS, just asPATH
has been. That's why I couldn't find the difference, before digging intomain.py
.Running the command
kitty/launcher/kitty +kitten/+runpy/...
in the development directory shows that the code executed is from the current running kitty terminal. The reason for this is perhaps not so obvious for developers unfamiliar with kitty. Is it necessary to document in the dev page thatKITTY_PREWARM_SOCKET
should be unset (set to empty)? Hopefully that will reduce the number of issues related to this.
You are welcome to send a PR for this. Though there isnt a dev page as such. Maybe in CONTRIBUTING.md
Is your feature request related to a problem? Please describe. It takes about 200 milliseconds to ssh into a host I'm using with the ssh kitten. That's the average value I get running
time kitty +kitten ssh jojr true
. I get an average of ~30 millis fromtime ssh jojr true
, which also has ControlMaster on. That additional 170 millis is a noticeable pause when I pop open a new window withlaunch --cwd=current
or similar commands.Describe the solution you'd like When the ssh kitten finishes connecting, it launches another ssh kitten in the background. The next request for an ssh kitten to that host uses that already connected kitten. That should make it appear that connecting is instantaneous. Once that connection is in use, another background kitten could be launched to prewarm for the next request.
I'd think this should be behind an option and off by default. It would mean that any files or configuration that have changed since the previous invocation that would affect the kitty ssh bootstrap would be out of date, and I think users would want to be aware of that before enabling this. It would also need a way to update the cwd of the remote connection right before handing over the prewarmed connection since that will likely have changed since the bootstrap was run.
Describe alternatives you've considered I did some profiling of the ssh kitten to see if I could speed it up enough to make it not an issue. I added time.monotonic calls to the start of execution in ssh/main.py and to right before ssh/main.py executes ssh. Those showed that it took 100 millis of Python execution to get to the start of ssh/main.py's code and another 50 millis to get to executing ssh. Since over half the additional time taken by the ssh kitten over raw ssh was in Python startup and importing, I didn't think I'd be able to optimize the ssh kitten enough to make a difference.
I've looked at using ssh directly myself, but that means reimplementing the cwd support of launch as far as I can tell.
Additional context If this sounds like a reasonable thing to add, I'd be happy to start on a PR for it. I'm also happy to work on other approaches if there's a better way to go about making additional ssh connections fast.