minad / marginalia

:scroll: marginalia.el - Marginalia in the minibuffer
GNU General Public License v3.0
780 stars 27 forks source link

marginalia-annotate-library is slow even on successive runs #143

Closed hmelman closed 2 years ago

hmelman commented 2 years ago

I know that marginalia-annotate-library is going to be slow, but I'm finding each time I invoke find-library with it enabled I have to wait 5-10 seconds for the list to complete (I'm not sure if this is new behavior or not). I'm no expert on elisp profiling but I tried and I think the marginalia--library-cache is getting regenerated each time.

First I wonder why it's buffer local on the minibuffer and not just a global variable, that seems unnecessary but it does look like marginalia--library-cache is switching to the active minibuffer (this of course doesn't help if you run it from *scratch*). Second I seem to see marginalia--library-name being called 2078 times even on successive runs. That suggests to me that the cache is being recreated each time and I'm not sure why (unless it's a problem with local variable). Thoughts?

I'm using the latest released versions of marginalia, vertico, and orderless on the macport of Emacs 28.1.

minad commented 2 years ago

How do you obtain these results? Can you please provide a more detailed reproduction and check with emacs -Q? With Vertico the cache is populated once per completion session. See the unless block inside marginalia--library-cache. For me the annotator is reasonably fast.

We could use a global cache but I dislike creating global memory leaks for something which is usually reasonably fast to regenerate once per completion session. If the library annotator does not work for you for whatever special reason, I would suggest to just disable it. For me it works well, but my Emacs is also a rather slim installation with about 50 external packages only.

marginalia--library-name is a cheap function which is not related to the library cache.

minad commented 2 years ago

To investigate this further, you can trace/profile marginalia--library-cache. Then you should see that the first call per completion session is slow. In my installation this call doesn't take more than 1s or 2s. If your installation is three times larger than mine, the values you observe are not too unreasonable.

hmelman commented 2 years ago

So I did it via elp and it's roughly the same in vanilla gnu emacs -q (still several seconds):

(setq elp-function-list '(marginalia-annotate-library marginalia--library-cache marginalia--library-name marginalia--library-doc))
M-x elp-instrument-list RET nil RET
M-x find-library (pick one or C-g out, same effect)
M-x elp-results

This is the result after one run:

Function Name            Call Count      Elapsed Time  Average Time
marginalia-annotate-library  10          3.723285      0.3723285
marginalia--library-doc      9           3.6895809999  0.4099534444
marginalia--library-cache    10          0.0329900000  0.0032990000
marginalia--library-name     1376        0.0168130000  1.221...e-05

This is the results after two runs:

Function Name            Call Count      Elapsed Time  Average Time
marginalia-annotate-library  21          7.4136429999  0.3530306190
marginalia--library-doc      18          7.3570389999  0.4087243888
marginalia--library-cache    21          0.0551150000  0.0026245238
marginalia--library-name     2753        0.0227080000  8.248...e-06

The second run takes just a long clock time as the first. As I said, I could be mistaken but I don't see why marginalia--library-name is called another 1376 times on the second call. That to me indicates it's being called from marginalia--library-cache inside the dolists so it's rebuilding the cache. And if I put a breakpoint there it's triggered every run:

    (unless marginalia--library-cache
      (debug)

I saw "the unless block inside marginalia--library-cache" and didn't see what would be wrong about it, hence the two points in my original report.

(I think you closed this prematurely).

minad commented 2 years ago

Your issue is that marginalia--library-doc takes around 0.4s per run! This is unacceptably slow. On Linux these are my results:

marginalia-annotate-library  10          0.0439883600  0.0043988360
marginalia--library-doc      10          0.026466687   0.0026466687
marginalia--library-cache    10          0.0161018600  0.0016101860

All the functions are called 10 times since we compute 10 annotations with Vertico in the default settings. The cache is regenerated every minibuffer session, which is not a problem given these numbers. We only keep the cache around in the minibuffer to avoid recomputing the annotations multiple times during a single session. I don't want to keep a global cache around for longer to not unnecessarily occupy memory. If a global cache would be a necessary prerequisite for the annotator as it seems to be in your case, I wouldn't have added the annotator in the first place, since the first run would still be prohibitively slow even if subsequent runs would be fast.

The question here is this - why is marginalia-annotate-library so horribly slow on your system? The shelling out is likely the cause:

https://github.com/minad/marginalia/blob/03fa4655ecd99310a51632e285bf961476b789bc/marginalia.el#L995-L999

I can only take a wild guess why this happens. It could be that when shelling out, a very expensive shell is launched which executes an expensive .zsh_profile. I've found this old post https://www.reddit.com/r/emacs/comments/4axec4/why_my_shellcommand_or_shellcommandtostring_will/ . Furthermore I recall that @jdtsmith had the same issue with Vertico+Tramp which also shells out very often. See https://www.reddit.com/r/emacs/comments/ps6ulg/is_your_tramp_completion_very_slow_on_macos/ and https://lists.gnu.org/archive/html/tramp-devel/2021-09/msg00002.html. To work around this particular issue, you could adjust shell-file-name and set it to /bin/sh.

jdtsmith commented 2 years ago

In the tramp case on MacOS the issue was that a command that doesn’t invoke a shell on Linux does on the Mac, and tramp file name completion was calling it thousands of times (unnecessarily). But yes, sh was much faster to startup as a temporary workaround.

hmelman commented 2 years ago

TL;DR: My .zshenv had a call to nvm.sh that slowed it down, moving it to .zshrc solved it

A month ago I switched from macOS 11 (Big Sur) to macOS 12 (Monterey). I had delayed that switch in part because of the vfork changes that made emacs (particularly magit) slow and finally made the switch because I thought emacs dealt with it. I also finally changed my login shell to zsh (the new default as of OSX 10.15 Catalina) so it could have been either of these things.

This made marginalia-annotate-library perform adequately:

(setq shell-file-name "/bin/bash")

I didn't want to permanently set shell-file-name to bash as it's an old version on macOS and I wanted my emacs spawned shells to be the same as my login shells, but this proved zsh was the issue and this quantified it:

% time  bash -i -c exit
exit
bash -i -c exit  0.00s user 0.00s system 68% cpu 0.005 total

% time  zsh -i -c exit 
Restored session: Thu Jul  7 09:45:57 EDT 2022

Saving session...
...saving history...truncating history files...
...completed.
zsh -i -c exit  0.28s user 0.38s system 103% cpu 0.639 total

I didn't have a .zsh_profile (it's proper name is .zprofile) and renaming my .zshrc didn't help. I did have a .zshenv (called for all zsh invocations) that set $NVM_DIR by calling nvm.sh and moving that call to .zshrc (called only for interactive zshs) changed the timings to something adequate:

zsh -i -c exit  0.04s user 0.03s system 93% cpu 0.069 total

@minad Question on the marginalia--library-cache, I had thought a minibuffer persists between command invocations like other buffers not displayed in a window, is this not right? Is there a new minibuffer for each command that uses one? So the purpose of this cache is not to handled repeated calls to (say) find-library but to handle (e.g., vertico) minibuffer scrolling of one call to find-library?

minad commented 2 years ago

Question on the marginalia--library-cache, I had thought a minibuffer persists between command invocations like other buffers not displayed in a window, is this not right? Is there a new minibuffer for each command that uses one? So the purpose of this cache is not to handled repeated calls to (say) find-library but to handle (e.g., vertico) minibuffer scrolling of one call to find-library?

The minibuffer is reset for every minibuffer session. Also recursive minibuffers use their own buffer, therefore the names *Minibuf-0*, *Minibuf-1*, etc. The purpose of the Marginalia caches is just to speed up a single completion session (scrolling and continuous filtering), reducing the load on the Emacs GC and runtime system. The cache is less effective for the default UI but for the default UI you only display the candidates using TAB after having them already prefiltered to some extent. The purpose of the caches is not to cache data for a longer period of time to avoid occupying memory. Caches which keep the data around for longer are unnecessary, since initial generation should still be sufficiently fast. This wasn't the case in your setup but should be fixed by https://github.com/minad/marginalia/commit/4fe73f5724d1e7e61b2e0abf4a530d8a2cdadcb8.

hmelman commented 2 years ago

Thanks. I was wondering if you would let bind shell-file-name here. Note that on macOS (and perhaps other oses) /bin/sh is a link to bash which of course behaves specially if invoked by that name. There's speculation that could change to be a link to zsh at some point on mac.

minad commented 2 years ago

Note that on macOS (and perhaps other oses) /bin/sh is a link to bash which of course behaves specially if invoked by that name.

Usually /bin/sh is just the Bourne shell or a highly compliant lightweight shell. Bourne shell behavior is demanded by Posix iirc.

There's speculation that could change to be a link to zsh at some point on mac.

If they do that, they will probably ensure that /bin/sh still behaves like the normal Bourne shell, such that it starts quickly.

hmelman commented 2 years ago

If they do that, they will probably ensure that /bin/sh still behaves like the normal Bourne shell, such that it starts quickly.

Yes, though issues can pop up, https://scriptingosx.com/2020/06/about-bash-zsh-sh-and-dash-in-macos-catalina-and-beyond/

oantolin commented 2 years ago

Presumably let-binding shell-file-name to "sh" causes trouble on Windows (for Windows-native Emacs, not for WSL or cygwin).

minad commented 2 years ago

@oantolin I wonder why? sh should be looked up on the $PATH? init_callproc contains the following code:

sh = getenv ("SHELL");
Vshell_file_name = build_string (sh ? sh : "/bin/sh");

Also jkr-compr.el uses sh as shell program. But then I just saw that there is also jkr-compr-use-shell which is nil on Windows. The way our code is written relies on the shell being available and I don't think we should change this. We also rely on gzip and head.

hmelman commented 2 years ago

I don't know what the current situation on windows is, but there were reasons like this: https://www.reddit.com/r/emacs/comments/4z8gpe/using_bash_on_windows_for_mx_shell/d6wmc88/?utm_source=reddit&utm_medium=web2x&context=3

oantolin commented 2 years ago

I just meant that sh is an extra dependency that you are imposing. And it feels to me less necessary than gzip or head. But it's probably fine since Emacs users on Windows tend to install various GNU utilities anyway. For example, I install git on Windows and it comes with a bunch of standard Unix utilities, including, I believe gzip and sh (not sure about head).

minad commented 2 years ago

I just meant that sh is an extra dependency that you are imposing.

That's true. But if you have gzip and head you probably have sh too? In particular if you have git - I wonder if git works without sh, at least some of the porcelain.

oantolin commented 2 years ago

I don't know if git works without sh, but I also think it doesn't attempt to. 😉 On Windows, git comes with bash.