radian-software / selectrum

Better solution for incremental narrowing in Emacs.
MIT License

Sorting speed for large collections #75

Open clemera opened 4 years ago

clemera commented 4 years ago

Sorting is nice but can be slow for large collections. Maybe a threshold could be added to decide when to stop sorting. Another, more radical idea I had was to just show the history candidates on the first invocation (with the default and input elements moved to the front) and then start sorting once input arrives and the collection is already refined to a smaller size. Having the minibuffer pop up immediately would give this approach a snappy feeling, I guess.

raxod502 commented 4 years ago

Yeah, a threshold is the most obvious idea but unfortunately I think that's a non-starter by itself, because it's critically important to show the recently selected candidates at the top even before anything has been typed.

Special-casing the first invocation to show the history candidates is I think the most promising approach, but it needs to be done with care so that it's robust and not a gigantic hack. In particular, you can't just show the top of the history, because history elements may no longer actually be candidates. (This will happen all the time in filename completion under Selectrum, for example.) And then you're going to have to actually do the sorting properly at some point. How do you decide when?

I would be very happy to see an implementation proposal that addresses these concerns, as I have been tossing them around for some time without being able to come up with anything clean.

raxod502 commented 4 years ago

Oh yes, another consideration: you might have to sort the candidates initially anyway, if there aren't enough history elements to fill the minibuffer. Another special case to deal with.

clemera commented 4 years ago

Yeah, file completion has to be special-cased; for most other commands it should hopefully work (at least when they define a history variable; describe-function and describe-variable don't, which seems like a BUG). We could also just define a history variable on the fly instead of falling back to minibuffer-history. I already do that in my config.
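Defining a history variable on the fly could look roughly like this (a sketch; the `my/` names are hypothetical and not part of Selectrum):

```elisp
;; Hypothetical sketch: give `describe-function' a dedicated history
;; variable instead of letting it fall back to `minibuffer-history'.
(defvar my/describe-function-history nil
  "Dedicated minibuffer history for `describe-function'.")

(defun my/read-function-name ()
  "Read a function name, recording it in a dedicated history."
  (intern (completing-read "Describe function: "
                           obarray #'fboundp t nil
                           'my/describe-function-history)))
```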

And then you're going to have to actually do the sorting properly at some point. How do you decide when?

I don't know; maybe as soon as not enough matching history elements are left for display. This would also cover the case where there aren't enough history elements to begin with: then sorting would have to be done right away. When the filtering is fast and we do the sorting afterwards, the shrunken size might improve the sorting speed significantly. But I'm just guessing; no idea if it will actually work out.
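The proposed heuristic could be sketched as follows (hypothetical code, assuming the match lists and the display height are already known):

```elisp
;; Hypothetical sketch: show history matches unsorted until there are
;; too few of them to fill the display, then fall back to sorting the
;; (hopefully already narrowed) collection.
(defun my/candidates-to-display (history-matches all-matches num-lines)
  "Pick candidates to show, given NUM-LINES display slots."
  (if (>= (length history-matches) num-lines)
      ;; Enough recent candidates: skip the expensive sort entirely.
      (seq-take history-matches num-lines)
    ;; Too few history matches: sort the narrowed collection now.
    (seq-take (sort (copy-sequence all-matches) #'string<) num-lines)))
```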

raxod502 commented 4 years ago

file completion has to be special cased

I would strongly prefer not to do that. Special-casing individual commands usually does not end well in the long run. In this case, it will introduce a number of subtle bugs related to things like commands being presented in M-x that aren't actually runnable because the libraries that define them haven't yet been loaded in the current Emacs session. We should solve the problem in general, so that the behavior of Selectrum remains exactly the same to the end user, just hopefully with better performance.

clemera commented 4 years ago

Special-casing individual commands usually does not end well in the long run. In this case, it will introduce a number of subtle bugs related to things like commands being presented in M-x that aren't actually runnable because the libraries that define them haven't yet been loaded in the current Emacs session. We should solve the problem in general, so that the behavior of Selectrum remains exactly the same to the end user, just hopefully with better performance.

Good point, I don't have a good answer to this.

clemera commented 4 years ago

I also heard that this is not an issue for some people; maybe I need to get a faster machine :running_man:

clemera commented 4 years ago

Caching would work for static collections. We could then run insertion sort on them, which should be faster for almost-sorted collections, but that would have to be done in Elisp, which slows it down again.
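As a sketch of why insertion sort helps for almost-sorted data (hypothetical code; in practice the C-implemented `sort` often still wins on constant factors):

```elisp
;; Hypothetical sketch: insertion sort runs in linear time on input
;; that is already sorted, because each element lands at the front of
;; the (descending) working list in O(1).
(defun my/insertion-sort (list pred)
  "Return the elements of LIST sorted by PRED (a strict less-than)."
  (let (rev)  ; elements sorted so far, in descending order
    (dolist (x list)
      (if (or (null rev) (not (funcall pred x (car rev))))
          (push x rev)  ; x is the new maximum: O(1) for sorted input
        (let ((tail rev))
          (while (and (cdr tail) (funcall pred x (cadr tail)))
            (setq tail (cdr tail)))
          (setcdr tail (cons x (cdr tail))))))
    (nreverse rev)))
```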

raxod502 commented 4 years ago

maybe I need to get a faster machine

This certainly would solve the problem, but I don't think there's a good reason why selecting an item from a list should require a high-performance machine. We should improve the experience for everyone, not just those who can afford high-end hardware.

but that would have to be done in Elisp which slows it down again

Yeah. Turns out that using C functions usually outweighs considerations of algorithmic complexity. Can't wait for gccemacs.

clemera commented 4 years ago

Can't wait for gccemacs.

Me, too! I watched the presentation and I'm eager to try it soon.

condy0919 commented 3 years ago

I found that describe-variable is 5x slower than ivy even when selectrum-should-sort-p is nil

;; The default selectrum
(benchmark 1 '(call-interactively #'describe-variable))
;;=> "Elapsed time: 0.964549s"

;; selectrum with nil `selectrum-should-sort-p'
(benchmark 1 '(let ((selectrum-should-sort-p nil))
                (call-interactively #'describe-variable)))
;;=> "Elapsed time: 0.804663s"

;; ivy-mode
(benchmark 1 '(call-interactively #'describe-variable))
;;=> "Elapsed time: 0.169659s"

minad commented 3 years ago

@condy0919 Thank you for the numbers!

See also #312. @oantolin, I think this is the slowdown you observed.

condy0919 commented 3 years ago

:+1: The ultimate response time

minad commented 3 years ago

@condy0919 Well, I am just an interested contributor here, not one of the maintainers ;)

clemera commented 3 years ago

describe-variable is slower with Selectrum for some reason. describe-function is about equally fast in ivy and selectrum.

oantolin commented 3 years ago

Thanks for the benchmark @condy0919! I care about the case where the input is fed programmatically, because that's what Embark does. Selectrum is clearly slower than default completion or icomplete (I ran 10 iterations because there is a lot of variation):

(benchmark 10 '(let ((unread-command-events '(?n ?i ?l 13)))
                 (call-interactively #'describe-variable)))

"Elapsed time: 17.478012s (1.220557s in 6 GCs)" ; embark-live-occur-after-input
"Elapsed time: 17.716527s (1.272396s in 7 GCs)" ; embark-live-occur-after-delay
"Elapsed time: 17.901300s (1.140958s in 6 GCs)" ; default tab completion
"Elapsed time: 19.553066s (1.297312s in 7 GCs)" ; icomplete-vertical
"Elapsed time: 19.683581s (1.308557s in 7 GCs)" ; icomplete
"Elapsed time: 21.154784s (1.519785s in 8 GCs)" ; ivy
"Elapsed time: 27.062260s (1.590118s in 9 GCs)" ; selectrum
"Elapsed time: 27.597704s (1.710274s in 9 GCs)" ; selectrum-should-sort-p = nil

Note that Embark's completion UIs are essentially just default tab completion plus popping up a buffer after you type a bit or after a delay, so for this benchmark they should be exactly the same as default tab completion, and they are. Also note that Selectrum is slow whether I ask it to sort or not.

oantolin commented 3 years ago

I think it may not be sorting that's slowing down C-h v in Selectrum, but rather the GC.

EDIT: @minad's right, I don't know why I said this. I think it's because I saw it trigger GC more times.

minad commented 3 years ago

@oantolin But there is not significantly more time spent in GC? Could you also benchmark allocated memory?

oantolin commented 3 years ago

Oh, you are right @minad! I think I only said the GC thing because I saw more GCs, but the total time is not much larger. I added ivy to the results I quoted: it's a little slower than icomplete but faster than selectrum, which matches my expectations, I guess.

How do I benchmark allocated memory?

minad commented 3 years ago

How do I benchmark allocated memory?

Sorry, I don't know. It probably requires running the profiler. Alternatively, one can look at the GC statistics output.

But since the GC times are small, I don't expect you would find anything. If anything, Selectrum comes off even better there, since it has only a few more GCs despite a significantly longer runtime.

The problem is neither sorting nor GC/allocations.
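For a rough allocation measurement from Elisp, the built-in `memory-use-counts` can be compared before and after (the macro below is hypothetical glue, not an existing helper):

```elisp
;; Sketch: `memory-use-counts' returns cumulative allocation counters
;; (cons cells first), so diffing it around a form gives a rough idea
;; of how much garbage the form produced.
(defmacro my/count-conses (&rest body)
  "Return the number of cons cells allocated while running BODY."
  `(let ((before (car (memory-use-counts))))
     ,@body
     (- (car (memory-use-counts)) before)))
```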

oantolin commented 3 years ago

The problem is neither sorting nor GC/allocations.

Right, and it seems to be specific to describe-variable. I tried a similar thing with execute-extended-command: all the completion UIs I tried above (including now ivy) take basically the same amount of time!

(benchmark 10 '(let ((unread-command-events '(?p ?w ?d 13)))
                 (call-interactively #'execute-extended-command)))
"Elapsed time: 22.527526s (0.607186s in 3 GCs)" ; default tab completion
"Elapsed time: 23.188627s (0.873425s in 4 GCs)" ; ivy
"Elapsed time: 23.383487s (0.667524s in 3 GCs)" ; icomplete
"Elapsed time: 23.775664s (1.031111s in 5 GCs)" ; selectrum

minad commented 3 years ago

~From the GCs it looks as if selectrum is producing more garbage. Which is cheap such that the runtime doesn't go up significantly. But more GCs are needed.~ (this is probably not correct for the emacs ms gc)

Did you also look at profiles? Maybe that would help? But I guess @clemera or @raxod502 also profiled selectrum many times...

clemera commented 3 years ago

I did not profile Selectrum until now; I was mostly busy working on the general UI and such. I don't know where the slowdown for describe-variable is coming from, so this might be the time to finally do some profiling :)

minad commented 3 years ago

@clemera I did some profiling when I wrote consult-line. selectrum did well there. Most of the time is spent in consult--line-candidates, but that was okay after I did some basic optimizations.

But would be interesting to profile describe-variable, since this seems to hit some edge case.

clemera commented 3 years ago

Yes, after some digging I found the reason: the predicate passed by describe-variable switches the current buffer for each candidate. In Selectrum we compute the candidates in the post-command hook, when the minibuffer is already the current buffer, so the buffer needs to be adjusted for each candidate; Ivy does this before the minibuffer is entered, so the buffer can stay the same.
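The cost difference can be illustrated with a hypothetical sketch (not Selectrum's or Ivy's actual code): when filtering runs with the minibuffer current, a context-sensitive predicate pays a buffer switch per candidate, whereas switching once around the whole loop amortizes it.

```elisp
;; Hypothetical sketch of the two call patterns.

;; Slow: one `with-current-buffer' (two buffer switches) per candidate.
(defun my/filter-per-candidate (candidates pred orig-buffer)
  (seq-filter (lambda (cand)
                (with-current-buffer orig-buffer
                  (funcall pred cand)))
              candidates))

;; Faster: enter the original buffer once, around the whole loop.
(defun my/filter-in-context (candidates pred orig-buffer)
  (with-current-buffer orig-buffer
    (seq-filter pred candidates)))
```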

clemera commented 3 years ago

The reason we compute them late is to behave more like default minibuffer completion, for example with embark you wouldn't want to compute the candidates after injection.

minad commented 3 years ago

Is there a chance to get this fixed? You are running everything in the temporary buffer context afaik. We talked about this before, when I wondered about the context of the annotation functions, I believe. It should be such that Selectrum can efficiently add to the temporary buffer, but the functions passed to completing-read should be executed in the proper context.

minad commented 3 years ago

You can still do the late computation but just put the loop in the right context if that's possible?

clemera commented 3 years ago

You are running everything in the temporary buffer context afaik.

Only when computing the display of the candidates; the computation of the candidates themselves happens with the minibuffer current, like with default completion.

clemera commented 3 years ago

You can still do the late computation but just put the loop in the right context if that's possible?

I think we would need to special-case help--symbol-completion-table, because usually the right context should be the minibuffer.

minad commented 3 years ago

I think we would need to special case help--symbol-completion-table because usually the right context should be the minibuffer.

Are you sure? I think you are currently executing in the temporary buffer? So I would change it to the original buffer if this is what the default completion does.

minad commented 3 years ago

Maybe this special casing is only needed for this bad edge case since selectrum is special since it computes everything at once. You will know better I guess 👍

clemera commented 3 years ago

We only do that for display purposes; the computation of the candidates happens with the minibuffer current. I'm not sure what to do yet. I can't find any mention of assumptions about the current buffer for the predicate in the docs; if Ivy gets away with it, maybe we could also run the computation in the last buffer by default.

hmelman commented 3 years ago

I don't completely follow this, but I took a look at describe-variable. The predicate lambda there has the comment "In case the variable only exists in the buffer the command we switch back to that buffer before we examine the variable." Could that lambda be changed to only switch buffers if the variable isn't found in the current buffer? Would that fix the performance problem? If so, it would be worth reporting the bug and getting it fixed for Emacs 28, and having a special case in Selectrum until then.
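That suggestion would amount to something like this (a hypothetical rewrite, not the actual describe-variable predicate):

```elisp
;; Hypothetical sketch: try cheap checks in the current buffer first,
;; and only pay the buffer switch when they fail.
(defun my/variable-predicate (sym orig-buffer)
  (or (get sym 'variable-documentation)
      (boundp sym)                      ; cheap check, current buffer
      (with-current-buffer orig-buffer  ; fall back: buffer-local vars
        (boundp sym))))
```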

clemera commented 3 years ago

@hmelman The issue with describe-variable is fixed by #316. I decided to generally run it in the buffer that was current before; if we run into trouble with that, we can still think about special-casing. Regarding your suggestion: I think even if the variable exists within the minibuffer too, you probably still prefer to get the local value from the buffer where you invoked describe-variable.

hmelman commented 3 years ago

Regarding your suggestion I think even if the variable exists within the minibuffer, too you probably still prefer to get the local value of the buffer from where you invoked describe-variable.

I guess describe-variable is doing the same optimization you're doing in #316 so that makes sense.

minad commented 3 years ago

In Vertico I am using a bucket-sorting/binning technique, which pushes down the sorting complexity as long as the candidates are reasonably short (which is the case for describe-variable etc.). In Vertico a more optimized sorting technique is necessary, since there I am sorting on every keypress.

https://github.com/minad/vertico/blob/e6a74f84e86f3f5f1e1034f9f817a7c1fe59287f/vertico.el#L147-L196
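A simplified sketch of that binning idea (Vertico's actual implementation is at the link above): bin candidates by length, sort only within each small bucket, then concatenate.

```elisp
;; Simplified sketch of length-binning sort, not Vertico's real code.
;; The expensive `string<' sort only runs inside each (small) bucket.
(defun my/sort-length-alpha (candidates &optional max-len)
  "Sort CANDIDATES shortest-first, alphabetically within each length.
Strings longer than MAX-LEN (default 32) go into an overflow bucket."
  (let* ((max-len (or max-len 32))
         (buckets (make-vector (1+ max-len) nil))
         overflow)
    (dolist (cand candidates)
      (let ((len (length cand)))
        (if (> len max-len)
            (push cand overflow)
          (push cand (aref buckets len)))))
    (nconc (mapcan (lambda (bucket) (sort bucket #'string<))
                   (append buckets nil))
           (sort overflow #'string<))))
```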