Closed liushihao456 closed 1 year ago
This is why I do not personally use fuzzy matching in prescient.el
.
I think it is unlikely to be possible to solve this problem in an elegant way. How would you want to prioritize matches? First by whether they have a substring match, then by frecency? Then you would end up with a bunch of infrequently used candidates before the recent ones, just because they happen to be substring matches.
I think you need some kind of heuristic to trade off between the different factors. Unfortunately, one of the design goals of prescient.el
is that the sorting and filtering algorithms are as simple as possible and never have any nondeterminism, so I do not think it would be a good idea to add that.
Have you considered more heuristic-based incremental narrowing frameworks such as https://github.com/lewang/flx?
There's also the built-in completion style flex
, if I understand the description given for flex-score-match-tightness
:
Controls how the
flex
completion style scores its matches.Value is a positive number. A number smaller than 1 makes the scoring formula reward matches scattered along the string, while a number greater than one make the formula reward matches that are clumped together. I.e "foo" matches both strings "fbarbazoo" and "fabrobazo", which are of equal length, but only a value greater than one will score the former (which has one large "hole" and a clumped-together "oo" match) higher than the latter (which has two "holes" and three one-letter-long matches).
However, since this modifies the sorting metadata of the candidates, it wouldn't use the prescient.el
sorting.
I think it is unlikely to be possible to solve this problem in an elegant way. How would you want to prioritize matches? First by whether they have a substring match, then by frecency? Then you would end up with a bunch of infrequently used candidates before the recent ones, just because they happen to be substring matches.
What I meant is to sort only the remaining matches after moving frequent/recent candidates to the front. Frequency/recency has a higher priority. Just like the behavior of the default sorting by length.
Therefore the logic happens just where sorting by length happens. We might provide several sorting methods, prescient-sort-by-length
, prescient-sort-by-fuzzy-score
, etc.. Or maybe provide a sorting function variable, prescient-sorting-method
that defaults to prescient-sort-by-length
, but would allow for customization.
Unfortunately, one of the design goals of prescient.el is that the sorting and filtering algorithms are as simple as possible and never have any nondeterminism, so I do not think it would be a good idea to add that.
I understand. And I really like the way prescient.el (and selectrum.el) is written. It's a personal preference to rely on fuzzy matching. I can just type on without considering adding spaces.
If it's not a good idea to add it, could you enlighten me how I can customize it to achieve my goal? Maybe by adding some advice somewhere?
There's also the built-in completion style
flex
, if I understand the description given forflex-score-match-tightness
:Controls how the
flex
completion style scores its matches. Value is a positive number. A number smaller than 1 makes the scoring formula reward matches scattered along the string, while a number greater than one make the formula reward matches that are clumped together. I.e "foo" matches both strings "fbarbazoo" and "fabrobazo", which are of equal length, but only a value greater than one will score the former (which has one large "hole" and a clumped-together "oo" match) higher than the latter (which has two "holes" and three one-letter-long matches).However, since this modifies the sorting metadata of the candidates, it wouldn't use the
prescient.el
sorting.
I have tried it. No matter what I set for flex-score-match-tightness
, the candidates just seem to be sorted by length. Both 0.7 and 3 produce the following result:
My config:
;; Remove the original prescient-relevant config
;; Then
(setq completion-styles '(flex basic))
(setq flex-score-match-tightness 0.7) ; or 3
I still prefer prescient.el because I'm able to toggle fuzzy filtering via M-s f
, or split the search string with spaces to narrow down further, etc..
If it's not a good idea to add it, could you enlighten me how I can customize it to achieve my goal? Maybe by adding some advice somewhere?
If you would like to experiment with Selectrum Prescient, look at the function selectrum-prescient--refine
. How selectrum-prescient-mode
currently works is:
selectrum-preprocess-candidates-function
.selectrum-refine-candidates-function
.prescient-sort-full-matches-first
is applied.You could try using a custom refinement function.
With your help, I came up with the following snippet using flx.el:
(defvar prescient-flx-threshold 500)
(require 'flx)
(defun my/prescient-filter-flx-advice (fn query candidates)
(let ((results (funcall fn query candidates)))
(if (and results
(not (string-empty-p query))
(< (length results) prescient-flx-threshold))
(let* ((queries (prescient-split-query query))
(matches (mapcar (lambda (item)
(cons item
(apply '+
(mapcar
(lambda (q)
(car (or
(flx-score item q flx-file-cache)
'(-100))))
queries))))
results)))
(setq matches (sort matches (lambda (x y) (> (cdr x) (cdr y)))))
(mapcar (lambda (x) (car x)) matches))
results)))
(advice-add #'prescient-filter :around #'my/prescient-filter-flx-advice)
It will sort the candidate list after prescient-filter
. The sorting logic of flx works reasonably well according to my experiment. My input will bring the intended candidate to the front most of the times.
However, my elisp is kinda weak, so any suggestions on possible improvements will be appreciated. :)
Oh, if you just want a different sorting order to replace the sorting by length that occurs at the end, that seems pretty reasonable. I didn't think it would be terribly helpful just because usually the length sorting doesn't make much of a difference for me (most of the candidates are previous selections), but if you find it coming up frequently, that seems quite reasonable. I don't see why we can't just add a user option that allows you to provide your own key function to use instead of length at the end of the chain.
This thread is being closed automatically by Tidier because it is labeled with "waiting on response" and has not seen any activity for 90 days. But don't worry—if you have any information that might advance the discussion, leave a comment and I will be happy to reopen the thread :)
I don't see why we can't just add a user option that allows you to provide your own key function to use instead of length at the end of the chain.
I see this issue has been closed, but has this been implemented? Currently, when I type something like spull
I get straight-push-all
before straight-pull-all
. Like as shown in the screenshot in https://github.com/radian-software/prescient.el/issues/128#issuecomment-1256937095, I'd like to sort matches with more matching contiguous characters higher, and sort matches that occur earlier in the string higher too. Thanks
No, there were no changes made here to my knowledge. Just to clarify, you'd be fine with providing your own scoring function to use in place of length
after recency information is taken into consideration? Or did you have a different mechanism in mind?
I was under the assumption that work is needed to allow this. A user option needs to be added?
Yes, work would be needed to allow this. I just wanted to clarify your use case before embarking on work to implement that work.
Oh, sorry. I misunderstood. Yes, I'd be fine with supplying a scoring function of my own
@carrete How's https://github.com/radian-software/prescient.el/pull/151?
@raxod502 Thank you very much for taking the time to put this together! Would it be possible to pass the user's input to the tiebreaker function too? For example, given two candidates, "foobar-stuff" and "stuff-foobar", and the user's input of "foobar", I would want to return "foobar-stuff" as the preferred candidate since it includes the user's input closer to the start of the string. There are other criteria I'd like to use too, this is just one example. Thanks again!
That might be minibuffer-contents
.
True, but the way of accessing the user's search query might depend on the UI framework. Better that prescient.el
provides something that is guaranteed to work in every environment, since it already has access to the data. Try the latest commit on the PR, you can use the prescient-query
variable now in your tiebreaker function.
I'm having a heck of a time getting this to work. I've reinstalled ~/.emacs.d from scratch to be sure that I'm not getting tripped up by some previous historical preference for unwanted match candidates (and that I'm using the same package versions as specified by radian). However, the results are still wonky.
What I've come up with so far looks like:
(radian-local-on-hook after-init
(defun tvaughan/prescient-tiebreaker (candidate1 candidate2)
(let ((index1 (cl-search prescient-query candidate1))
(index2 (cl-search prescient-query candidate2)))
(if (and (integerp index1) (integerp index2))
(cond ((< index1 index2) -1)
((> index1 index2) 1)
(t (cond
((< (length candidate1) (length candidate2)) -1)
((> (length candidate1) (length candidate2)) 1)
(t 0))))
(cond
((integerp index1) -1)
((integerp index2) 1)
(t 0)))))
(setq prescient-filter-method '(literal fuzzy regexp))
(setq prescient-persist-mode t)
(setq prescient-sort-length-enable t)
(setq prescient-tiebreaker #'tvaughan/prescient-tiebreaker)
...)
When I enter M-x m
and M-x ma
, man
is the top candidate as expected, but
when I enter M-x mag
I get:
magit
should be the top candidate. magit
is the top candidate if I remove fuzzy
from prescient-filter-method
but I depend on fuzzy matching almost exclusively. This is the sort of behavior I've always seen no matter what tweaks I made. The behavior is the same no matter what prescient-sort-length-enable
is set to, for example. I've never been able to get the candidate with the most contiguous matching characters to be the top candidate after a couple of keystrokes. For example, if I unset the tiebreaker function I get:
Also please note these top candidates are ones I have never selected before. Perhaps there's some fundamental concept about this that I still don't understand?
I'm using the same package versions as specified by radian
Except for prescient.el
, which is running from https://github.com/radian-software/prescient.el/pull/151, right?
Yes, sorry that wasn't clear.
{tvaughan@Patagonica} ~/.emacs.d/straight/repos/prescient.el[rr-tiebreak] %
$ git log --oneline | head -n5
9548bec Grant access to original search query
73ade3d [#128] Custom tiebreaking function
e0cca29 Fix and improve sorting full matches first under some conditions. (#149)
d7cc55d Only add properties for `prescient-sort-full-matches-first` to the first candidate. (#148)
a402a7e Add unit tests (#150)
You're looking for (setq prescient-sort-full-matches-first nil)
. Otherwise you'll get candidates sorted first that fully match the input (which, with fuzzy
, will include a lot more than you might anticipate).
This tripped me up for a while trying to answer your question.
Thank you, @raxod502. With the tiebreaker function unset, this works exactly like my preferred behavior.
I can still see where I might want to use the tiebreaker function, but if you would prefer to not merge this (not that my opinion should carry any weight), I won't be negatively impacted. Thanks again
Ah, great to hear! Yeah, that interaction did not occur to me before. Anyway, the code and tests are already written, so I'll go ahead and just merge the feature. Maybe somebody else will find it helpful.
Thanks again, @raxod502
I have always thought there is a sorting mechanism that sorts fuzzy-matched candidates, until today. Reading the "Algorithm" section carefully again, I understand now that the sorting is done by frequency, recency and length.
I think it would be nice if there's some sorting among fuzzy-matched candidates themselves. For example, in the scenario below, currently
previous-logical-line
appears to the front, because it was at the front in the original candidate list. But intuitively and apparently, I would like candidates that contain "prescient" to come up first.Maybe it can be achieved by comparing the number of consecutive matched letters?