Sorting among the fuzzy matches?

liushihao456 commented 1 year ago

I have always thought there is a sorting mechanism that sorts fuzzy-matched candidates, until today. Reading the "Algorithm" section carefully again, I understand now that the sorting is done by frequency, recency and length.

I think it would be nice if there's some sorting among fuzzy-matched candidates themselves. For example, in the scenario below, currently previous-logical-line appears to the front, because it was at the front in the original candidate list. But intuitively and apparently, I would like candidates that contain "prescient" to come up first.

Maybe it can be achieved by comparing the number of consecutive matched letters?

raxod502 commented 1 year ago

This is why I do not personally use fuzzy matching in prescient.el.

I think it is unlikely to be possible to solve this problem in an elegant way. How would you want to prioritize matches? First by whether they have a substring match, then by frecency? Then you would end up with a bunch of infrequently used candidates before the recent ones, just because they happen to be substring matches.

I think you need some kind of heuristic to trade off between the different factors. Unfortunately, one of the design goals of prescient.el is that the sorting and filtering algorithms are as simple as possible and never have any nondeterminism, so I do not think it would be a good idea to add that.

Have you considered more heuristic-based incremental narrowing frameworks such as https://github.com/lewang/flx?

okamsn commented 1 year ago

There's also the built-in completion style flex, if I understand the description given for flex-score-match-tightness:

Controls how the flex completion style scores its matches.

Value is a positive number. A number smaller than 1 makes the scoring formula reward matches scattered along the string, while a number greater than one make the formula reward matches that are clumped together. I.e "foo" matches both strings "fbarbazoo" and "fabrobazo", which are of equal length, but only a value greater than one will score the former (which has one large "hole" and a clumped-together "oo" match) higher than the latter (which has two "holes" and three one-letter-long matches).

However, since this modifies the sorting metadata of the candidates, it wouldn't use the prescient.el sorting.

liushihao456 commented 1 year ago

I think it is unlikely to be possible to solve this problem in an elegant way. How would you want to prioritize matches? First by whether they have a substring match, then by frecency? Then you would end up with a bunch of infrequently used candidates before the recent ones, just because they happen to be substring matches.

What I meant is to sort only the remaining matches after moving frequent/recent candidates to the front. Frequency/recency has a higher priority. Just like the behavior of the default sorting by length.

Therefore the logic happens just where sorting by length happens. We might provide several sorting methods, prescient-sort-by-length, prescient-sort-by-fuzzy-score, etc.. Or maybe provide a sorting function variable, prescient-sorting-method that defaults to prescient-sort-by-length, but would allow for customization.

Unfortunately, one of the design goals of prescient.el is that the sorting and filtering algorithms are as simple as possible and never have any nondeterminism, so I do not think it would be a good idea to add that.

I understand. And I really like the way prescient.el (and selectrum.el) is written. It's a personal preference to rely on fuzzy matching. I can just type on without considering adding spaces.

If it's not a good idea to add it, could you enlighten me how I can customize it to achieve my goal? Maybe by adding some advice somewhere?

liushihao456 commented 1 year ago

There's also the built-in completion style flex, if I understand the description given for flex-score-match-tightness:

Controls how the flex completion style scores its matches. Value is a positive number. A number smaller than 1 makes the scoring formula reward matches scattered along the string, while a number greater than one make the formula reward matches that are clumped together. I.e "foo" matches both strings "fbarbazoo" and "fabrobazo", which are of equal length, but only a value greater than one will score the former (which has one large "hole" and a clumped-together "oo" match) higher than the latter (which has two "holes" and three one-letter-long matches).

However, since this modifies the sorting metadata of the candidates, it wouldn't use the prescient.el sorting.

I have tried it. No matter what I set for flex-score-match-tightness, the candidates just seem to be sorted by length. Both 0.7 and 3 produce the following result: 微信截图_20220923091448

My config:

;; Remove the original prescient-relevant config
;; Then
(setq completion-styles '(flex basic))
(setq flex-score-match-tightness 0.7) ; or 3

I still prefer prescient.el because I'm able to toggle fuzzy filtering via M-s f, or split the search string with spaces to narrow down further, etc..

okamsn commented 1 year ago

If it's not a good idea to add it, could you enlighten me how I can customize it to achieve my goal? Maybe by adding some advice somewhere?

If you would like to experiment with Selectrum Prescient, look at the function selectrum-prescient--refine. How selectrum-prescient-mode currently works is:

Apply frecency sorting via selectrum-preprocess-candidates-function.
Do filtering via selectrum-refine-candidates-function.
Still in the refinement function, maybe apply further sorting. This is how prescient-sort-full-matches-first is applied.

You could try using a custom refinement function.

liushihao456 commented 1 year ago

With your help, I came up with the following snippet using flx.el:

(defvar prescient-flx-threshold 500)
(require 'flx)

(defun my/prescient-filter-flx-advice (fn query candidates)
  (let ((results (funcall fn query candidates)))
    (if (and results
             (not (string-empty-p query))
             (< (length results) prescient-flx-threshold))
        (let* ((queries (prescient-split-query query))
               (matches (mapcar (lambda (item)
                                  (cons item
                                        (apply '+
                                               (mapcar
                                                (lambda (q)
                                                  (car (or
                                                        (flx-score item q flx-file-cache)
                                                        '(-100))))
                                                queries))))
                                results)))
          (setq matches (sort matches (lambda (x y) (> (cdr x) (cdr y)))))
          (mapcar (lambda (x) (car x)) matches))
      results)))

(advice-add #'prescient-filter :around #'my/prescient-filter-flx-advice)

It will sort the candidate list after prescient-filter. The sorting logic of flx works reasonably well according to my experiment. My input will bring the intended candidate to the front most of the times.

However, my elisp is kinda weak, so any suggestions on possible improvements will be appreciated. :)

raxod502 commented 1 year ago

Oh, if you just want a different sorting order to replace the sorting by length that occurs at the end, that seems pretty reasonable. I didn't think it would be terribly helpful just because usually the length sorting doesn't make much of a difference for me (most of the candidates are previous selections), but if you find it coming up frequently, that seems quite reasonable. I don't see why we can't just add a user option that allows you to provide your own key function to use instead of length at the end of the chain.

raxod502 commented 1 year ago

This thread is being closed automatically by Tidier because it is labeled with "waiting on response" and has not seen any activity for 90 days. But don't worry—if you have any information that might advance the discussion, leave a comment and I will be happy to reopen the thread :)

carrete commented 1 year ago

I don't see why we can't just add a user option that allows you to provide your own key function to use instead of length at the end of the chain.

I see this issue has been closed, but has this been implemented? Currently, when I type something like spull I get straight-push-all before straight-pull-all. Like as shown in the screenshot in https://github.com/radian-software/prescient.el/issues/128#issuecomment-1256937095, I'd like to sort matches with more matching contiguous characters higher, and sort matches that occur earlier in the string higher too. Thanks

raxod502 commented 1 year ago

No, there were no changes made here to my knowledge. Just to clarify, you'd be fine with providing your own scoring function to use in place of length after recency information is taken into consideration? Or did you have a different mechanism in mind?

carrete commented 1 year ago

I was under the assumption that work is needed to allow this. A user option needs to be added?

raxod502 commented 1 year ago

Yes, work would be needed to allow this. I just wanted to clarify your use case before embarking on work to implement that work.

carrete commented 1 year ago

Oh, sorry. I misunderstood. Yes, I'd be fine with supplying a scoring function of my own

raxod502 commented 1 year ago

@carrete How's https://github.com/radian-software/prescient.el/pull/151?

carrete commented 1 year ago

@raxod502 Thank you very much for taking the time to put this together! Would it be possible to pass the user's input to the tiebreaker function too? For example, given two candidates, "foobar-stuff" and "stuff-foobar", and the user's input of "foobar", I would want to return "foobar-stuff" as the preferred candidate since it includes the user's input closer to the start of the string. There are other criteria I'd like to use too, this is just one example. Thanks again!

okamsn commented 1 year ago

That might be minibuffer-contents.

raxod502 commented 1 year ago

True, but the way of accessing the user's search query might depend on the UI framework. Better that prescient.el provides something that is guaranteed to work in every environment, since it already has access to the data. Try the latest commit on the PR, you can use the prescient-query variable now in your tiebreaker function.

carrete commented 1 year ago

I'm having a heck of a time getting this to work. I've reinstalled ~/.emacs.d from scratch to be sure that I'm not getting tripped up by some previous historical preference for unwanted match candidates (and that I'm using the same package versions as specified by radian). However, the results are still wonky.

What I've come up with so far looks like:

(radian-local-on-hook after-init
  (defun tvaughan/prescient-tiebreaker (candidate1 candidate2)
    (let ((index1 (cl-search prescient-query candidate1))
          (index2 (cl-search prescient-query candidate2)))
      (if (and (integerp index1) (integerp index2))
          (cond ((< index1 index2) -1)
                ((> index1 index2) 1)
                (t (cond
                    ((< (length candidate1) (length candidate2)) -1)
                    ((> (length candidate1) (length candidate2)) 1)
                    (t 0))))
        (cond
         ((integerp index1) -1)
         ((integerp index2) 1)
         (t 0)))))
  (setq prescient-filter-method '(literal fuzzy regexp))
  (setq prescient-persist-mode t)
  (setq prescient-sort-length-enable t)
  (setq prescient-tiebreaker #'tvaughan/prescient-tiebreaker)
  ...)

When I enter M-x m and M-x ma, man is the top candidate as expected, but when I enter M-x mag I get:

magit should be the top candidate. magit is the top candidate if I remove fuzzy from prescient-filter-method but I depend on fuzzy matching almost exclusively. This is the sort of behavior I've always seen no matter what tweaks I made. The behavior is the same no matter what prescient-sort-length-enable is set to, for example. I've never been able to get the candidate with the most contiguous matching characters to be the top candidate after a couple of keystrokes. For example, if I unset the tiebreaker function I get:

Also please note these top candidates are ones I have never selected before. Perhaps there's some fundamental concept about this that I still don't understand?

raxod502 commented 1 year ago

I'm using the same package versions as specified by radian

Except for prescient.el, which is running from https://github.com/radian-software/prescient.el/pull/151, right?

carrete commented 1 year ago

Yes, sorry that wasn't clear.

{tvaughan@Patagonica} ~/.emacs.d/straight/repos/prescient.el[rr-tiebreak] %
$ git log --oneline | head -n5
9548bec Grant access to original search query
73ade3d [#128] Custom tiebreaking function
e0cca29 Fix and improve sorting full matches first under some conditions. (#149)
d7cc55d Only add properties for `prescient-sort-full-matches-first` to the first candidate. (#148)
a402a7e Add unit tests (#150)

raxod502 commented 1 year ago

You're looking for (setq prescient-sort-full-matches-first nil). Otherwise you'll get candidates sorted first that fully match the input (which, with fuzzy, will include a lot more than you might anticipate).

This tripped me up for a while trying to answer your question.

carrete commented 1 year ago

Thank you, @raxod502. With the tiebreaker function unset, this works exactly like my preferred behavior.

I can still see where I might want to use the tiebreaker function, but if you would prefer to not merge this (not that my opinion should carry any weight), I won't be negatively impacted. Thanks again

raxod502 commented 1 year ago

Ah, great to hear! Yeah, that interaction did not occur to me before. Anyway, the code and tests are already written, so I'll go ahead and just merge the feature. Maybe somebody else will find it helpful.

carrete commented 1 year ago

Thanks again, @raxod502

radian-software / prescient.el

Sorting among the fuzzy matches? #128