helix-editor / nucleo

A fast and convenient fuzzy matcher library for Rust
Mozilla Public License 2.0

[Feature request] Way to get scores of many/all items #18

Closed: Erithax closed this issue 1 year ago

Erithax commented 1 year ago

There currently only seem to be methods available for getting the best match.

Many use cases require ranking many or all items. Getting back a sorted list would be nice, or at least the ability to get a score for a single needle and haystack so we can do the collecting and sorting ourselves.

pascalkuthe commented 1 year ago

I am not sure which API you are looking at, but the high-level API returns a sorted list (Nucleo::snapshot), and the low-level API both allows you to score individual needles and offers a convenience function to collect and sort an iterator (Pattern::match_list).
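To illustrate what "score an individual item, then collect and sort" means, here is a minimal std-only sketch. It does not use nucleo-matcher: toy_score is a hypothetical greedy subsequence scorer (far simpler than nucleo's fzf-style algorithm), and toy_match_list is a hypothetical stand-in that only loosely mirrors what Pattern::match_list is described as doing above.

```rust
// Hypothetical toy scorer, NOT nucleo's algorithm: returns Some(score) if every
// char of `needle` occurs in `haystack` in order (case-insensitive), else None.
// Each matched char earns 1 point, plus 2 more if it directly follows the previous match.
fn toy_score(needle: &str, haystack: &str) -> Option<u32> {
    let hay: Vec<char> = haystack.chars().flat_map(|c| c.to_lowercase()).collect();
    let mut pos = 0; // next haystack position to search from
    let mut last: Option<usize> = None; // position of the previously matched char
    let mut score = 0u32;
    for n in needle.chars().flat_map(|c| c.to_lowercase()) {
        // Greedily take the first occurrence at or after `pos`; return None if absent.
        let idx = hay[pos..].iter().position(|&h| h == n)? + pos;
        score += 1;
        if let Some(prev) = last {
            if prev + 1 == idx {
                score += 2; // consecutive-match bonus
            }
        }
        last = Some(idx);
        pos = idx + 1;
    }
    Some(score)
}

// Hypothetical stand-in for "collect and sort": score every item, drop the
// non-matches, and sort best-first (sort_by is stable, so ties keep input order).
fn toy_match_list<'a>(needle: &str, items: &[&'a str]) -> Vec<(&'a str, u32)> {
    let mut out: Vec<(&'a str, u32)> = items
        .iter()
        .filter_map(|&item| toy_score(needle, item).map(|s| (item, s)))
        .collect();
    out.sort_by(|a, b| b.1.cmp(&a.1));
    out
}

fn main() {
    for (item, score) in toy_match_list("foo", &["bar", "f_o_o", "foobar"]) {
        println!("{score:>3} {item}");
    }
}
```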

Erithax commented 1 year ago

I guess I misunderstood the documentation (examples would probably help :) ). From

pub fn matched_item_count(&self) -> u32

Returns the number of items that matched the pattern

and

pub fn matched_items(
    &self,
    range: impl RangeBounds<u32>
) -> impl Iterator<Item = Item<'_, T>> + ExactSizeIterator + DoubleEndedIterator + '_

Returns an iterator over the items that correspond to a subrange of all the matches in this snapshot.

I understood matches to be the items whose score is above a certain threshold. I now assume this is wrong.

Why is matched_item_count(&self) not always equal to item_count(&self)?

pascalkuthe commented 1 year ago

Ah, I see where your misunderstanding lies now.

item_count() returns the total number of items injected into the matcher. For example, if you opened the Helix file picker in a directory, this would be the number of files in that directory.

matched_items will return all items that match your pattern. Not all items match the pattern, of course: for example, the pattern foo will match foobar but not bar. The items returned by this iterator are sorted by score (with item length and injection order as tie-breakers).
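The ordering described above (score first, then item length, then injection order) can be expressed as a composite sort key. This is a hypothetical illustration rather than nucleo's code, and it assumes that shorter items and earlier-injected items win ties, which the comment does not spell out.

```rust
use std::cmp::Reverse;

// Hypothetical record for one matched item (not a nucleo type).
#[derive(Debug)]
struct ToyMatch {
    score: u32,
    len: usize, // length of the item's text
    index: u32, // injection order
}

// Highest score first; ties broken by shorter length, then by earlier injection
// (both tie-break directions are assumptions).
fn sort_matches(matches: &mut [ToyMatch]) {
    matches.sort_by_key(|m| (Reverse(m.score), m.len, m.index));
}

fn main() {
    let mut matches = vec![
        ToyMatch { score: 5, len: 6, index: 0 },
        ToyMatch { score: 7, len: 10, index: 1 },
        ToyMatch { score: 5, len: 3, index: 2 },
        ToyMatch { score: 5, len: 3, index: 3 },
    ];
    sort_matches(&mut matches);
    let order: Vec<u32> = matches.iter().map(|m| m.index).collect();
    println!("{order:?}"); // [1, 2, 3, 0]
}
```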

This function accepts a range bound, so matched_items(..) will simply iterate over all matched (sorted) items. In most cases (like a fuzzy matcher UI) you wouldn't actually want to iterate over all fuzzy matches (there could be millions of them, after all) and would instead display only the first N in the UI. To iterate over the top N matches, you can simply use snapshot.matched_items(..N.min(snapshot.matched_item_count())).
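Clamping the upper bound matters because a range that runs past the end of a sequence is an error for plain Rust slices (indexing panics). A minimal std-only sketch of the same idiom, using a hypothetical top_n_range helper with u32 counts and a Vec standing in for the snapshot:

```rust
use std::ops::Range;

// Hypothetical helper: the range of the top `n` matches, clamped to the number of
// matches actually available (mirrors the `..N.min(count)` idiom above).
fn top_n_range(n: u32, matched_count: u32) -> Range<u32> {
    0..n.min(matched_count)
}

fn main() {
    // Stand-in for an already sorted match list (hypothetical data).
    let sorted = vec!["foobar", "foo_baz", "f_o_o"];
    let r = top_n_range(10, sorted.len() as u32);
    // `&sorted[..10]` would panic here; the clamped range stays in bounds.
    for item in &sorted[r.start as usize..r.end as usize] {
        println!("{item}");
    }
}
```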

In general, the high-level API in the nucleo crate is meant to allow integrating a fully featured fzf-like fuzzy picker into your application. It is essentially a managed API that owns all the data (and does the processing in a background thread pool). That is also the reason why there are two lists here: the total item list and the matches. The high-level API is meant to be interactive (rescoring on every keypress in the picker), so it needs to keep all items around (internally, the matches list just contains indices that point into the full item list).
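The "owns all items, stores matches as indices" design can be sketched in a few lines. ToyPicker is hypothetical and deliberately simplified: it is single-threaded (no background thread pool), uses substring containment instead of fuzzy scoring, and only borrows its method names from the snapshot methods quoted earlier in the thread.

```rust
// Hypothetical, single-threaded sketch of a managed picker (not nucleo's implementation).
struct ToyPicker {
    items: Vec<String>, // every injected item, kept for future rescoring
    matches: Vec<u32>,  // indices into `items`, best match first
}

impl ToyPicker {
    fn new() -> Self {
        ToyPicker { items: Vec::new(), matches: Vec::new() }
    }

    fn inject(&mut self, item: &str) {
        self.items.push(item.to_string());
    }

    // Recompute the match list for a new pattern (e.g. on every keypress).
    // Toy rule: an item matches if it contains `pattern`; shorter items rank first,
    // and the stable sort keeps injection order among equal lengths.
    fn set_pattern(&mut self, pattern: &str) {
        let items = &self.items;
        let mut matches: Vec<u32> = (0..items.len() as u32)
            .filter(|&i| items[i as usize].contains(pattern))
            .collect();
        matches.sort_by_key(|&i| items[i as usize].len());
        self.matches = matches;
    }

    fn item_count(&self) -> u32 {
        self.items.len() as u32
    }

    fn matched_item_count(&self) -> u32 {
        self.matches.len() as u32
    }

    // Resolve the stored indices back into the owned items.
    fn matched_items(&self) -> impl Iterator<Item = &str> + '_ {
        self.matches.iter().map(|&i| self.items[i as usize].as_str())
    }
}

fn main() {
    let mut picker = ToyPicker::new();
    for name in ["foobar", "bar", "foo"] {
        picker.inject(name);
    }
    picker.set_pattern("foo");
    println!("{} of {} items match", picker.matched_item_count(), picker.item_count());
    for item in picker.matched_items() {
        println!("{item}");
    }
}
```

Because the full item list is retained, changing the pattern only rebuilds the small index vector; nothing has to be re-injected.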

If you are looking to just do a one-off sort (either of individual items or a list) and not looking for something interactive, you should use the low-level nucleo-matcher crate.

Erithax commented 1 year ago

I got the desired result by using nucleo-matcher and appending non-matched items to my_pattern.match_list(), it's working well, thank you!
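A sketch of that approach, assuming a generic scoring closure in place of nucleo-matcher's Pattern so that it runs without the crate: matched items are sorted best-first, and the non-matching ones are appended afterwards in their original order. The rank_all function is hypothetical.

```rust
// Hypothetical helper: rank every item, matches first (best score first),
// followed by all non-matching items in their original order.
// `score` stands in for a real matcher; None means "no match".
fn rank_all<'a, F>(items: &[&'a str], score: F) -> Vec<&'a str>
where
    F: Fn(&str) -> Option<u32>,
{
    let mut matched: Vec<(&'a str, u32)> = Vec::new();
    let mut unmatched: Vec<&'a str> = Vec::new();
    for &item in items {
        match score(item) {
            Some(s) => matched.push((item, s)),
            None => unmatched.push(item),
        }
    }
    // Stable sort: equal scores keep their input order.
    matched.sort_by(|a, b| b.1.cmp(&a.1));
    matched.into_iter().map(|(item, _)| item).chain(unmatched).collect()
}

fn main() {
    let items = ["bar", "foobar", "foo", "baz"];
    // Toy score: items containing "foo" match, and shorter ones score higher.
    let ranked = rank_all(&items, |s: &str| s.contains("foo").then(|| 100 - s.len() as u32));
    println!("{ranked:?}");
}
```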

pascalkuthe commented 1 year ago

In that case, it seems like you just used the wrong crate for your use case, so I will close this issue.