helix-editor / nucleo

A fast and convenient fuzzy matcher library for rust
Mozilla Public License 2.0

How should Nucleo work? #38

Closed by zjp-CN 9 months ago

zjp-CN commented 9 months ago

Thanks for creating this fuzzy matching library.

I've run into a weird problem with the Nucleo struct.

Consider the following code, which you can run on rust-explorer:

use std::sync::Arc;
use nucleo::Nucleo;
use nucleo::pattern::{CaseMatching, Normalization};

fn main() {
    let mut matcher = init_fuzzy_matcher();
    let inject = matcher.injector();
    let list = ["foobar", "fxxoo", "oo", "a"];
    list.iter().for_each(|s| {
        inject.push(s, |_| {});
    });
    matcher
        .pattern
        .reparse(0, "f", CaseMatching::Ignore, Normalization::Smart, false);
    let _status = matcher.tick(1000);
    dbg!(matcher.pattern.column_pattern(0));

    let mut counter = 0;
    loop {
        let _status = matcher.tick(100);
        // if status.changed {
        let snapshot = matcher.snapshot();
        let total = snapshot.item_count();
        let got = snapshot.matched_item_count();
        let res: Vec<_> = snapshot
            .matched_items(..)
            .map(|item| item.data)
            .collect();
        dbg!(total, got, res);
        // }
        // if !status.running {
        //     break;
        // }
        println!("running");
        if counter > 4 {
            break;
        }
        counter += 1;
    }
}

type Matcher = Nucleo<&'static str>;

fn init_fuzzy_matcher() -> Matcher {
    Nucleo::new(
        nucleo::Config::DEFAULT,
        Arc::new(|| println!("notified")),
        None,
        1,
    )
}

The result (res) is always empty:

[src/main.rs:34:9] total = 4
[src/main.rs:34:9] got = 0
[src/main.rs:34:9] res = []

With nucleo::Matcher, using the same config, input, and needle string, I get the desired output:

use nucleo::pattern::{Atom, AtomKind, CaseMatching, Normalization};
use nucleo::Matcher;

fn main() {
    let mut matcher = init_fuzzy_matcher();
    let list = ["foobar", "fxxoo", "oo", "a"];
    let res = Atom::new(
        "f",
        CaseMatching::Ignore,
        Normalization::Smart,
        AtomKind::Fuzzy,
        false,
    )
    .match_list(&list, &mut matcher);
    dbg!(res);
}

fn init_fuzzy_matcher() -> Matcher {
    Matcher::new(nucleo::Config::DEFAULT)
}

[src/main.rs:20:5] res = [
    (
        "foobar",
        36,
    ),
    (
        "fxxoo",
        36,
    ),
]

So the question is: how do we use Nucleo correctly? I found an issue asking for examples, but it has no replies. I also scanned helix's source files: although nucleo is one of its dependencies, what it actually uses is Matcher, not Nucleo.

zjp-CN commented 9 months ago

Well, I think the problem comes from this signature:

// Injector<T>
pub fn push(
    &self,
    value: T,
    fill_columns: impl FnOnce(&mut [Utf32String])
) -> u32

I didn't use fill_columns to add the source string to the search list because I mistakenly thought value: T was like the T in Atom/Pattern:

// Atom/Pattern
pub fn match_list<T>(
    &self,
    items: impl IntoIterator<Item = T>,
    matcher: &mut Matcher
) -> Vec<(T, u16)>
where
    T: AsRef<str>,

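Actually, I did notice that Injector<T> lacks an AsRef<str> bound, and I wondered how the matcher knows the source string. Now I understand that the T in Injector<T> and the T in Pattern::match_list<T> mean different things: in Injector<T>, value is an opaque payload that comes back with each match, while the text to search has to be written into the columns by fill_columns.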
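To make the difference concrete, here is a simplified, std-only model of the idea. It is not nucleo's implementation: ToyInjector, ToyItem, and is_subsequence are hypothetical names invented for this sketch, plain String stands in for Utf32String, and a naive case-insensitive subsequence test stands in for real fuzzy scoring. It only illustrates that the payload is never inspected, so an empty fill_columns closure leaves nothing to match against.

```rust
/// Hypothetical stand-in for an injected item: an opaque payload plus
/// the text columns that the matcher actually searches.
struct ToyItem<T> {
    data: T,
    columns: Vec<String>,
}

/// Hypothetical stand-in for `Injector<T>`; note that `T` needs no `AsRef<str>` bound.
struct ToyInjector<T> {
    items: Vec<ToyItem<T>>,
    num_columns: usize,
}

impl<T> ToyInjector<T> {
    fn new(num_columns: usize) -> Self {
        Self { items: Vec::new(), num_columns }
    }

    /// Mirrors the shape of `push(value, fill_columns)`: the columns start
    /// empty, and only the closure can put searchable text into them.
    fn push(&mut self, value: T, fill_columns: impl FnOnce(&mut [String])) {
        let mut columns = vec![String::new(); self.num_columns];
        fill_columns(&mut columns);
        self.items.push(ToyItem { data: value, columns });
    }

    /// Returns the payloads whose first column contains `needle` as a subsequence.
    fn matched(&self, needle: &str) -> Vec<&T> {
        self.items
            .iter()
            .filter(|item| {
                item.columns
                    .first()
                    .map_or(false, |col| is_subsequence(needle, col))
            })
            .map(|item| &item.data)
            .collect()
    }
}

/// Naive case-insensitive subsequence test (a placeholder for fuzzy scoring).
fn is_subsequence(needle: &str, haystack: &str) -> bool {
    let mut hay = haystack.chars().flat_map(char::to_lowercase);
    needle
        .chars()
        .flat_map(char::to_lowercase)
        .all(|n| hay.any(|h| h == n))
}

fn main() {
    let list = ["foobar", "fxxoo", "oo", "a"];

    // The mistake from the first post: the closure fills nothing.
    let mut empty = ToyInjector::new(1);
    for s in list {
        empty.push(s, |_| {});
    }
    println!("{:?}", empty.matched("f")); // prints []

    // The fix: the payload is just an index; the text goes into column 0.
    let mut filled = ToyInjector::new(1);
    for (idx, s) in list.iter().enumerate() {
        filled.push(idx, |cols| cols[0] = s.to_string());
    }
    println!("{:?}", filled.matched("f")); // prints [0, 1]
}
```

Using an index as the payload, as in the second half of main, keeps the injected items small and lets the caller look the original value up afterwards.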

And Nucleo is indeed what I need. Here's the working code:

// ...
// Idx is not shown here; it is assumed to be an index newtype such as `struct Idx(usize);`
    let list = [
        "foobar".to_owned(),
        "fxxoo".to_owned(),
        "oo".to_owned(),
        "a long string".to_owned(),
    ];
    for (idx, item) in list.iter().enumerate() {
        inject.push(Idx(idx), |buf| {
            dbg!(buf.len());
            if let Some(buf) = buf.first_mut() {
                *buf = item.as_str().into();
            }
        });
    }
// ...
        snapshot
            .matched_items(..)
            .map(|item| &list[item.data.0])
            .collect();

The last thing I don't understand is why the argument in fill_columns callback is &mut [Utf32String].

zjp-CN commented 9 months ago

The last thing I don't understand is why the argument in fill_columns callback is &mut [Utf32String].

Hah, I just realized it's because of the columns argument of Nucleo::<T>::new(..., columns). From the docs:

Nucleo can match items with multiple orthogonal properties. columns indicates how many matching columns each item (and the pattern) has. The number of columns can not be changed after construction.

I created a 1-column Nucleo<T>, so the fill_columns callback passed to Injector<T>::push should fill exactly 1 column of Utf32String.
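The column idea can be sketched with a small std-only model. Again, this is not nucleo's code: ToyNucleo, set_pattern, and is_subsequence are hypothetical names, String stands in for Utf32String, and the rule that an item must match every non-empty pattern column is an assumption made for this illustration. The file name and directory columns are an invented example.

```rust
/// Naive case-insensitive subsequence test (a placeholder for fuzzy scoring).
/// An empty needle matches everything.
fn is_subsequence(needle: &str, haystack: &str) -> bool {
    let mut hay = haystack.chars().flat_map(char::to_lowercase);
    needle
        .chars()
        .flat_map(char::to_lowercase)
        .all(|n| hay.any(|h| h == n))
}

/// Hypothetical model of a multi-column matcher.
/// The column count is fixed at construction and never changes.
struct ToyNucleo<T> {
    num_columns: usize,
    pattern: Vec<String>,         // one needle per column
    items: Vec<(T, Vec<String>)>, // payload plus one haystack per column
}

impl<T> ToyNucleo<T> {
    fn new(num_columns: usize) -> Self {
        Self {
            num_columns,
            pattern: vec![String::new(); num_columns],
            items: Vec::new(),
        }
    }

    /// The closure receives a slice with exactly `num_columns` slots.
    fn push(&mut self, value: T, fill_columns: impl FnOnce(&mut [String])) {
        let mut columns = vec![String::new(); self.num_columns];
        fill_columns(&mut columns);
        self.items.push((value, columns));
    }

    /// Sets the needle for one column; panics if `column` is out of range.
    fn set_pattern(&mut self, column: usize, needle: &str) {
        self.pattern[column] = needle.to_string();
    }

    /// Assumption of this sketch: every column's needle must match its haystack.
    fn matched(&self) -> Vec<&T> {
        self.items
            .iter()
            .filter(|(_, cols)| {
                self.pattern
                    .iter()
                    .zip(cols)
                    .all(|(needle, hay)| is_subsequence(needle, hay))
            })
            .map(|(data, _)| data)
            .collect()
    }
}

fn main() {
    // Two columns: 0 = file name, 1 = directory.
    let mut toy = ToyNucleo::new(2);
    let files = [("main.rs", "src"), ("lib.rs", "src"), ("main.rs", "examples")];
    for (idx, (name, dir)) in files.iter().enumerate() {
        toy.push(idx, |cols| {
            assert_eq!(cols.len(), 2); // one slot per column
            cols[0] = name.to_string();
            cols[1] = dir.to_string();
        });
    }

    toy.set_pattern(0, "main");
    println!("{:?}", toy.matched()); // prints [0, 2]

    toy.set_pattern(1, "ex");
    println!("{:?}", toy.matched()); // prints [2]
}
```

With a single column, as in the thread, the slice passed to the closure has length 1, which is why only its first element needs to be filled.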

evbo commented 4 days ago

@zjp-CN thank you for the wonderful example! @pascalkuthe should this example be added to docs for a hello world intro?