helix-editor / nucleo

A fast and convenient fuzzy matcher library for Rust
Mozilla Public License 2.0

How should/does nucleo handle umlauts? #26

Closed: jessegrosjean closed this issue 11 months ago

jessegrosjean commented 1 year ago

For example, I notice that a needle ë fails to fuzzy match . On the other hand, a needle e will match , and a needle ë will match a haystack ë.

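use nucleo_matcher::pattern::{CaseMatching, Pattern};
use nucleo_matcher::{Config, Matcher};

let paths = ["be", "bë"];
let mut matcher = Matcher::new(Config::DEFAULT); // default config, normalization on
let matches = Pattern::parse("ë", CaseMatching::Ignore).match_list(paths, &mut matcher);
assert_eq!(matches.len(), 1); // fails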

Is that expected or a bug? If it is expected, can you say a bit more about why, and suggest workarounds? Mostly I want to be able to explain to people using my app why it works the way it does.

Thank you.

Tyarel8 commented 11 months ago

You are using Config::DEFAULT, which has normalization turned on:

pub const DEFAULT: Config = {
    Config {
        delimiter_chars: b"/,:;|",
        bonus_boundary_white: BONUS_BOUNDARY + 2,
        bonus_boundary_delimiter: BONUS_BOUNDARY + 1,
        initial_char_class: CharClass::Whitespace,
        normalize: true, // fold Latin-script characters such as ë to ASCII
        ignore_case: true,
        prefer_prefix: false,
    }
};
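To make the effect of `normalize: true` concrete, here is a minimal sketch of Latin-to-ASCII folding. The names `fold_latin` and `fold_str` are hypothetical, and the table covers only a handful of vowels; nucleo uses its own, much larger mapping. What the sketch shows is that once folded, the two haystacks from the question ("be" and "bë") become identical, so a matcher working on folded text can no longer tell them apart.

```rust
/// Hypothetical, heavily simplified stand-in for Latin-to-ASCII folding.
/// nucleo ships its own, much larger table; this one covers only a few
/// lowercase vowels for illustration.
fn fold_latin(c: char) -> char {
    match c {
        'à' | 'á' | 'â' | 'ä' => 'a',
        'è' | 'é' | 'ê' | 'ë' => 'e',
        'ì' | 'í' | 'î' | 'ï' => 'i',
        'ò' | 'ó' | 'ô' | 'ö' => 'o',
        'ù' | 'ú' | 'û' | 'ü' => 'u',
        other => other, // everything else passes through unchanged
    }
}

/// Fold every character of a string.
fn fold_str(s: &str) -> String {
    s.chars().map(fold_latin).collect()
}

fn main() {
    // After folding, both haystacks from the issue read "be".
    assert_eq!(fold_str("bë"), "be");
    assert_eq!(fold_str("be"), fold_str("bë"));
    println!("{} {}", fold_str("be"), fold_str("bë"));
}
```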

This converts Latin-script characters with diacritics to plain ASCII (for example, ë becomes e), so turning it off should solve this:

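use nucleo_matcher::{Config, Matcher};
let mut conf = Config::DEFAULT;
conf.normalize = false; // treat ë and e as distinct characters
let mut matcher = Matcher::new(conf);
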
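The effect of this workaround can be sketched with a toy subsequence matcher that has a hypothetical `normalize` flag. This is not nucleo's algorithm (nucleo scores matches and handles case separately via `CaseMatching`); `fold_latin`, `is_subsequence` and `toy_match_list` are illustrative names, and folding both needle and haystack is an assumption of the toy. It only shows why switching folding off lets the needle ë match "bë" but not "be".

```rust
/// Hypothetical folding table: a few umlauts mapped to ASCII.
fn fold_latin(c: char) -> char {
    match c {
        'ä' => 'a',
        'ë' => 'e',
        'ï' => 'i',
        'ö' => 'o',
        'ü' => 'u',
        other => other,
    }
}

/// Toy fuzzy test: true if the needle's characters appear in the haystack
/// in order. When `normalize` is set, both sides are folded first.
fn is_subsequence(needle: &str, haystack: &str, normalize: bool) -> bool {
    let prep = move |c: char| if normalize { fold_latin(c) } else { c };
    let mut hay = haystack.chars().map(prep);
    // `any` advances `hay`, so each needle char must occur after the previous match.
    needle.chars().map(prep).all(|n| hay.any(|h| h == n))
}

/// Return the haystacks the needle matches, in input order.
fn toy_match_list<'a>(needle: &str, haystacks: &[&'a str], normalize: bool) -> Vec<&'a str> {
    haystacks
        .iter()
        .copied()
        .filter(|h| is_subsequence(needle, h, normalize))
        .collect()
}

fn main() {
    let paths = ["be", "bë"];
    // Folding on: "ë" is folded to "e", so both paths match in this toy.
    assert_eq!(toy_match_list("ë", &paths, true), vec!["be", "bë"]);
    // Folding off: only the haystack that really contains "ë" matches.
    assert_eq!(toy_match_list("ë", &paths, false), vec!["bë"]);
}
```

The toy trades precision for recall in the same direction as the real option: with folding on, a plain `e` also finds "bë", which is usually what users typing on an ASCII keyboard want; with folding off, accented characters must be typed exactly.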
jessegrosjean commented 11 months ago

Thanks for your help, that solved the problem for me.