Closed deifactor closed 10 months ago
good find, I did have cases for checking that unicode substring search works for matches but not rejection test cases for cases where patterns did not match. The bug was a small oversight and has been fixed in 34553f0.
I included this in #33 just in time for the new release
I'm actually getting the opposite problem now where now it's not matching even though it should be.
hmm yeah that is an orthogonal bug, fixed in c7893db4b4fd2d53d8bee036ce6823f9c94544b0
Now I'm getting a crash:
let needle = Utf32String::from("adi");
let haystack =
Utf32String::from("At the Road’s End - Seeming - SOL: A Self-Banishment Ritual");
let mut matcher = Matcher::new(Config::DEFAULT);
assert_ne!(
matcher.substring_match(haystack.slice(..), needle.slice(..)),
None
)
range end index 60 out of range for slice of length 59
stack backtrace:
0: rust_begin_unwind
at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/panicking.rs:597:5
1: core::panicking::panic_fmt
at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/core/src/panicking.rs:72:14
2: core::slice::index::slice_end_index_len_fail_rt
at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/core/src/slice/index.rs:76:5
3: core::slice::index::slice_end_index_len_fail
at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/core/src/slice/index.rs:68:9
4: <core::ops::range::Range<usize> as core::slice::index::SliceIndex<[T]>>::index
at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/core/src/slice/index.rs:411:13
5: core::slice::index::<impl core::ops::index::Index<I> for [T]>::index
at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/core/src/slice/index.rs:18:9
6: nucleo_matcher::exact::<impl nucleo_matcher::Matcher>::substring_match_non_ascii
at /home/vector/.cargo/git/checkouts/nucleo-fe29e1ee969779b0/c7893db/matcher/src/exact.rs:248:28
7: nucleo_matcher::Matcher::substring_match_impl
at /home/vector/.cargo/git/checkouts/nucleo-fe29e1ee969779b0/c7893db/matcher/src/lib.rs:485:17
8: nucleo_matcher::Matcher::substring_match
at /home/vector/.cargo/git/checkouts/nucleo-fe29e1ee969779b0/c7893db/matcher/src/lib.rs:422:9
Seems like the index generation is also really glitchy:
Now I'm getting a crash:
let needle = Utf32String::from("adi"); let haystack = Utf32String::from("At the Road’s End - Seeming - SOL: A Self-Banishment Ritual"); let mut matcher = Matcher::new(Config::DEFAULT); assert_ne!( matcher.substring_match(haystack.slice(..), needle.slice(..)), None )
range end index 60 out of range for slice of length 59 stack backtrace: 0: rust_begin_unwind at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/panicking.rs:597:5 1: core::panicking::panic_fmt at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/core/src/panicking.rs:72:14 2: core::slice::index::slice_end_index_len_fail_rt at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/core/src/slice/index.rs:76:5 3: core::slice::index::slice_end_index_len_fail at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/core/src/slice/index.rs:68:9 4: <core::ops::range::Range<usize> as core::slice::index::SliceIndex<[T]>>::index at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/core/src/slice/index.rs:411:13 5: core::slice::index::<impl core::ops::index::Index<I> for [T]>::index at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/core/src/slice/index.rs:18:9 6: nucleo_matcher::exact::<impl nucleo_matcher::Matcher>::substring_match_non_ascii at /home/vector/.cargo/git/checkouts/nucleo-fe29e1ee969779b0/c7893db/matcher/src/exact.rs:248:28 7: nucleo_matcher::Matcher::substring_match_impl at /home/vector/.cargo/git/checkouts/nucleo-fe29e1ee969779b0/c7893db/matcher/src/lib.rs:485:17 8: nucleo_matcher::Matcher::substring_match at /home/vector/.cargo/git/checkouts/nucleo-fe29e1ee969779b0/c7893db/matcher/src/lib.rs:422:9
For this crash it looks like above the code in the commit that Pascal refrenced where it is enumerating to the end of the haystack it should be enumerating to the end minus the length of the needle.
Yeah I have a fix for that but I dont want to rush another fix (altough it is yet another existing bug, I guess I just need to weite moee unicode-unicode suabteing tests. I never type unicode so I rarely run into these).
Yeah, I don't type Unicode usually but I have a music collection with a lot of foreign stuff or weird smart quotes and I'm writing a music player so I've been hitting the ascii-needle/unicode haystack codepath a lot.
This should pass, but it fails with a score of 30; running with
indices
indicates that only the first codepoint in the haystack matches. If I get rid of the Japanese text the match goes away as expected. Fuzzy, postfix, and prefix match all indicate that there is no match; it's only substring match that breaks.edit: If I use
Utf32String::Unicode("lying".chars().collect())
there's no match, so I think the 'ascii needle, unicode haystack' codepath is the one with the problem.