jamesturk / jellyfish

🪼 a python library for doing approximate and phonetic matching of strings.
https://jamesturk.github.io/jellyfish/
MIT License

Panic on `None` value with `match_rating_codex` for Certain Inputs #210

Status: Closed (VictorGoubet closed this 3 months ago)

VictorGoubet commented 5 months ago

Description: When the `match_rating_codex` function from the Jellyfish library is called on certain inputs (specifically, names with non-ASCII characters such as "Frédéric"), the library panics and raises the Python exception `pyo3_runtime.PanicException` with the message "called `Option::unwrap()` on a `None` value". Because `PanicException` derives from `BaseException` rather than `Exception`, it bypasses typical Python error handling such as `except Exception:` and, if uncaught, terminates the interpreter.
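For context on how a Rust panic turns into a Python exception: PyO3 catches panics at the Rust/Python boundary and re-raises them as `PanicException`. The sketch below reproduces that catch step in plain Rust with `std::panic::catch_unwind`. The helper `run_catching` and the string "FRÉDÉRC" are hypothetical illustrations, not jellyfish code; the string is simply one where byte offset 3 falls inside the two-byte UTF-8 encoding of "É".

```rust
use std::panic;

/// Hypothetical helper: runs `f` and converts a panic into `Err(message)`,
/// roughly the way a binding layer such as PyO3 turns a panic into an exception.
fn run_catching<F, T>(f: F) -> Result<T, String>
where
    F: FnOnce() -> T + panic::UnwindSafe,
{
    panic::catch_unwind(f).map_err(|payload| {
        // Panic payloads are usually a &'static str or a String.
        payload
            .downcast_ref::<&str>()
            .map(|s| s.to_string())
            .or_else(|| payload.downcast_ref::<String>().cloned())
            .unwrap_or_else(|| "<non-string panic payload>".to_string())
    })
}

fn main() {
    // Silence the default panic report on stderr for this demo.
    panic::set_hook(Box::new(|_| {}));

    // Assumed example string: byte offset 3 lands inside 'É' (2 bytes),
    // so `get` returns None and `unwrap` panics.
    let outcome = run_catching(|| "FRÉDÉRC".get(..3).unwrap().to_string());

    // Restore the default panic hook.
    let _ = panic::take_hook();

    match outcome {
        Ok(s) => println!("no panic: {s}"),
        Err(msg) => println!("caught panic: {msg}"),
    }
}
```

Without a catch like this, the panic would unwind out of the Rust function; PyO3's boundary handling is what makes it show up in Python as an exception at all.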

Steps to Reproduce:

  1. Pass a name with special/non-ASCII characters (e.g., "Frédéric") to the `jellyfish.match_rating_codex` function.
  2. Observe that instead of handling the input gracefully (e.g., returning None or raising a catchable exception), the library panics.

Expected Behavior: The library should handle all inputs gracefully, either by processing them correctly or by raising an exception that can be caught in Python.
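One way to get the "catchable exception" behavior is for the Rust side to return an error value instead of calling `unwrap`. A minimal sketch, assuming a byte-prefix step like the one that fails here; `prefix_bytes` and `SliceError` are hypothetical names, not part of jellyfish. In a PyO3 binding, a function returning `PyResult` could map such an error to something like `PyValueError::new_err(...)`, which Python would see as an ordinary `ValueError`.

```rust
use std::fmt;

/// Hypothetical error type for illustration; jellyfish's real code may differ.
#[derive(Debug, PartialEq)]
struct SliceError {
    end: usize,
}

impl fmt::Display for SliceError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "byte offset {} is past the end or inside a multi-byte character",
            self.end
        )
    }
}

impl std::error::Error for SliceError {}

/// Returns the first `end` bytes of `s`, or an error instead of panicking.
/// `str::get` yields None when `end` is out of range or not a char boundary.
fn prefix_bytes(s: &str, end: usize) -> Result<&str, SliceError> {
    s.get(..end).ok_or(SliceError { end })
}

fn main() {
    for input in ["ABCDEFG", "FRÉDÉRC"] {
        match prefix_bytes(input, 3) {
            Ok(p) => println!("{input}: prefix {p}"),
            Err(e) => println!("{input}: error: {e}"),
        }
    }
}
```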

Actual Behavior: The library panics on certain inputs, leading to a crash of the Python interpreter with the following error message:

```
thread '<unnamed>' panicked at src/match_rating.rs:28:42: called `Option::unwrap()` on a `None` value
...
pyo3_runtime.PanicException: called `Option::unwrap()` on a `None` value
```

Environment:

jamesturk commented 5 months ago

Thanks for this report, I'm a little behind, but I'll get a fix in soon!

jamesturk commented 3 months ago

This was kind of a tricky one: the Unicode character had to be in roughly the 3rd position (or 3rd from last) for the string split to fail, because Rust slices strings by byte offset and an accented character such as "é" takes two bytes in UTF-8. I've fixed this so it provides consistent (if nonsensical, since MRC doesn't really support accented characters) results.
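To make the "3rd position (or 3rd from last)" detail concrete, here is a sketch of the Match Rating codex step that keeps the first 3 and last 3 characters of a codex longer than 6. It is not the actual jellyfish source or fix. It assumes "FRÉDÉRC" as the intermediate codex for "Frédéric" (vowels removed, "É" kept because it is not an ASCII vowel). The byte-offset version fails on both ends because offsets 3 and `len - 3` each fall inside an "É"; the character-based version never splits a character.

```rust
/// Byte-offset version, mirroring the failure mode: `str::get` returns None
/// when a range endpoint falls inside a multi-byte UTF-8 character.
fn first_last_three_bytes(codex: &str) -> Option<String> {
    if codex.len() <= 6 {
        return Some(codex.to_string());
    }
    let head = codex.get(..3)?;
    let tail = codex.get(codex.len() - 3..)?;
    Some(format!("{head}{tail}"))
}

/// Character-based version: counts and slices `char`s (Unicode scalar
/// values), so it always returns a value.
fn first_last_three_chars(codex: &str) -> String {
    let chars: Vec<char> = codex.chars().collect();
    if chars.len() <= 6 {
        return codex.to_string();
    }
    let head: String = chars[..3].iter().collect();
    let tail: String = chars[chars.len() - 3..].iter().collect();
    head + &tail
}

fn main() {
    // Assumed intermediate codex: 7 chars, but 9 bytes because each 'É' is 2 bytes.
    let codex = "FRÉDÉRC";
    println!("bytes: {:?}", first_last_three_bytes(codex));
    println!("chars: {}", first_last_three_chars(codex));
}
```

Counting `char`s means a decomposed "é" (a plain "e" followed by a combining accent) counts as two characters, so results for accented names stay consistent but not linguistically meaningful.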