rapidfuzz / RapidFuzz

Rapid fuzzy string matching in Python using various string metrics
https://rapidfuzz.github.io/RapidFuzz/
MIT License

Processor Extraction Choices Documentation #368

Scrxtchy closed this issue 6 months ago.

Scrxtchy commented 7 months ago

https://github.com/rapidfuzz/RapidFuzz/blob/af9e1b76bea9c6f5dcfd7bf37f2ab571fef4d8f7/src/rapidfuzz/process_py.py#L77-L79

Is this still accurate? I've found that passing a list of dicts, as in

process.extract("Test", choices=[{item['key']: item['name']} for item in items], scorer=fuzz.WRatio, limit=7)

raises KeyError: 0.

Trace:
```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "src/rapidfuzz/process_cpp_impl.pyx", line 1251, in rapidfuzz.process_cpp_impl.extract
  File "src/rapidfuzz/process_cpp_impl.pyx", line 1130, in rapidfuzz.process_cpp_impl.extract_list
  File "src/rapidfuzz/process_cpp_impl.pyx", line 1011, in rapidfuzz.process_cpp_impl.extract_list_f64
  File "src/rapidfuzz/process_cpp_impl.pyx", line 220, in rapidfuzz.process_cpp_impl.preprocess_list
  File "./src/rapidfuzz/cpp_common.pxd", line 333, in cpp_common.conv_sequence
  File "./src/rapidfuzz/cpp_common.pxd", line 322, in cpp_common.hash_sequence
  File "./src/rapidfuzz/cpp_common.pxd", line 311, in cpp_common.hash_sequence
KeyError: 0
```

But according to the code, a tuple is what I should be using instead:

process.extract("Test", choices=[(item['key'], item['name']) for item in items], scorer=fuzz.WRatio, limit=7)

Ref: https://github.com/rapidfuzz/RapidFuzz/blob/af9e1b76bea9c6f5dcfd7bf37f2ab571fef4d8f7/src/rapidfuzz/process_py.py#L142
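The traceback ends in hash_sequence. This suggests that each dict in the list is treated as a generic sequence and read by position. Reading index 0 of a dict is a key lookup, which fails with KeyError: 0. The sketch below reproduces that assumption in pure Python. read_by_position is a hypothetical stand-in, not the actual Cython code in cpp_common.pxd.

```python
# Hypothetical illustration of why a list of dicts can end in "KeyError: 0".
# Assumption: a choice that is not a str is treated as a generic sequence
# whose elements are read by position (choice[0], choice[1], ...).

def read_by_position(seq):
    """Mimic positional access over a generic sequence (hypothetical helper)."""
    return [seq[i] for i in range(len(seq))]

choice = {"test2": "test"}  # one element of the list-of-dicts passed as choices

error = None
try:
    read_by_position(choice)  # choice[0] is a *key* lookup on a dict
except KeyError as exc:
    error = exc

print(repr(error))  # KeyError(0)
```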

maxbachmann commented 7 months ago

Those probably do need a second look, especially to make sure the documentation matches the type hint above.

I assume you are trying to use the Mapping version, which according to the type hint would be Mapping[Any, Sequence[Hashable] | None]. However, you are passing a Sequence[Mapping[...]] instead: a list in which every element is a one-entry dict.
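The difference comes down to a list comprehension versus a dict comprehension. The sketch below uses only the standard library's collections.abc to report which of the two shapes a choices argument has. describe_choices is a hypothetical helper for illustration and is not part of rapidfuzz.

```python
from collections.abc import Mapping, Sequence

def describe_choices(choices):
    """Hypothetical helper: report which shape a choices argument has."""
    if isinstance(choices, Mapping):
        # Matches Mapping[Any, Sequence[Hashable] | None]
        return "Mapping"
    if isinstance(choices, Sequence) and any(isinstance(c, Mapping) for c in choices):
        # A list whose elements are dicts, not strings
        return "Sequence[Mapping]"
    return "Sequence"

items = [{"name": "test", "key": "test2"}]
print(describe_choices([{item["key"]: item["name"]} for item in items]))  # Sequence[Mapping]
print(describe_choices({item["key"]: item["name"] for item in items}))    # Mapping
```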

What you could do instead is pass a single dict that maps each key to its name:

>>> from rapidfuzz import process, fuzz
>>> items = [{"name": "test", "key": "test2"}]
>>> process.extract("Test", choices={item['key']: item['name'] for item in items}, scorer=fuzz.WRatio, limit=7)
[('test', 75.0, 'test2')]
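The third element of each result is the dict key ('test2'). As far as I know, process.extract returns (choice, score, key) tuples, where key is the list index for a list of choices and the mapping key for a dict. The sketch below mimics that contract without rapidfuzz, under a few assumptions:

- toy_extract is hypothetical.
- difflib.SequenceMatcher is only a stand-in scorer, not fuzz.WRatio. In general it gives different scores.
- Skipping None choices reflects the type hint, which allows None.

```python
from collections.abc import Mapping
from difflib import SequenceMatcher

def toy_extract(query, choices, limit=5):
    """Hypothetical sketch of the (choice, score, key) return shape."""
    if isinstance(choices, Mapping):
        pairs = choices.items()      # key -> choice string
    else:
        pairs = enumerate(choices)   # list index -> choice string
    results = []
    for key, choice in pairs:
        if choice is None:           # the type hint allows None; this sketch skips it
            continue
        score = 100 * SequenceMatcher(None, query, choice).ratio()  # stand-in scorer
        results.append((choice, score, key))
    results.sort(key=lambda r: r[1], reverse=True)
    return results[:limit]

items = [{"name": "test", "key": "test2"}]
print(toy_extract("Test", {item["key"]: item["name"] for item in items}))  # [('test', 75.0, 'test2')]
print(toy_extract("Test", [item["name"] for item in items]))               # [('test', 75.0, 0)]
```

With a plain list of names, the index in the third slot can be mapped back to the original record, for example items[index]["key"].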