jamesturk / jellyfish

🪼 a python library for doing approximate and phonetic matching of strings.
https://jamesturk.github.io/jellyfish/
MIT License

soundex consecutive repetitions #152

Closed jamesturk closed 2 years ago

jamesturk commented 2 years ago

report via email:

We found a small error in the version we downloaded, and as of the most recent release (October 27th) it has not been fixed. The Soundex code does not remove consecutive repetitions of a code.

For example: Shan Han results in S550 when it should return S500.

We fixed it by changing:

if sub != last: to if (sub != last) and (sub != result[-1]):


haven't investigated this, but filing it here for later

jamesturk commented 2 years ago

Checked this against PHP's soundex and S550 was the result.

jamesturk commented 2 years ago

this is an optional rule; closing this since the current behavior seems valid as-is
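
For anyone landing here later, the two behaviours discussed above can be reproduced with a small standalone sketch. This is not jellyfish's actual implementation: the function name `soundex_sketch` and the flag `collapse_across_vowels` are hypothetical, and dropping the space in "Shan Han" before encoding is an assumption. The flag mirrors the extra `sub != result[-1]` check proposed in the report.

```python
# Hypothetical sketch for illustration only; not jellyfish's implementation.
CODES = {}
for group, digit in (
    ("BFPV", "1"),
    ("CGJKQSXZ", "2"),
    ("DT", "3"),
    ("L", "4"),
    ("MN", "5"),
    ("R", "6"),
):
    for letter in group:
        CODES[letter] = digit


def soundex_sketch(name, collapse_across_vowels=False):
    """Return a 4-character Soundex-style code for `name`.

    With collapse_across_vowels=True, a code equal to the last emitted
    code is skipped even when a vowel separates the two letters, which
    is the behaviour the reported fix would introduce.
    """
    # Assumption: anything that is not A-Z (spaces included) is dropped.
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""

    result = [letters[0]]
    last = CODES.get(letters[0])  # code of the previous letter, None for vowels
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        sub = CODES.get(ch)
        if sub is None:
            last = None  # a vowel resets the "previous code" tracker
            continue
        if sub != last and not (collapse_across_vowels and sub == result[-1]):
            result.append(sub)
        last = sub

    # Truncate to four characters and pad with zeros.
    return "".join(result)[:4].ljust(4, "0")


print(soundex_sketch("Shan Han"))                               # S550
print(soundex_sketch("Shan Han", collapse_across_vowels=True))  # S500
```

With the flag off, the two N's are coded separately because a vowel sits between them, giving S550 as in the PHP check above. With the flag on, the second 5 is dropped and the result is S500, as the report expected.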