jamesturk / jellyfish

🪼 a python library for doing approximate and phonetic matching of strings.
https://jamesturk.github.io/jellyfish/
MIT License

Issue in match_rating_codex() output #121

Closed: bx-r0 closed this issue 4 years ago

bx-r0 commented 5 years ago

match_rating_codex does not seem to remove double consonant pairs correctly and therefore produces an incorrect codex.

The result from:

from jellyfish import match_rating_codex

print(match_rating_codex("William"))

Produces:

WLLM

This is not correct: the result should be WLM, because the second consonant of the double consonant pair (the second L) should be removed.

This is stated in the Wikipedia article linked in the Jellyfish documentation:

[...]
2. Remove the second consonant of any double consonants present
[...]
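For reference, the encoding rules can be sketched in Python. This is a minimal illustration, not jellyfish's actual implementation. The function name mra_codex is hypothetical. Uppercasing the input and stripping spaces are assumed preprocessing steps. Treating Y as a consonant is also an assumption. "Double consonant" is read here as two identical consonants that are adjacent in the input word. Besides the rule quoted above, the sketch applies the other two encoding rules listed in the same Wikipedia article: delete vowels unless they begin the word, and reduce the codex to its first 3 and last 3 letters when it is longer than 6.

```python
# Minimal sketch of Match Rating Approach (MRA) encoding. This is NOT
# jellyfish's code. The name mra_codex, the uppercase/space-stripping step,
# treating Y as a consonant, and the "adjacent in the input" reading of
# "double consonant" are all assumptions.

VOWELS = frozenset("AEIOU")


def mra_codex(name: str) -> str:
    """Return an MRA codex for a single name (illustrative sketch)."""
    s = name.upper().replace(" ", "")  # assumed preprocessing
    codex = []
    for i, c in enumerate(s):
        if i == 0:
            # Rule 1 exception: the first letter is always kept, vowel or not.
            codex.append(c)
        elif c in VOWELS:
            # Rule 1: delete vowels that do not begin the word.
            continue
        elif c == s[i - 1]:
            # Rule 2: remove the second consonant of a double consonant,
            # e.g. the second L in "WILLIAM".
            continue
        else:
            codex.append(c)
    # Rule 3: if longer than 6 letters, join the first 3 and last 3 letters.
    if len(codex) > 6:
        codex = codex[:3] + codex[-3:]
    return "".join(codex)


if __name__ == "__main__":
    print(mra_codex("William"))  # WLM
```

With these rules, "William" becomes W, L, M once the non-initial vowels and the second L are dropped. That is the WLM this issue expects, rather than WLLM.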
jamesturk commented 4 years ago

fixed in 0.8