Ignore all spaces in the string. (This is new in the Python version; it was presumably irrelevant in the original context, where names would not have contained spaces.) In particular, if the first character is a space, the second character is treated as the initial character
Add the first character to the codex
Add any consonant to the codex if it is not the same as the previous character
Reduce codex to 6 letters by joining the first 3 and last 3 letters only
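The four rules above can be sketched in Python roughly as follows. This is a minimal illustration, not the patch itself: the function name `match_rating_codex_sketch` is hypothetical, and the sketch assumes that the input is upper-cased first, that the vowels are `AEIOU`, that every non-vowel character counts as a consonant, and that truncation only applies when the codex is longer than 6 characters.

```python
VOWELS = "AEIOU"  # assumption: only these five letters count as vowels


def match_rating_codex_sketch(s):
    """Hypothetical sketch of the rules above; not the library's code."""
    # Rule 1: ignore all spaces (upper-casing is an assumption of this sketch).
    s = s.upper().replace(" ", "")

    codex = []
    prev = None
    for i, c in enumerate(s):
        # Rule 2: the first character is always added.
        # Rule 3: a consonant is added unless it repeats the previous character.
        if i == 0 or (c not in VOWELS and c != prev):
            codex.append(c)
        prev = c

    # Rule 4: keep only the first 3 and last 3 letters of a long codex.
    if len(codex) > 6:
        return "".join(codex[:3] + codex[-3:])
    return "".join(codex)


print(match_rating_codex_sketch("Byrne"))        # BYRN
print(match_rating_codex_sketch(" Anna"))        # AN
print(match_rating_codex_sketch("Christopher"))  # CHRPHR
```

Stripping spaces before the loop means the "first character" check and the "previous character" check both operate on the space-free string, which is what the first rule asks for.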
The test for vowels is somewhat convoluted (and slightly incorrect) in the Wikipedia description; the above description is slightly simpler.
This patch implements the above description; all of the tests still pass. There is a parallel PR for the cjellyfish implementation.
In https://github.com/jamesturk/jellyfish/blob/e1be2f9055c698ba9e89c588b7ac321f8ff540b1/jellyfish/_jellyfish.py#L342-L347 the comment says that the character is appended to the codex if it is not a space OR starting character and vowel or ..., but the code appends the character if it is (not a space AND starting character and vowel) or (...). So at least one of them is wrong. Judging by the Wikipedia page https://en.wikipedia.org/wiki/Match_rating_approach, both the comment and the code are likely to be wrong. What is probably wanted, interpreting the given encoding rules, is the procedure listed above.
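The grouping issue comes down to Python's operator precedence: `and` binds more tightly than `or`, so `a and b or c` is parsed as `(a and b) or c`. The sketch below illustrates this with placeholder booleans only. The parameter names are hypothetical, `rest` stands in for the elided "(...)" clause, and the second function is just one alternative grouping a reader might have expected, not a claim about what the library intended.

```python
from itertools import product


def grouped_as_python_parses(not_space, first_and_vowel, rest):
    # No explicit parentheses: Python reads this as
    # (not_space and first_and_vowel) or rest
    return not_space and first_and_vowel or rest


def grouped_with_space_guard(not_space, first_and_vowel, rest):
    # Hypothetical alternative: the space check guards the whole condition.
    return not_space and (first_and_vowel or rest)


# The unparenthesised form always matches the explicit (a and b) or c grouping.
for a, b, c in product([False, True], repeat=3):
    assert grouped_as_python_parses(a, b, c) == ((a and b) or c)

# For a space (not_space=False) with the elided clause true,
# the two groupings disagree.
print(grouped_as_python_parses(False, False, True))  # True
print(grouped_with_space_guard(False, False, True))  # False
```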