I was implementing my version of soundex here, using jellyfish as my baseline for comparisons, when I stumbled on a surprising difference. When computing the soundex for "Ashcroft", `jellyfish.soundex('Ashcroft')` returns `'A226'`.
Here's some documentation I found concerning this value, taken from Wikipedia:

> Using this algorithm, [...] "Ashcraft" and "Ashcroft" both yield "A261" and not "A226" (the chars 's' and 'c' in the name would receive a single number of 2 and not 22 since an 'h' lies in between them).

While this paragraph is unclear on which value is correct, another is pretty clear about it:

> two letters with the same number separated by 'h' or 'w' are coded as a single number, whereas such letters separated by a vowel are coded twice

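
To make the rule concrete, here is a minimal Python sketch of American Soundex that follows the two quoted rules: `h` and `w` are skipped without breaking a run of equal codes, while vowels (and `y`) reset it. The names `CODES` and `soundex` are my own and are not jellyfish's API; this is an illustration, not a reference implementation.

```python
# Letter -> Soundex digit. Vowels and y are absent (they act as separators);
# h and w are absent too, but are handled specially below (transparent).
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name: str) -> str:
    """Sketch of American Soundex following the rules quoted from Wikipedia."""
    letters = [ch for ch in name.lower() if ch.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    result = first.upper()
    # The first letter's own code still counts for adjacency ("Pfister" -> P236).
    prev = CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "hw":
            # h/w are transparent: same-coded letters around them collapse to one digit.
            continue
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        # A vowel (empty code) resets prev, so same-coded letters around it are coded twice.
        prev = code
        if len(result) == 4:
            break
    # Pad short codes with zeros, e.g. "Lee" -> "L000".
    return result.ljust(4, "0")
```

With this sketch, "Ashcroft" and "Ashcraft" both give `A261`, while "Tymczak" keeps both 2s (`T522`) because the vowel `a` separates `z` and `k`.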
This leads me to believe that the soundex returned should be `A261` and not `A226`, as explained in the previous quote. The issue can likely be solved by patching cjellyfish to skip `H` and `W` when removing adjacent soundex digits.
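
One plausible way to end up with `A226` (an assumption on my part, not taken from cjellyfish's source) is that `h`/`w` are treated like vowels during the adjacent-digit collapse, which resets the run. The sketch below isolates that step; `collapse` and `ASHCROFT_TAIL` are hypothetical names, and the `skip_hw` flag switches between the two behaviours.

```python
from typing import List, Optional

# Per-letter codes for "ashcroft" after the first letter (s h c r o f t).
# "" stands for a vowel, None for h/w (hypothetical representation for this sketch).
ASHCROFT_TAIL: List[Optional[str]] = ["2", None, "2", "6", "", "1", "3"]


def collapse(first_code: str, tail: List[Optional[str]], skip_hw: bool) -> str:
    """Merge adjacent equal codes and return the three digits after the first letter."""
    prev = first_code
    digits = []
    for code in tail:
        if code is None:
            if skip_hw:
                continue  # patched behaviour: h/w do not break a run of equal codes
            code = ""     # unpatched behaviour: h/w act like a vowel separator
        if code and code != prev:
            digits.append(code)
        prev = code
    return "".join(digits)[:3].ljust(3, "0")


# "a" has no code, so the run starts empty.
print("A" + collapse("", ASHCROFT_TAIL, skip_hw=False))  # A226
print("A" + collapse("", ASHCROFT_TAIL, skip_hw=True))   # A261
```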

TL;DR: I believe `soundex('ashcroft') == 'A261'`.