Closed yorickpeterse closed 10 years ago
Derp. Seems MySQL has an extra feature of sorts to return longer strings. In that case I would consider the second "bug" a "nice to have" instead.
Re 1: This is as documented, though the documentation isn't very good and I agree that this behaviour is probably counter-intuitive.
I think it would make more sense if it processed only A-Z in the input, but that could be a breaking change for any applications that rely on the current behaviour (though I'd be surprised if there are any).
Re 2: As you've noted, MySQL is not standard. Soundex is supposed to generate 4-element codes. I'm not sure if that's really nice to have, or just buggy!
If I were to make any significant changes to Soundex, I think I'd want to add support for various dialects: Russell; Knuth; US government; Daitch-Mokotoff. On the other hand, I wonder how useful this is given that Metaphone gives better results for this application.
For me it's not super critical as I've since discovered the (double) metaphone implementation which worked better for my use case. Documentation wise, I completely missed that :/
I'll close this, then, and make a note to improve the documentation in future.
It appears that the Soundex implementation is broken due to two problems:
Text::Soundex.soundex
to always returnnil
The first problem can be reproduced as following:
The second problem can be reproduced as following:
If one were to use the
SOUNDEX()
function provided by MySQL they would get the following instead:Perhaps this is intended behaviour. In that case it should be documented somewhere.