jamesturk / jellyfish

🪼 a python library for doing approximate and phonetic matching of strings.
https://jamesturk.github.io/jellyfish/
MIT License

Add Double Metaphone function #187

Closed: RossKen closed this issue 9 months ago

RossKen commented 1 year ago

Again, thanks for building such a great resource.

I have been using jellyfish for string comparators, but for phonetic transformations the lack of Double Metaphone has meant moving to other packages such as phonetics. It would be fantastic to be able to use jellyfish for everything, and it should be a useful addition for other users too.

jamesturk commented 1 year ago

Thanks for the kind words!

Regarding Double Metaphone, I'd be open to it if someone could contribute it along with a good suite of test cases. We did a survey of implementations online before, and they are all over the place, with very few test cases. The reference implementation is known to have bugs; I think http://aspell.net/metaphone/dmetaph.cpp is maybe the canonical one?

edit: I also checked and basically all of the implementation links at http://aspell.net/metaphone/ are dead :/

My skepticism about adding it without a solid reference implementation or test suite comes from the volume of support requests that the original Metaphone has led to: emails, issues, etc. saying "version XYZ of metaphone does this, jellyfish does that" where there's no ground truth.

maxharlow commented 9 months ago

@jamesturk Is Double Metaphone being added or did this get closed by mistake?