jamesturk / jellyfish

🪼 a python library for doing approximate and phonetic matching of strings.
https://jamesturk.github.io/jellyfish/
MIT License

#186 implement Jaccard similarity #214

Closed NiklasvonM closed 1 month ago

NiklasvonM commented 1 month ago

I did my best to implement the Jaccard similarity requested in issue #186. I also opened a PR against the test data repository at https://github.com/jamesturk/jellyfish-testdata/pull/9. However, I am unsure how to integrate the Git submodule changes, so the testdata submodule has not yet been updated in this PR.

I was also unsure whether I am still expected to provide a Python implementation, so I have only added a Rust implementation for now.

Furthermore, I had to update testutils.rs to support the optional ngram_size parameter.
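
For readers unfamiliar with the metric, here is a minimal Python sketch of Jaccard similarity with an optional ngram_size. It is not the code from this PR. The names jaccard_sketch and tokens are hypothetical, and the behavior when ngram_size is omitted (splitting on whitespace into words) as well as the 0.0 result for two empty inputs are assumptions made for illustration only.

```python
from typing import Optional, Set


def tokens(text: str, ngram_size: Optional[int] = None) -> Set[str]:
    """Return the set of tokens used to compare `text`.

    Assumption: with ngram_size=None the text is split on whitespace
    into words; otherwise overlapping character n-grams are used.
    """
    if ngram_size is None:
        return set(text.split())
    if ngram_size < 1:
        raise ValueError("ngram_size must be a positive integer")
    # Texts shorter than ngram_size yield an empty set.
    return {text[i:i + ngram_size] for i in range(len(text) - ngram_size + 1)}


def jaccard_sketch(a: str, b: str, ngram_size: Optional[int] = None) -> float:
    """Jaccard similarity: |A ∩ B| / |A ∪ B| over the two token sets."""
    set_a = tokens(a, ngram_size)
    set_b = tokens(b, ngram_size)
    union = set_a | set_b
    if not union:
        # Convention chosen for this sketch: two empty inputs score 0.0.
        return 0.0
    return len(set_a & set_b) / len(union)


print(jaccard_sketch("a b c", "b c d"))      # word tokens → 0.5
print(jaccard_sketch("night", "nacht", 2))   # character bigrams share only "ht"
```

With bigrams, "night" and "nacht" each produce four tokens and share one, so the score is 1/7.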

jamesturk commented 1 month ago

Thanks for this! I'm traveling for the next few weeks, so I may be slow, but I will take a closer look and get this merged soon!