RossKen closed this issue 3 months ago.
I'd definitely be open to this one. It's a little unorthodox for strings, but I think it's simple and well-defined enough to be useful. Would you compute it over n-grams (presumably with a tunable n)?
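For reference, the Jaccard similarity of two sets is the size of their intersection divided by the size of their union. Computed over n-grams, each string is first turned into its set of length-n substrings. Here is a worked example with bigrams (n = 2, chosen only for illustration):

```math
J(A, B) = \frac{|A \cap B|}{|A \cup B|}

A = \text{bigrams}(\texttt{night}) = \{\texttt{ni}, \texttt{ig}, \texttt{gh}, \texttt{ht}\}, \qquad
B = \text{bigrams}(\texttt{nacht}) = \{\texttt{na}, \texttt{ac}, \texttt{ch}, \texttt{ht}\}

J(A, B) = \frac{|\{\texttt{ht}\}|}{|\{\texttt{ni}, \texttt{ig}, \texttt{gh}, \texttt{ht}, \texttt{na}, \texttt{ac}, \texttt{ch}\}|} = \frac{1}{7} \approx 0.143
```

A larger n makes the measure stricter, since longer substrings are less likely to be shared.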
+1 for this feature request. I'd suggest not using n-grams by default, but enabling them when an n-gram size is passed. Example signature:
```python
def jaccard_similarity(str1: str, str2: str, ngram_size: int | None = None) -> float:
    ...
```
Released in 1.1.0! Thanks, @NiklasvonM.
Thanks for a great package! I'm planning to use it for some of my work on the record linkage package Splink.
It would be really great to add Jaccard similarity as an option in jellyfish.
I can give a PR a shot, but I haven't written any Rust before, so I can't guarantee how well (or how quickly) I'd manage it 😅